Abstract:
Background: Tuberculosis is the leading cause of mortality among infectious
diseases worldwide. Treatment outcome is used as a major indicator of the quality
of the tuberculosis program run by health institutions. Because data mining can
extract interesting, useful and task-oriented knowledge from large amounts of
data, this study applied data mining to explore patterns in tuberculosis data and
to develop a predictive model of treatment outcome.
Objective: To explore patterns in tuberculosis data and develop a predictive
model using data mining techniques.
Methods: The open-source data mining tool WEKA was used in this study.
The study followed the standard data mining process model, the Cross-Industry
Standard Process for Data Mining (CRISP-DM). A total of 4780 patient records were
taken from the registration book of tuberculosis patients registered for
treatment at Debirebirhan Hospital from October 2001 to June 2011.
Result: Of the 4780 registered patients, 1320 (27.6%) underwent HIV testing,
and of these 468 (35.6%) were HIV-reactive. Among smear-positive pulmonary
tuberculosis cases, 668 (51.5%) patients had a sputum follow-up test at the 7th
month. The treatment outcomes were: cured 649 (50% of smear-positive cases),
completed 1813 (37.9%), died 370 (7.7%), failed 4 (0.3% of smear-positive cases),
defaulted 458 (9.58%) and transferred out 1486 (31.1%). The multilayer
perceptron achieved the highest accuracy, 85.8%. All attributes used in this
study were treated as predictor attributes when exploring patterns.
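The outcome percentages above use two different denominators: completed, died, defaulted and transferred out are shares of all 4780 registered patients, while cured and failed only fit a smear-positive denominator. As a minimal arithmetic check, assuming the reported counts are exact, the sketch below recomputes each share. The smear-positive total is not stated in the abstract; `SMEAR_POSITIVE_ESTIMATE` is a hypothetical value back-derived from "668 (51.5%)" and is only approximate because 51.5% is itself rounded. The same code also reproduces the combined died + defaulted + failed share of 17.4% cited in the conclusion.

```python
# Outcome counts as reported in the abstract (total registered = 4780).
TOTAL_REGISTERED = 4780
OUTCOMES = {
    "cured": 649,
    "completed": 1813,
    "died": 370,
    "failed": 4,
    "defaulted": 458,
    "transferred out": 1486,
}


def share(count: int, denominator: int) -> float:
    """Return count as a percentage of denominator, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# The six outcome classes should partition all registered patients.
assert sum(OUTCOMES.values()) == TOTAL_REGISTERED

for name, count in OUTCOMES.items():
    pct = share(count, TOTAL_REGISTERED)
    print(f"{name:>16}: {count:5d} ({pct}% of all registered patients)")

# HYPOTHETICAL smear-positive denominator (not reported in the abstract),
# back-derived from "668 (51.5%)" patients with a 7th-month sputum test.
SMEAR_POSITIVE_ESTIMATE = round(668 / 0.515)

print("cured, % of smear-positive:",
      share(OUTCOMES["cured"], SMEAR_POSITIVE_ESTIMATE))
print("failed, % of smear-positive:",
      share(OUTCOMES["failed"], SMEAR_POSITIVE_ESTIMATE))

# Combined unfavorable outcomes discussed in the conclusion.
unfavorable = OUTCOMES["died"] + OUTCOMES["defaulted"] + OUTCOMES["failed"]
print("died + defaulted + failed:", unfavorable,
      f"({share(unfavorable, TOTAL_REGISTERED)}%)")
```

Measured against all registered patients, cured would be only 13.6%, which is why the 50% figure is read here as a share of smear-positive cases.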
Conclusion and recommendation: All algorithms tested in this study showed
promising results. The 7th-month sputum test result for smear-positive patients
was the strongest predictor attribute for the cured and failed classes. The
multilayer perceptron (MLP) was the best algorithm for classifying and predicting
outcomes in the tuberculosis data. Together, the died, defaulted and failed
outcome classes accounted for 17.4% of patients, a serious public health concern.
Further research should be undertaken on larger datasets, adding attributes such
as patients' signs and symptoms.