Track: Data Analytics
Abstract
According to a World Health Organization report, 9 million people have been infected with active tuberculosis (TB), and about 1.5 to 2 million people die of the disease each year. This study was conducted on 600 patients from the Masih-e-Daneshvari tuberculosis research center. K-means clustering and decision tree data mining algorithms were used to categorize the patients and to determine common indicators among them. Three clusters were chosen as optimal according to the Dunn index. The cluster label was then added to the data set as a field, and several decision trees were compared to find the one with the highest accuracy. The C5.0 tree achieved 97.6% accuracy. According to the results of this study, the most important factors are hemoglobin, age, sex, smoking, alcohol consumption, and creatinine. C5.0 rules with 50% confidence were extracted.
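As an illustration of the cluster-selection step described above, the following is a minimal sketch of choosing the number of K-means clusters by the Dunn index. It is not the study's implementation. The two-dimensional points are synthetic stand-ins for patient records, the helper names `kmeans` and `dunn_index` are hypothetical, the farthest-point initialisation is an assumption made to keep the run deterministic, and the Dunn index is taken in its classic form (smallest distance between points of different clusters divided by the largest cluster diameter).

```python
import math
from itertools import combinations


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption: farthest-point initialisation, chosen only for determinism.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def dunn_index(points, labels):
    """Minimum between-cluster distance over maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    if len(groups) < 2:
        return 0.0  # undefined for a single cluster; treated as worst case
    max_diam = max(math.dist(a, b) for g in groups for a in g for b in g)
    min_sep = min(
        math.dist(a, b)
        for g, h in combinations(groups, 2)
        for a in g
        for b in h
    )
    # All-singleton clusterings have zero diameter; treated as worst case here.
    return min_sep / max_diam if max_diam > 0 else 0.0


# Synthetic data (NOT the study's data): three tight groups of four points.
group_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + dx, cy + dy)
    for cx, cy in group_centres
    for dx in (-0.5, 0.5)
    for dy in (-0.5, 0.5)
]

# Cluster for several candidate k and keep the k with the highest Dunn index.
labels_by_k = {k: kmeans(points, k) for k in range(2, 6)}
scores = {k: dunn_index(points, labels_by_k[k]) for k in labels_by_k}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

In a workflow like the one summarized in the abstract, the labels for the selected k would be appended to each record as an extra field before the decision trees are trained. That classification step, including C5.0, is not shown here.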