According to a World Health Organization report, 9 million people have been infected with active tuberculosis (TB), and about 1.5 to 2 million people die of the disease annually. This study was conducted on 600 patients from the Masih-e-Daneshvari tuberculosis research center. The K-means clustering algorithm and decision trees are used to categorize patients and to determine common indicators among them. Three clusters were chosen as optimal according to the Dunn index. The cluster field was added to the data set, and different decision trees were compared to find the one with the highest accuracy. The C5.0 tree achieved 97.6% accuracy. According to the results of this study, the most important factors identified are hemoglobin, age, sex, smoking, alcohol consumption, and creatinine. C5.0 rules with 50% confidence were extracted.
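As a rough illustration of how a Dunn index can rank candidate clusterings, the sketch below computes it for two labelings of a small set of points. It assumes the common definition (smallest distance between points in different clusters, divided by the largest within-cluster diameter); the study may have used a different variant. The function name `dunn_index` and the toy points are hypothetical and are not the study's patient data.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index under an assumed definition: smallest between-cluster
    point distance divided by the largest within-cluster diameter."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter


# Hypothetical toy data: three well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
labels_k3 = [0, 0, 1, 1, 2, 2]  # one cluster per pair
labels_k2 = [0, 0, 0, 0, 1, 1]  # first two pairs merged

dunn_k3 = dunn_index(points, labels_k3)
dunn_k2 = dunn_index(points, labels_k2)
print(f"k=3: {dunn_k3:.3f}, k=2: {dunn_k2:.3f}")
```

A higher value indicates compact, well-separated clusters, so on this toy data the three-cluster labeling scores higher than the two-cluster one. In practice the labels would come from running K-means for each candidate number of clusters.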
Track: Data Analytics
Published in: 6th Annual International Conference on Industrial Engineering and Operations Management, Kuala Lumpur, Malaysia
Publisher: IEOM Society International
Date of Conference: March 8-10, 2016
ISBN: 978-0-9855497-4-9
ISSN/E-ISSN: 2169-8767