Comparative Analysis of Integrating Multiple Filter-Based Feature Selection Methods Using Vector Magnitude Score on Text Classification

Ado, Abubakar; Bin Mat Deris, Mustafa; Binti Samsudin, Noor Azah; Abubakar Bichi, Abdulkadir; Aliyu, Ahmed

doi:10.46254/AN11.20210823

Track: Machine Learning

Abstract

High dimensionality is one of the major problems that arise in text classification tasks. Dimensionality reduction techniques are usually applied to reduce the number of features to a minimum without significantly affecting the classifiers' performance. Among these techniques, filter-based feature selection is the most widely used; it aims to select the most informative features from the original feature set. However, the filter methods proposed in the literature share a critical problem: they are often not effective for some datasets or classifiers. To overcome this issue, a number of works have combined multiple filter methods. This approach improves classifiers' performance by maximizing the advantages of one method while minimizing the disadvantages of another. In this paper, we study the impact of combining multiple feature selection (FS) methods, namely mutual information (MI), chi-square (Chi2), and the t-test, on a text classification problem. The vector magnitude score (V-score) is adapted to combine and rank the features produced by the chosen FS methods. Experiments are conducted on a movie reviews dataset, and classification accuracy is reported using naive Bayes (NB) and support vector machine (SVM) classifiers. Both classifiers are evaluated with TF-IDF and Count_vector feature representations. Experimental results show a minor improvement in performance when two filter methods are combined and no significant improvement when all three methods are combined.
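The abstract does not state the exact V-score formula, so the following is only a minimal sketch of one plausible reading of "vector magnitude score", under stated assumptions: each filter method (for example MI, Chi2, and the absolute t-statistic) has already produced one score per feature; each method's scores are min-max normalized to [0, 1]; and a feature's V-score is the Euclidean length of its vector of normalized scores. The function names (`min_max_normalize`, `v_score`, `top_k_features`), the toy vocabulary, and the toy scores are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from typing import Dict, List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale one method's scores to [0, 1] so methods with different ranges are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Every feature received the same score from this method, so it cannot discriminate.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def v_score(method_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Assumed V-score: treat each feature as a point whose coordinates are its
    normalized scores (one per filter method) and return that point's Euclidean length."""
    if not method_scores:
        raise ValueError("at least one filter method is required")
    normalized = [min_max_normalize(s) for s in method_scores.values()]
    n_features = len(normalized[0])
    if any(len(col) != n_features for col in normalized):
        raise ValueError("every method must score the same features")
    return [
        math.sqrt(sum(col[j] ** 2 for col in normalized))
        for j in range(n_features)
    ]


def top_k_features(names: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k feature names with the highest combined score."""
    order = sorted(range(len(names)), key=lambda j: scores[j], reverse=True)
    return [names[j] for j in order[:k]]


if __name__ == "__main__":
    # Hypothetical vocabulary and toy per-method scores (not taken from the paper).
    vocab = ["great", "boring", "plot", "the", "awful"]
    scores = {
        "mi": [0.12, 0.10, 0.03, 0.00, 0.11],
        "chi2": [45.0, 38.0, 5.0, 0.5, 50.0],
        "t_test": [6.1, 5.2, 1.0, 0.2, 5.8],  # absolute t-statistics
    }
    combined = v_score(scores)
    print(top_k_features(vocab, combined, k=3))
```

The normalization step is the key design choice in this sketch: raw Chi2 statistics are unbounded while MI values are small, so combining unnormalized scores would let Chi2 dominate the ranking. Combining only two methods corresponds to passing a dictionary with two entries.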

Publisher: IEOM Society International
