Track: Machine Learning
Abstract
High dimensionality is one of the major problems in text classification. Dimensionality reduction techniques are typically used to reduce the number of features to a minimum with little or no effect on classifier performance. Among these techniques, filter-based feature selection (FS) is widely used; it aims to select the informative features from the original feature set. However, the filter methods proposed in the literature share a critical weakness: each is less effective on some datasets or classifiers. To overcome this issue, a number of works have combined multiple filter methods. This approach can improve classifier performance by exploiting the advantages of one method while offsetting the disadvantages of another. In this paper, we study the impact of combining multiple FS methods, namely mutual information (MI), chi-square (Chi2), and the t-test, on a text classification problem. The V-score is adapted to combine and rank the features produced by the chosen FS methods. Experiments are conducted on a movie reviews dataset, and classification accuracy is reported using Naive Bayes (NB) and support vector machine (SVM) classifiers. Both classifiers are evaluated with TFIDF and Count_vector feature representations. Experimental results demonstrate a minor improvement in performance from combining two filter methods and no significant improvement from combining all three.