Track: Data Analytics
Abstract
In this research, we examine whether Word2Vec can be used as an input for deep learning in categorizing web news. Because each news site has its own categorization policy, a reader must search several categories on each site to find the target news. If target news could be retrieved according to our own categorization policy, it would no longer be necessary to work out which category contains it on each site. We therefore categorize web news by machine learning. For the analysis, we use Japanese text data delivered by Japanese web sites.
Bag-of-words is a method of representing the words in a document as a vector, and it is often used as an input for text classification. Although the method is well suited to categorization because of its high accuracy, it incurs a high computational cost when used with a neural network. It is therefore desirable to reduce the dimension of the input layer. We propose using Word2Vec as the input to reduce this dimension, and we examine the classification accuracy obtained with this input. Through the experiment, we found that it is practical to represent words with Word2Vec as the input to deep learning for categorizing web news documents.
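The difference in input dimension can be illustrated with a minimal sketch. It is not the implementation used in this research: the toy English tokens, the embedding size DIM, the random vectors standing in for trained Word2Vec vectors, and the choice to average word vectors into one document vector are all assumptions made for illustration.

```python
import random

def bag_of_words(tokens, vocab):
    """Count vector with one dimension per vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for t in tokens:
        if t in index:
            vec[index[t]] += 1
    return vec

def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of known tokens (an assumed pooling choice)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Toy tokenized documents (hypothetical; Japanese text would first need
# a word segmenter, which is omitted here).
docs = [
    ["stock", "market", "rises", "market"],
    ["team", "wins", "final", "match"],
]
vocab = sorted({t for d in docs for t in d})

DIM = 3  # hypothetical embedding size, kept small for illustration
rng = random.Random(0)
# Random stand-ins for trained Word2Vec vectors (assumption).
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

bow = bag_of_words(docs[0], vocab)
dense = mean_embedding(docs[0], embeddings, DIM)

print(len(bow))    # grows with the vocabulary size
print(len(dense))  # stays fixed at DIM
```

With a realistic news corpus the vocabulary reaches tens of thousands of words, so the bag-of-words input layer grows accordingly, while the embedding-based input keeps the fixed size chosen when the word vectors are trained.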