Track: Undergraduate Research Competition
Abstract
As information technology has evolved, new methods have made it possible to glean many types of data. These include satellite images, Internet log information, and text exchanged through IoT (Internet of Things) devices and SNS, among others. Collectively, such data are called "alternative data." Although a large amount of such data is available, the characteristics of most of it have yet to be determined. Here, "characteristics" means the survey errors introduced by the survey method and the biases that arise from how the sample is drawn from the population. Statistical data and settlement data announced by governments and companies are generally corrected on the basis of these characteristics, and analyses of them take these limitations into account. This study assesses the impact of information from SNS on the market and then examines the nature of alternative data, with a focus on its characteristics. We analyze the correlation between the appearance rate of keywords that are frequently posted on SNS and the stock indices of industries. Our primary interest is whether SNS can serve as a market sensor. To this end, a program is implemented to automatically collect messages posted on SNS that relate to specific keywords. During the period from April 18 to June 23, 2022, texts associated with the 2022 Russian invasion of Ukraine and containing the keyword "Russia" are collected, along with the corresponding posting date and time. A total of 2,513,610 texts met this condition. Multi-dimensional scaling is applied to the obtained data. Words that often appear side by side stand out visually in the resulting map. On this basis, clusters of words are identified, and several words emerge as key features. The collected data are first classified into two types: "war" and "economy."
Then, 2,000 texts are selected from these and labeled as "positive," "negative," or "neutral." A neural network is trained on these labeled data by supervised learning, and the trained network classifies the remaining data into the three categories. This classification appeared to be mostly accurate. The preliminary experiments yield the following implications: (1) information retrievable from SNS affects economic indicators in real time, and (2) the time lag from a post on SNS to the market response is approximately one hour. These results suggest that SNS is useful as a market sensor. The proposed method enables the analysis of big data obtained from SNS. It is cost-effective and would be practical for small business owners.
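The correlation and time-lag analysis described above can be illustrated with a minimal sketch. It is not the program used in this study. It assumes that the keyword appearance rate and the stock index have already been aggregated into equal-length series over the same time bins (hourly bins are an assumption here), and the names pearson, lagged_correlations, and best_lag are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)


def lagged_correlations(keyword_rate, index_value, max_lag):
    """Correlate keyword_rate[t] with index_value[t + lag] for lag = 0..max_lag.

    Both inputs are assumed to be sampled on the same time bins
    (for example, one value per hour) and to have the same length.
    """
    if len(keyword_rate) != len(index_value):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = keyword_rate[: len(keyword_rate) - lag]  # earlier SNS activity
        y = index_value[lag:]                        # later market values
        result[lag] = pearson(x, y)
    return result


def best_lag(keyword_rate, index_value, max_lag):
    """Return the lag (in bins) with the largest absolute correlation."""
    corr = lagged_correlations(keyword_rate, index_value, max_lag)
    return max(corr, key=lambda lag: abs(corr[lag]))
```

With hourly bins, a best lag of 1 would correspond to a delay of about one hour between posting activity and the index. A high correlation at some lag indicates co-movement only and does not by itself establish that the posts caused the market response.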