Track: Machine Learning
Abstract
Over the last decade, information sources and the way they are used have changed dramatically. Social
media networks have introduced a new way for individuals to express their sentiments. Their role now extends beyond
the expression of individual opinions: companies, official institutions, and other organizations
maintain pages on these platforms through which they share developments, products, opinions, and
sometimes even official decisions. Social media has become a medium carrying a huge amount of information, where users
can view the opinions of other users, which can be classified into different sentiment classes and are increasingly becoming
a key factor in decision making. Twitter is a microblogging service built to describe what is happening anywhere
worldwide, at any moment. It is a forum for more than 500 million messages per day from about 1.3 billion people. Twitter data is short, specific, and easily accessible, which is why it has become one of the best sources for sentiment analysis and knowledge discovery through data stream mining. One of the major issues affecting data stream mining is that the underlying distribution of the data may change over time, leading to the phenomenon of concept drift. Identifying and handling concept drift in data streams is an active area of research. In this study, we present an approach to explore and understand the concept drift occurring in Twitter data streams. Two machine learning techniques, the Naive Bayes classifier and the Extreme Gradient Boosting (XGBoost) classifier, were applied to more than 300K tweets from international technology companies to detect and understand concept drift and to determine whether concept drift in a technology area reflects a radical or an incremental innovation.
Keywords
Tweets Mining, Social Media Analysis, Concept Drift, Data Stream, Technology Tweets.