Introduction
Financial forecasting has always been challenging because of the complexity and unpredictability of stock market dynamics. As a result, stock market analysis usually draws on a wide range of determinants and predictors. In the past, the major sources of information for financial forecasting were specialized publications, financial periodicals, and company reports; however, with the development of technology and the spread of the Internet, specialized financial information has become much more accessible. Scholarly journals publish recent research and companies make their financial reports public, and in addition, social networks and real-time online news portals serve as convenient resources for mining financial data suitable for forecasting.
Analysis
Timing is one of the most significant aspects of any financial operation. Accordingly, the vast majority of financial players aim to carry out the operations they are interested in at the most accurate and beneficial time. The data on which financial decision-making is based can be quantitative (numerical, such as changes in prices, loss of income, inflation, and currency exchange rates) or qualitative (verbal, such as news, rumors, expectations, or fears). The former kind of data is analyzed using statistical tools designed to handle numerical information, and the latter is processed with the help of natural language processing (NLP) and financial information extraction.
When it comes to the extraction of information from online resources such as social networks and news portals, qualitative data is usually the main focus. However, to create a substantial basis for financial forecasting and valid decisions, a large body of information needs to be gathered and processed. For this purpose, there are several tools designed to analyze the semantic frames underlying textual data. This approach has proved more effective than the bag-of-words (BOW) model. In contrast with the BOW model, where a text is processed with regard to the meanings of individual words, semantic frame analysis focuses on identifying semantic binaries that reflect certain concepts. However, despite its benefits, this analysis is mainly suitable for polarity tasks and has numerous weaknesses in interpreting and identifying more complex phenomena because of word and sentence ambiguity.
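The limitation of the BOW model mentioned above can be illustrated with a minimal Python sketch. The word lists and headlines are invented for illustration only and are not taken from any real lexicon or news source; a real study would rely on a curated financial lexicon. The sketch shows that two headlines with opposite meanings can produce exactly the same bag of words and therefore the same polarity score.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption made for this illustration).
POSITIVE = {"gain", "gains", "growth", "beat", "strong"}
NEGATIVE = {"loss", "losses", "decline", "miss", "weak"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def bow_polarity(text: str) -> int:
    """Return positive-word count minus negative-word count.

    Word order and negation are ignored, which is the core
    simplification of the bag-of-words model.
    """
    counts = bag_of_words(text)
    positive = sum(n for word, n in counts.items() if word in POSITIVE)
    negative = sum(n for word, n in counts.items() if word in NEGATIVE)
    return positive - negative


# Two invented headlines built from the same words in a different order.
headline_a = "Strong growth expected, no decline in sight"
headline_b = "No growth expected, strong decline in sight"

same_bag = bag_of_words(headline_a) == bag_of_words(headline_b)
print(same_bag, bow_polarity(headline_a), bow_polarity(headline_b))
```

Because both headlines contain the same words, the model cannot tell them apart, which is the kind of case where an analysis of the concepts behind the words is expected to do better.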
Another resource for financial data mining is financial forums, where the discussions represent the condensed reactions of millions of users; there, comments can be analyzed with regard to their positivity and negativity and then compared with the fluctuations in the financial markets under study.
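One possible way to carry out such a comparison is sketched below in Python. It assumes that each comment has already been labeled as positive, negative, or neutral by some earlier step, and it uses a plain Pearson correlation, which is only one of several measures that could be chosen. The labels, the return figures, and the function names are all invented for illustration.

```python
from math import sqrt


def net_sentiment(labels):
    """Share of positive minus share of negative comments, in [-1, 1]."""
    if not labels:
        return 0.0
    positive = sum(1 for label in labels if label == "pos")
    negative = sum(1 for label in labels if label == "neg")
    return (positive - negative) / len(labels)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Invented data: comment labels for four days and the market return
# observed on the day after each of them.
daily_labels = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neu"],
    ["pos", "neu", "neu"],
    ["neg", "neg", "neg"],
]
next_day_returns = [0.012, -0.008, 0.003, -0.015]

scores = [net_sentiment(day) for day in daily_labels]
r = pearson(scores, next_day_returns)
print(scores, r)
```

A coefficient close to 1 or -1 on real data would suggest that forum sentiment moves with the market, although a correlation alone does not show that the comments predict or cause the price changes.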
Conclusion
Finally, social networks serve as massive sources of information because of their focus on quick updates and instant content sharing. Such resources generate huge amounts of information daily and can be used for mining qualitative financial data for forecasting. Mining relevant data from Twitter, for example, requires identifying the users whose opinions are the most accurate. For that purpose, the verbal content of posts by multiple financial professionals has been analyzed to find the users whose predictions and opinions most closely reflect subsequent changes in the stock market. In such analyses, much attention needs to be given to specifying the topic, since topic specification is one of the main limitations owing to the flexibility of language and the multitude of themes, determinants, and fields that exist in financial analytics. Sentiment analysis is one of the most successful solutions to this issue. With the help of Latent Dirichlet Allocation (LDA), a researcher can process a series of pre-identified topics and compare them to the sentiment in the shared Twitter content. Adding more criteria can increase the accuracy of predictions.
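The user-selection step described above can be sketched in Python as a simple ranking by hit rate, that is, the share of a user's directional calls that matched the subsequent market move. This is only one plausible criterion and not necessarily the one used in the studies referred to; the user names, the data, and the min_calls threshold are hypothetical. The sketch assumes that each post has already been reduced to an "up" (+1) or "down" (-1) call, and it leaves the topic modeling step out.

```python
from collections import defaultdict


def rank_users_by_hit_rate(predictions, realized, min_calls=2):
    """Rank users by the share of their calls that matched the market.

    predictions: iterable of (user, day, direction), direction +1 or -1.
    realized: dict mapping day -> realized direction, +1 or -1.
    min_calls: users with fewer scored calls are left out (assumed cutoff).
    """
    hits = defaultdict(int)
    calls = defaultdict(int)
    for user, day, direction in predictions:
        if day not in realized:
            continue  # no market outcome available for this day
        calls[user] += 1
        if direction == realized[day]:
            hits[user] += 1
    rates = {
        user: hits[user] / calls[user]
        for user in calls
        if calls[user] >= min_calls
    }
    # Highest hit rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Invented example: three hypothetical users and three trading days.
realized = {"day1": 1, "day2": -1, "day3": 1}
predictions = [
    ("analyst_a", "day1", 1), ("analyst_a", "day2", -1), ("analyst_a", "day3", 1),
    ("analyst_b", "day1", 1), ("analyst_b", "day2", 1), ("analyst_b", "day3", -1),
    ("analyst_c", "day1", -1),  # a single call, below the min_calls cutoff
]

ranking = rank_users_by_hit_rate(predictions, realized)
print(ranking)
```

The min_calls cutoff reflects the idea that a user with one lucky call should not outrank a user with a long record; further criteria, such as the topic of each post, could be added as extra filters.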