Role of Sentiment Analysis in Linguistics Essay

Introduction

Sentiment Analysis (SA) is one of the fields of Natural Language Processing (NLP). It is the process of extracting the feelings and emotions that a user expresses (Devika et al., 2016). SA uses computational linguistics to identify, extract, quantify, and study affective states and subjective information. The technique is applied widely to material such as survey responses, reviews, social media and other online communications, and healthcare material, in applications ranging from marketing and customer service to clinical medicine. In SA, data from a particular database or data source is analyzed, and each message is classified as negative, neutral, or positive, depending on the scale used. Moreover, SA can be performed at the level of the whole document, the sentence, or the aspect. Emotion detection can further help identify the specific emotions expressed in the text (sadness, happiness, etc.) (Sunil & Sunita, 2020). The data extracted from the text is thus categorized to produce a general emotional and semantic assessment of the message.

SA can be a part of the analysis of various forms of social media communication, including video and audio information. Such multimodal analysis requires transcripts of spoken language and the study of textual data along with visual frames (Akhtar et al., 2019). Thus, SA has long been a key aspect of NLP, enabling analysis of the semantic characteristics of a message (Akhtar et al., 2019). The study of textual opinions is the basis for emotion recognition because the practice is currently well developed.

Analysis of social media streams is mostly restricted in scope and relies on count-based parameters. Although people increasingly use video and audio channels to express their opinions, research in SA is mostly limited to text messages. The analysis of textual information for evaluating opinions is based on algorithms that “make use of words, phrases and relations, as well as dependencies among them” (Poria et al., 2017, p. 98). Recently, there has been considerable advancement in deep learning, which has transformed the ability to perform text analysis (Poecze et al., 2018). For example, intent analysis is used to identify the intention behind a user’s message, while contextual semantic search (CSS) involves splitting the message by topic (Poecze et al., 2018). Multilingual analysis is now also available for messages containing parts in different languages (Sunil & Sunita, 2020). Polarity analysis, stance analysis, and emotion analysis are other techniques for analyzing users’ sentiments and opinions in social media data.

Polarity Analysis

Polarity analysis is one of the ways of carrying out a sentiment analysis of a given dataset. As the name suggests, it involves detecting how far a message leans toward one extreme or the other. Fundamental polarity analysis tests whether the opinion expressed in a text or document is positive, neutral, or negative. Advanced “ultra-polarity” sentiment categorization, for example, looks at emotional states such as happiness, anger, disgust, fear, sadness, and surprise (Ho et al., 2020). In data mining, such as that applied to social media networks, polarity is used to quantify opinions since opinions typically take a positive or negative tone. However, messages can also be categorized as neutral if the opinion lies between the two poles.

The convention, however, is to classify the sentiments in a given dataset as either positive or negative. For instance, the work by Turney (2001) used a thumbs-up or thumbs-down approach to classify reviews. The experiment used an unsupervised learning algorithm in which a review’s classification was predicted from the mean semantic orientation of the phrases in the review that contained adjectives and adverbs. A phrase’s semantic orientation was identified as positive if it had good associations and as negative if it had bad associations. Thus, polarity analysis is used both to separate neutral from non-neutral messages and to distinguish positive from negative ones (Dey et al., 2017). Retrieving these labels requires building datasets and then processing them, for which different approaches are used.
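The averaging step in this approach can be illustrated with a minimal Python sketch. The phrase scores below are invented for illustration and are not values from Turney's study, and the function name is hypothetical; the sketch only shows how a mean semantic orientation is turned into a thumbs-up or thumbs-down label.

```python
from statistics import mean

def classify_review(phrase_orientations):
    """Label a review from the mean semantic orientation of its phrases."""
    if not phrase_orientations:
        raise ValueError("need at least one phrase score")
    # A positive mean is read as "thumbs up", anything else as "thumbs down".
    return "thumbs up" if mean(phrase_orientations) > 0 else "thumbs down"

# Hypothetical orientation scores for phrases extracted from two reviews.
print(classify_review([1.2, 0.4, -0.3]))
print(classify_review([-0.9, 0.2, -1.1]))
```

A single strongly negative phrase does not decide the label on its own; only the sign of the average matters.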

The dictionary-based sentiment analysis technique is a computational method of measuring the type of feeling a text conveys (Reagan et al., 2017). In this context, the sentiment is usually binary, negative or positive, but it can also span multiple classes such as anger, joy, or sadness. The technique relies heavily on a predefined set of sentiment-laden terms (Wang et al., 2017). In this analysis, the number of negative terms is subtracted from the number of positive terms. If the result is greater than 0, the text has positive polarity; if it is less than 0, the polarity is negative.
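A minimal Python sketch of a dictionary-based polarity score follows. The two word lists are tiny, invented lexicons used only for illustration (real sentiment dictionaries are far larger), and the function name is an assumption. The score is the count of positive terms minus the count of negative terms, with a score of zero treated as neutral.

```python
import re

# Tiny illustrative lexicons; these are assumptions, not a published dictionary.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "sad"}

def polarity(text):
    """Classify text by (positive term count) minus (negative term count)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this great phone"))
print(polarity("Terrible battery and poor screen"))
print(polarity("Good camera, bad battery"))
```

The third example shows a limitation of plain counting: one positive and one negative term cancel out, so a mixed opinion is reported as neutral.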

However, the dictionary-based approach can be applied as part of sentiment analysis using other techniques. Kaur et al. (2017) used a combination of a dictionary-based approach and three machine learning techniques for the sentiment analysis of Hinglish texts. A dataset of 300 Hinglish movie reviews was compiled for the study. The research results emphasize that “for Hinglish text, dictionary-based approach gives significant results in terms of all the performance evaluation metrics for all the iterations as compared to machine learning algorithms” (Kaur et al., 2017, p. 102). Thus, this technique is the basis for compiling datasets, which is especially useful for sentiment analysis of text messages in less common languages. Moreover, the datasets created in this way are then used for supervised and unsupervised analysis techniques.

The supervised learning technique of performing sentiment analysis involves using statistical models to uncover the underlying patterns in data. In supervised learning, the models have to be trained on already existing data before they can be deployed to interpret new data (Chauhan, 2017). In this case, supervised learning involves labeling the training data according to the categories on the polarity scale (Singh et al., 2017). The advantage of this approach is that the programmer does not have to write the classification rules explicitly, so the model requires little human intervention once deployed. For this reason, the technique can handle much larger datasets and perform better classification, including capturing language nuances.
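The train-then-predict workflow can be sketched in a few lines of Python. The example below is a simplified multinomial naïve Bayes classifier with add-one smoothing, written from scratch; the four training sentences and their labels are invented, and the function names are hypothetical. It is meant only to show how labeled data drives the classification, not to reproduce any model from the cited studies.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of training documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[label][token] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented, hand-labeled training data.
training_data = [
    ("great film loved it", "positive"),
    ("wonderful acting great story", "positive"),
    ("boring plot terrible acting", "negative"),
    ("terrible film hated it", "negative"),
]
model = train(training_data)
print(predict(model, "great story"))
print(predict(model, "boring and terrible"))
```

No polarity rule is written by hand here: the association between words and labels comes entirely from the labeled examples.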

Another technique used in polarity sentiment analysis is unsupervised learning. Unsupervised learning is similar to supervised learning in that they are both machine learning techniques. However, unlike supervised learning, which uses labeled data to train the statistical models, unsupervised learning allows the model to perform its own classification (Ahmad et al., 2017). Examples of these types of techniques include clustering algorithms.
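As one illustration of a clustering algorithm, the Python sketch below runs a small k-means procedure over word-count vectors. The four documents are invented, the starting centroids are chosen by hand so the run is repeatable, and all function names are hypothetical. No labels are supplied; the algorithm only groups documents that use similar words.

```python
def vectorize(docs):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    return [[d.lower().split().count(w) for w in vocab] for d in docs]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seed_indices, max_iter=20):
    """Cluster vectors, starting from the documents at seed_indices."""
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "great great film",
    "great film fun",
    "terrible boring film",
    "boring terrible plot",
]
labels = kmeans(vectorize(docs), seed_indices=[0, 3])
print(labels)
```

The resulting cluster numbers carry no meaning by themselves; an analyst still has to inspect each group to decide whether it corresponds to positive or negative opinion.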

Various algorithms can be used in combination in polarity analysis to achieve the best accuracy. For example, Rout et al. (2018) applied both supervised and unsupervised algorithms to the analysis of unstructured texts from social media. The researchers first used vocabulary labeled positive and negative to define the emotional content of a message. For this, two prepared lists of synonyms were used to search the messages. Thus, the unsupervised algorithm allowed the researchers to classify messages into two groups. The supervised model was then applied to identify specific emotional labels such as anger, fear, and joy. Thus, researchers can combine techniques to extract more accurate and informative data when analyzing text messages.

Polarity analysis is a challenging task because the emotional label of a word can change from one context to another. In particular, polysemy may be a frequent source of confusion for researchers (Chen et al., 2019). The creation of algorithms is also hampered by other linguistic aspects. For example, conditional, interrogative, or valence-shifting sentences can flip the polarity of a statement (Do et al., 2019). Additionally, user-generated texts often differ from standard texts, making it difficult to extract emotional markers (Do et al., 2019). To address these problems, machine learning and the clustering of datasets are used.

Emotion Analysis

Emotion analysis is a subset of sentiment analysis that involves extracting and analyzing emotions. Some researchers use the terms emotional analysis and sentiment analysis interchangeably (Kim & Klinger, 2021). However, Soleymani et al. (2017) note that sentiment analysis focuses more on identifying the polarity of the opinion expressed in the message. Emotion analysis is a fundamental component of affective computing, where “affect” translates to emotion, while “computing” refers to measuring or calculating something (Hakak et al., 2017). This branch of computing is crucial in designing and simulating human affects, enabling the analysis of human/machine interactions (Hakak et al., 2017). Data used in emotion analysis can be in text, audio, imagery, or video. In addition, analyzing the sentiments and emotions in data from sources such as the Internet has practical merits, such as measuring the health of a community, which can be used to help prevent suicides.

Data and text mining have become an indispensable part of organizational success due to the evolution of the Web. The data enables organizations to develop tailor-made products for customers. Big data has made petabytes of data available, which calls for accelerated research on ways to analyze it. Emotion detection systems rely heavily on emotion models, which define how emotions are represented (Acheampong et al., 2020). There are several types of emotion models. Discrete emotion models place emotions into distinct categories or classes. Dimensional emotion models (DiEM) assume that emotions are not independent of one another and therefore represent them as positions in a dimensional space. Some DiEMs, according to Acheampong et al. (2020), take the form of a circular 2-D model referred to as the circumplex of affect, while others take the form of a 2-D emotion wheel with outer and inner concentric circles.
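The difference between a discrete label and a position in a 2-D space can be sketched in Python. The example assumes the two axes are valence (unpleasant to pleasant) and arousal (calm to activated), as in Russell's circumplex model; the source does not name the axes, and the quadrant descriptions and function names below are illustrative choices rather than part of any cited model.

```python
import math

def circumplex_position(valence, arousal):
    """Return (angle in degrees, intensity) for a point in valence-arousal space."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)
    return angle, intensity

def quadrant(valence, arousal):
    """Reduce a continuous position to one of four coarse, illustrative regions."""
    if valence >= 0:
        return "pleasant, high arousal" if arousal >= 0 else "pleasant, low arousal"
    return "unpleasant, high arousal" if arousal >= 0 else "unpleasant, low arousal"

# Hypothetical scores in the range -1..1 for two messages.
print(circumplex_position(0.0, 1.0), quadrant(0.0, 1.0))
print(circumplex_position(-0.6, -0.8), quadrant(-0.6, -0.8))
```

A discrete model would keep only the region name, while a dimensional model keeps the coordinates, so two emotions in the same region can still differ in angle and intensity.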

In the modern world, multimodal emotional analysis is also gaining importance. Previously, research was conducted using only one type of data: text, auditory, or visual. However, more sophisticated sensors and advances in computer processing have made multimodal analysis possible. This approach makes it possible to use two or more modalities for joint analysis, which increases the accuracy and detail of the results (Shoumy et al., 2020). For example, researchers can combine text and facial expressions to identify more specific emotional markers. These technologies build on big data analysis and can be applied in various fields, including artificial intelligence, medicine, education, and marketing.

Stance Analysis

Computational approaches have focused mainly on detecting the polarity of product reviews through text classification as positive, neutral, or negative. However, it is also essential to determine the direction of the message to identify the favorability towards a given subject of interest (Sun et al., 2019). Stance analysis of textual data involves detecting whether the author of the text is in favor or against a specific target (Sun et al., 2019). This type of analysis uses computational linguistics. The target group may be an individual, an organization, government, product, or movement.

Although stance analysis provides markers similar to those of sentiment analysis (positive, negative, or neutral), the approach differs in several ways. First, stance analysis considers a particular target, which is not the object of polarity analysis. Second, the stance and the sentiment of one message may not agree and are sometimes even opposite (Küçük & Can, 2020). Mohammad et al. (2016) clarify that the two approaches differ in that, although sentiment analysis identifies the target, it does not establish a relationship between the target and the expressed opinion. Many methods are used to conduct this type of analysis. Still, they typically test whether a social media post is in favor of or against a target, even when the target is not mentioned in the analyzed text. Stance detection is vital in analytical work that measures public opinion on social media, especially on social and political issues. These matters are usually riddled with controversy, with people expressing opposing views on distinct topics (Aldayel & Magdy, 2021). Topics such as abortion, global warming, and feminism have been used heavily as target subjects. Others include political issues such as referendums and elections.
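The distinction between stance and sentiment can be made concrete with a small Python sketch. The record type, the label names, and the two example posts are all invented for illustration and do not come from any of the cited datasets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    text: str
    target: str     # the entity or issue the stance refers to
    stance: str     # "favor", "against", or "neither"
    sentiment: str  # "positive", "negative", or "neutral"

# Invented examples; the second has a negative tone but a favorable stance.
examples = [
    Annotation("So glad the new recycling scheme has started!",
               target="recycling scheme", stance="favor", sentiment="positive"),
    Annotation("It is infuriating that leaders keep ignoring rising sea levels.",
               target="climate action", stance="favor", sentiment="negative"),
]

def disagreements(annotations):
    """Return annotations whose stance and sentiment point in different directions."""
    aligned = {("favor", "positive"), ("against", "negative"), ("neither", "neutral")}
    return [a for a in annotations if (a.stance, a.sentiment) not in aligned]

for a in disagreements(examples):
    print(a.target, a.stance, a.sentiment)
```

The second post is angry in tone, yet its author supports the target, which is why a polarity label alone cannot stand in for a stance label.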

Social Media & Big Data Analytics

Big data is a field of data science concerned with ways of extracting information from, analyzing, or otherwise handling datasets that are too large to be handled with traditional data-analysis software (Breur, 2016). Data with many entries (rows) typically provides more statistical power, while data with more attributes (columns) may increase the false discovery rate (Breur, 2016). Challenges associated with big data include capture, storage, analysis, search, transfer, querying, visualization, privacy, and updating. Big data is usually associated with the five Vs of data, namely, volume, value, velocity, variety, and veracity.

Big data analytics has increasingly become a vital field due to the popularity of the Internet, especially the Web 2.0 technology (Ghani et al., 2019). Not surprisingly, the adoption of social media applications has provided numerous opportunities and challenges for data science researchers. Social media sites generate copious amounts of data due to integrating daily activities and background details (Ghani et al., 2019). This voluminous information is what is called “big data.” Big data has had a huge impact on society. For example, in politics, big data was heavily used by the Obama campaign during his re-election in 2012 (Kumari, 2016). It has also been widely used in marketing where companies such as Google can recommend products based on search history, enriching the user experience and raising privacy issues simultaneously.

Review of Existing Research on Social Media Activism on Twitter and Sentiment Analysis

Twitter mining gives researchers a convenient way to collect data. Many researchers have employed the technique to review public opinion on various social, economic, and political subjects. One example of Twitter mining and SA applied to political analysis is offered by Gull et al. (2016). This study was carried out in Pakistan to perform sentiment analysis in a political context. The methodology involved selecting and extracting data from Twitter in the form of posts. After collection, the data was cleansed, transformed, and stored in a database. The analysis was done in Python using SVM and naïve Bayes models on a total of 80,000 tweets (Gull et al., 2016). The results were visualized in a pie chart and bar graphs revealing different political sentiments in Pakistan. In particular, Lahore was revealed to have the most negative perception of the Pakistan Peoples Party (PPP) (Gull et al., 2016). The results also showed that SVM was a better classifier than naïve Bayes.

Another example of how Twitter data can be used to analyze social issues comes from a study by Scarborough (2018). The researcher investigated the effectiveness of Twitter messages in measuring sentiment about gender. The data comprised over 100,000 tweets, from which the researcher calculated Twitter sentiment on feminism at the state and county level (Scarborough, 2018). The researcher then compared the results against data from the General Social Survey. The models used in the study were qualitative naïve Bayes sentiment analysis. The results revealed that Twitter sentiment on feminism was significantly correlated with gender attitudes (Scarborough, 2018). This observation suggested that Twitter can serve as a legitimate measure of public opinion on gender.

Police brutality is one of the most critical topics to dominate news cycles worldwide in the past decade. Once a police brutality incident occurs, millions of people take to the Internet to express their opinions. After the 2015 death of Freddie Gray, a study (Oglesby-Neal et al., 2019) was conducted to investigate public opinion on the topic. Millions of tweets were used in the study, which focused on how opinion changed in response to the event. The researchers used Twitter data tools to collect 65 million tweets that contained the words “cop” or “police.” They chose Twitter over Facebook because Twitter is less restrictive than Facebook.

The categories for the sentiment analysis were positive, neutral, negative, and not applicable. The data were cleaned and analyzed using Stanford University’s CoreNLP, a natural language processing tool. The results showed that general sentiment towards police changed, becoming more negative over time. The study showed that in 2014, the rate of positive tweets remained flat at 2%, while the rate of negative tweets was 17% (Oglesby-Neal et al., 2019). The fluctuations that occurred in 2015 were mostly negative. Three months before the Freddie Gray incident, 20% of tweets about police were negative, while two months after the incident, the share had increased to 23% (Oglesby-Neal et al., 2019). The total count of negative tweets rose after his death and occasionally spiked during the Baltimore protests that occurred in the following weeks.

Another topic of interest that dominates social media discussions is the issue of LGBT+ rights. A study by Fitri et al. (2019) was carried out in Indonesia to perform sentiment analysis with a specific focus on an anti-LGBT campaign on Twitter. The campaign was widely discussed on Indonesian Twitter. The three sentiment classes were positive, neutral, and negative. Three algorithms were used in the analysis: naïve Bayes, random forest, and decision tree. The researchers chose naïve Bayes since it had demonstrated accuracy in sentiment analysis classifications (Xu, 2018). Several stages were involved in data preparation, namely, preprocessing, classification, and evaluation (Fitri et al., 2019). The analysis indicated that Indonesian Twitter users had predominantly neutral opinions on LGBT+ issues. After the models were trained, testing revealed that naïve Bayes had better accuracy than random forest and decision tree, i.e., 86.43% compared to 82.9% (Xu, 2018).

Many researchers also use sentiment analysis of Twitter text data to examine the attitude of the world’s population towards terrorism. For example, Mansour (2018) used this approach to study global patterns in the perception of terrorist organizations. The results of this study make it possible to judge whether there are differences between the Western and Eastern worlds in terms of attitudes towards this phenomenon. Sentiment analysis made it possible to understand that, regardless of country of origin and residence, people perceive terrorism and terrorist organizations as a threat. In particular, negative markers significantly prevailed over positive ones. However, the author of the article also emphasizes that the results of the analysis “can be interpreted as that most of the collected tweets were against ISIS” (Mansour, 2018, p. 192). In the context of sentiment and stance analysis, this conclusion is not entirely correct, since negative sentiment in a tweet does not by itself establish a stance against the target. The techniques described can also be applied to investigate more local events, which is valuable from a political perspective.

SA can be applied to study the problems of ethnic minorities. Nwofe (2017) researched Twitter posts to analyze people’s opinions on Biafraexit. Biafraexit is a movement for a referendum on the independence of one of the territories within Nigeria, which has been widely discussed on social networks. The name was coined after the UK’s vote to leave the European Union, which is known as Brexit. The analysis found that the ethnic minority calling for the referendum feels socio-economic uncertainty under the Nigerian government. Moreover, it was found that the population of Nigeria values ethnic identity more than national identity. It is noteworthy that the author of the article also makes policy recommendations based on the obtained data, which speaks to the value and political applications of SA.

References

Acheampong, F. A., Wenyu, C., & Nunoo‐Mensah, H. (2020). Text‐based emotion detection: Advances, challenges, and opportunities. Engineering Reports, 2(7), 1-24. Web.

Ahmad, M., Aftab, S., & Muhammad, S. (2017). Machine learning techniques for sentiment analysis: A review. International Journal of Multidisciplinary Sciences and Engineering, 8(3), 27-32.

Akhtar, M. S., Chauhan, D. S., Ghosal, D., Poria, S., Ekbal, A., & Bhattacharyya, P. (2019). Multi-task learning for multi-modal emotion recognition and sentiment analysis. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1, 370-379. Web.

Aldayel, A., & Magdy, W. (2021). Stance detection on social media: State of the art and trends. Information Processing & Management, 58(4), 1-33. Web.

Breur, T. (2016). Statistical Power Analysis and the contemporary “crisis” in social sciences. Journal of Marketing Analytics, 4(2–3), 61–65. Web.

Chauhan, P. (2017). Sentiment analysis: A comparative study of supervised machine learning algorithms using rapid miner. International Journal for Research in Applied Science and Engineering Technology, V(XI), 80–89. Web.

Chen, Z., Teng, S., Zhang, W., Tang, H., Zhang, Z., He, J., Fang, X., & Fei, L. (2019). LSTM sentiment polarity analysis based on LDA clustering. In Y. Sun, T. Lu, X. Xie, L. Gao, & H. Fan (Eds.), Computer-supported cooperative work and social computing (pp. 342-255). Springer. Web.

Devika, M. D., Sunitha, C., & Ganesh, A. (2016). Sentiment analysis: A comparative study on different approaches. Procedia Computer Science, 87, 44–49. Web.

Dey, K., Ritvik, S., & Kaushik, S. (2017). Twitter stance detection: A subjectivity and sentiment polarity-inspired two-phase approach. IEEE International Conference on Data Mining Workshops (ICDMW), New Orleans, LA, USA. Web.

Do, H. H., Prasad, P. W. C., Maag, A., & Alsadoon, A. (2019). Deep learning for aspect-based sentiment analysis: A comparative review. Expert Systems with Applications, 118(15), 272-299. Web.

Fitri, V. A., Andreswari, R., & Hasibuan, M. A. (2019). Sentiment analysis of social media twitter with a case of anti-LGBT campaign in Indonesia using naïve Bayes, decision tree, and random forest algorithm. Procedia Computer Science, 161, 765–772. Web.

Garcia-Garcia, J. M., Penichet, V., Lozano, M. D., Garrido, J. E., & Law, E. (2018). Multimodal affective computing to enhance the user experience of educational software applications. Mobile Information Systems, 1-10. Web.

Ghani, N. A., Hamid, S., Targio Hashem, I. A., & Ahmed, E. (2019). Social media big data analytics: A survey. Computers in Human Behavior, 101, 417–428. Web.

Gull, R., Shoaib, U., Rasheed, S., Abid, W., & Zahoor, B. (2016). Preprocessing of twitter’s data for opinion mining in political context. Procedia Computer Science, 96, 1560–1570. Web.

Hakak, N. M., Mohd, M., Kirmani, M., & Mohd, M. (2017). Emotion analysis: A survey. 2017 International Conference on Computer, Communications and Electronics (Comptelix), 397–402. Web.

Ho, V. A., Nguyen, D. H.-C., Nguyen, D. H., Pham, L. T.-V., Nguyen, D.-V., Nguyen, K. V., & Nguyen, N. L.-T. (2020). Emotion recognition for Vietnamese social media text. In L.-M. Nguyen, X.-H. Phan, K. Hasida, & S. Tojo (Eds.), Computational Linguistics (Vol. 1215, pp. 319–333). Springer Singapore. Web.

Kaur, H., Mangat, V., & Krail, N. (2017). Dictionary-based sentiment analysis of Hinglish text and comparison with machine learning algorithms. International Journal of Metadata, Semantics and Ontologies, 12(2/3), 90-102. Web.

Kim, E., & Klinger, R. (2021). A survey on sentiment and emotion analysis for computational literary studies. Arxiv.org. Web.

Küçük, D., & Can, F. (2020). Stance detection: A survey. ACM Computing Surveys, 53(1), 1-36. Web.

Kumari, S. (2016). Impact of big data and social media on society. Global Journal for Research Analysis, 5, 437-438.

Mansour, S. (2018). Social media analysis of user’s responses to terrorism using sentiment analysis and text mining. Procedia Computer Science, 140, 95-103. Web.

Mohammad, S. M., Sobhani, P., & Kiritchenko, S. (2016). Stance and sentiment in tweets. ACM Transactions on Internet Technology, 13(3), 1-22. Web.

Nwofe, E. S. (2017). Pro-Biafran activists and the call for a referendum: A sentiment analysis of ‘Biafraexit’ on Twitter after UK’s vote to leave the European Union. Journal of Ethnic and Cultural Studies, 4(1), 65-81. Web.

Oglesby-Neal, A., Tiry, E., & Kim, K. (2019). A big-data approach to understanding public sentiment toward the police. Justice Policy Center.

Poecze, F., Ebster, C., & Strauss, C. (2018). Social media metrics and sentiment analysis to evaluate the effectiveness of social media posts. Procedia Computer Science, 130, 660–666. Web.

Poria, S., Cambria, E., Bajpai, R., & Hussain, A. (2017). A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37, 98-125. Web.

Reagan, A. J., Danforth, C. M., Tivnan, B., Williams, J. R., & Dodds, P. S. (2017). Sentiment analysis methods for understanding large-scale texts: A case for using continuum-scored words and word shift graphs. EPJ Data Science, 6(1), 1-21. Web.

Rout, J. K., Choo, K. R., Dash, A. K., Bakshi, S., Jena, S. K., & Williams, K. L. (2018). A model for sentiment and emotion analysis of unstructured social media text. Electronic Commerce Research, 18, 181-199. Web.

Scarborough, W. J. (2018). Feminist Twitter and gender attitudes: Opportunities and limitations to using Twitter in the study of public opinion. Socius. Web.

Shoumy, N. J., Ang, L. M., Seng, K. P., Rahaman, M., & Zia, T. (2020). Multimodal big data affective analytics: A comprehensive survey using text, audio, visual and physiological signals. Journal of Network and Computer Applications, 149, 1-31. Web.

Singh, J., Singh, G., & Singh, R. (2017). Optimization of sentiment analysis using machine learning classifiers. Human-Centric Computing and Information Sciences, 7(1), 1-12. Web.

Soleymani, M., Garcia, D., Jou, B., Schuller, B., Chang, S. F., & Pantic, M. (2017). A survey of multimodal sentiment analysis. Image and Vision Computing, 65, 3-14. Web.

Sun, Q., Wang, Z., Li, S., Zhu, Q., & Zhou, G. (2019). Stance detection via sentiment information and neural network model. Frontiers of Computer Science, 13(1), 127–138. Web.

Sunil, K., & Beniwal, S. (2020). Sentiment analysis: A tool for mining opinions and emotions. SSRN. Web.

Turney, P. D. (2001). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics – ACL ’02, 417-424. Web.

Wang, X., Ding, C., Zheng, W., & Wu, M. (2017). Sentiment analysis based on specific dictionary and sentence analysis. Proceedings of the 2017 International Conference on Economics and Management, Education, Humanities and Social Sciences (EMEHSS 2017). 2017 International Conference on Economics and Management, Education, Humanities and Social Sciences (EMEHSS 2017), Hangzhou, China. Web.

Xu, S. (2018). Bayesian Naïve Bayes classifiers to text classification. Journal of Information Science, 44(1), 48–59. Web.