Sentiment analysis: from pre-processing to applications in online communities

Fornacciari, Paolo

The wide diffusion of social media is one of the most exciting developments of recent years. Social media are used not only as a tool for messaging and sharing private content, but also by people who want to share their opinions about products or services. The amount of textual data produced on social media has grown accordingly. Companies and governments clearly benefit from understanding what people think about their products and services, and public institutions also have an interest in being able to collect, retrieve and preserve the information related to specific events and their development over time. Sentiment Analysis, the set of natural language processing techniques for identifying and categorizing the opinions expressed in a piece of text, is of particular interest for determining attitudes towards a particular topic and can be successfully applied to the messages left on online social media. However, most works on polarity classification consider only text to infer sentiment and do not take into account that social networks are actually networked environments. For this reason, the combination of content and relationships is a core task in the recent literature on Sentiment Analysis. Starting from the classical state-of-the-art methodologies, where only text is used to infer the emotions expressed in social network messages, this thesis presents two main contributions. The first contribution focuses mainly on preliminary considerations relevant to any kind of sentiment analysis: the accurate preprocessing of some available datasets, a step never performed completely and accurately in the relevant literature; the study and implementation of a novel polishing method based on an iterative learning approach; and the comparison of different types of classifiers.
The second main contribution concerns the application of sentiment analysis to social networks in order to obtain a combined approach: the network topology can contextualize the results of the Sentiment Analysis, while the polarity and the emotions expressed in the network can highlight the role of semantic connections in the hierarchy of the communities in the network itself. First, a sentiment was associated with the nodes of Twitter graphs, which show the social connections, in order to highlight potential correlations, i.e., similar ways of participating in a community. Then, sentiment analysis was applied to particular Facebook communities, using both automatic emotion detection and social network analysis techniques. This made it possible to study how emotions are influenced by different kinds of relationships. Finally, after an up-to-date analysis of the state of the art in troll detection, a systematic collection and grouping of features, and a comparison of the different detected features with a machine learning approach, sentiment analysis was employed to detect malicious and anti-social behaviors in social networks. This led to the implementation of TrollPacifier, a novel holistic system for troll detection that was shown to reach a very high accuracy (95.5%). The results obtained demonstrate that sentiment analysis can corroborate social network analysis and that together they can be a powerful tool for deepening the knowledge of online social networks themselves.

2019


Degree programme: Dottorato di ricerca in Tecnologie dell'informazione (PhD in Information Technologies)
Publication date: 2019
Language: English
Keywords: Sentiment Analysis; Social Network Analysis; ING-INF/05
Publisher: Università degli Studi di Parma
Collection: Università degli Studi di Parma

Files in this item:

File	Size	Format
TesiDottoratoFinale.pdf (access only from BNCF and BNCR; type: other attached material)	11.78 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/134272

The NBN code of this thesis is URN:NBN:IT:UNIPR-134272