Text analysis with deep learning and data augmentation

Karimi, Akbar

With the vast amount of textual data available on the Web, analyzing it manually is becoming increasingly difficult. There is therefore a growing need to process it automatically for applications such as opinion mining, sentiment classification, and question answering, to name but a few. While traditional text analysis techniques such as N-gram language models can perform reasonably well, they still rely on manual feature engineering. Deep neural networks do away with manually designed features and allow us to build systems capable of end-to-end data processing. To do this effectively, however, they depend heavily on the amount of training data, which can be scarce for new applications or domains. In these cases, data augmentation techniques can be used to expand the input data and help networks perform better. In this dissertation, we make several contributions to text analysis by addressing some of its problems, including Sentiment Analysis (SA), Toxic Language Detection (TLD), and Text Classification (TC). First, we introduce a novel deep architecture for Aspect-Based Sentiment Analysis (ABSA) that combines adversarial training, a form of data augmentation in the embedding space, with BERT, a state-of-the-art pre-trained language model. Then, we propose two additive modules that are attached on top of BERT and help improve model performance. Furthermore, we introduce a simple bag-of-words model that performs reasonably well in detecting toxic language despite its simplicity. Moreover, we put forward a novel data augmentation technique in the input space and show that it benefits neural network models on various text classification datasets. Finally, by collecting product images and comments from social media, we build an annotated multimodal dataset that can be used to address Aspect-Based Emotion Analysis (ABEA).

2022

Degree program: Corso di Ingegneria e architettura
English title: Text analysis with deep learning and data augmentation
Publication date: 15 June 2022
Language: English
Keywords: Aspect-Based Sentiment Analysis; Text Classification; Toxic Language Detection; Data Augmentation; Aspect-Based Emotion Analysis; Text Analysis; INF/01
Supervisor, advisor, or tutor: Andrea, Prati
Publisher: Università degli studi di Parma. Dipartimento di Ingegneria e architettura
Collection: Università degli Studi di Parma

Files in this record:

File	Access	License	Size	Format
Akbar_Karimi_PhD_Thesis_V2_a.pdf	Open access	All rights reserved	5.94 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/196623

The NBN code of this thesis is URN:NBN:IT:UNIPR-196623