Exploring Transformers: Journey Through Language Processing Architectures and Tasks

DELSANTO, MATTEO
2024

Abstract

This dissertation illustrates the research activities I carried out during my PhD years. Its common thread is the use of language models and Transformer architectures, which have emerged as the state-of-the-art technology in the field of Natural Language Processing (NLP). An overview of recent language models is first provided to frame my research within modern approaches to the automatic analysis and generation of natural language. Their evolution is traced, starting from the basic Transformer architecture, through advancements such as BERT and GPT-2, up to multimodal models such as OpenAI's GPT-4 and Google's Gemini, and their features are illustrated and discussed. It is then described how the encoder and decoder modules have been exploited for different tasks. Information Extraction from clinical reports and argument mining for detecting grammatical errors are addressed first, as examples of applications relying on Transformer encoders. An example of linguistic analysis and categorization is then introduced, aimed at discriminating cognitively impaired subjects from healthy elderly controls: in this case, the analysis is conducted by exploiting a decoder block, whose output is also compared to that of standard n-gram language models. Finally, it is shown how the full Transformer architecture can be employed to cope with a foundational NLP task, word sense disambiguation. The results obtained are discussed and interpreted in light of the main technological and cultural trends in the NLP field.
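As a concrete illustration of the decoder-versus-n-gram comparison mentioned above, the following minimal sketch (not taken from the thesis) scores a transcript with a pretrained GPT-2 decoder via the Hugging Face transformers library and with a Laplace-smoothed bigram model via NLTK; the "gpt2" checkpoint, the toy reference corpus, and the function names are illustrative assumptions rather than the dissertation's actual pipeline.

# A minimal sketch (assumed names, not from the thesis): comparing how a
# Transformer decoder and an n-gram model score the same transcript.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import bigrams

def decoder_perplexity(text):
    # Score the text with a pretrained GPT-2 decoder; the returned loss
    # is the mean per-token negative log-likelihood, so exp(loss) is the
    # perplexity of the transcript under the model.
    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

def bigram_perplexity(train_sents, test_sent):
    # Train a Laplace-smoothed bigram model on tokenized reference
    # sentences, then evaluate the padded test sentence.
    train, vocab = padded_everygram_pipeline(2, train_sents)
    lm = Laplace(2)
    lm.fit(train, vocab)
    return lm.perplexity(list(bigrams(pad_both_ends(test_sent, n=2))))

# Toy usage with an assumed reference corpus and transcript.
reference = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]
transcript = ["the", "cat", "sat", "on", "the", "rug"]
print(decoder_perplexity(" ".join(transcript)))
print(bigram_perplexity(reference, transcript))

Under a scheme of this kind, per-transcript perplexities from the two models would serve as features for separating impaired speakers from healthy controls.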
Date: 21 October 2024
Language: English
Supervisor: RADICIONI, Daniele Paolo
Università degli Studi di Torino
Full text: _PhD_Thesis_Matteo_Delsanto-1.pdf (Adobe PDF, 6.66 MB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/199137
The NBN code of this thesis is URN:NBN:IT:UNITO-199137