Effective, efficient and reliable large language models
Santilli, Andrea
2025
Abstract
In recent years, Large Language Models (LLMs) have fundamentally transformed the field of Natural Language Processing (NLP), reshaping the landscape of AI research and applications. This thesis represents the culmination of four years of doctoral research, which began in 2020, when LLMs were still an emerging technology and GPT-3 had just been introduced. Over the course of this research, we have both observed and contributed to the advancement of some of the technologies underpinning LLMs, from their early stages to their current role as cutting-edge AI systems. Specifically, this thesis gathers works carried out during this time under three critical dimensions of LLMs: Effectiveness, Efficiency, and Reliability. On the Effectiveness dimension, we contributed to the development of instruction tuning, a key technique now ubiquitous in the training pipelines of LLMs. Our work demonstrated that smaller, instruction-tuned LLMs can outperform models up to 16 times their size, including GPT-3. We also developed PromptSource, an integrated development environment for creating, managing, and sharing natural language prompts, which has become a valuable resource for the NLP community. Both of these contributions were carried out during the BigScience Workshop, a year-long open research initiative led by Hugging Face and dedicated to the study of LLMs. Finally, along this dimension, we studied how to make these models handle multimodal database-like queries. Addressing the Efficiency dimension, we tackled the challenge of accelerating LLM inference. We introduced three novel parallel decoding algorithms that significantly speed up text generation without compromising output quality. This line of work has since evolved into an active research area known as speculative or parallel decoding.
Furthermore, we developed an efficient, language-specific instruction-tuned LLM for the Italian language, demonstrating a cost-effective approach to creating high-quality models for specific languages. Our research on the Reliability dimension addresses the critical issue of making these models trustworthy, since they have been shown to systematically generate incorrect information, a phenomenon known as hallucination. In this direction, we investigated whether it is possible to detect a model's confidence in its outputs. We conducted a comprehensive assessment of current uncertainty quantification methods and their evaluation protocols, and explored novel approaches to combining these methods to improve the detection and quantification of uncertainty in LLM outputs. Our work paves the way for more Effective, Efficient, and Reliable large language models, addressing key challenges in their development and deployment while opening new avenues for future research in this rapidly evolving field.

File | Size | Format
---|---|---
Tesi_dottorato_Santilli.pdf (open access) | 8.16 MB | Adobe PDF
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/188440
URN:NBN:IT:UNIROMA1-188440