Enhancing Public Administration with Computational Linguistics: a Language Model for Italian Bureaucratic Language

Auriemma, Serena

This thesis addresses the automatic analysis of texts written in bureaucratic Italian. It develops resources and identifies computational linguistics and NLP approaches applicable to data from the Italian Public Administration (PA), with the goal of supporting the PA's digital transformation. The research focuses on two main areas of intervention: streamlining the processing of administrative documents and improving the readability of PA texts. Sector-specific languages such as bureaucratic Italian often pose challenges for general-purpose language models, which lack the linguistic knowledge required to perform domain-specific tasks accurately. To address this issue, the thesis describes the stages leading to the development of BureauBERTo, an encoder-based language model and the first to be specialized in the Italian bureaucratic domain. BureauBERTo's performance was evaluated against other models in supervised, unsupervised, and prompt-based learning settings, demonstrating the effectiveness of specialized models on domain-specific tasks even when annotated data are limited. The research also showed that, for discriminative tasks, specialized encoders offer a more efficient and sustainable solution than current large language models, while ensuring internal data governance for public institutions and fostering AI applications that are accessible even to smaller entities within the public sector.

2025

Publication date: 9 July 2025
Language: English
Keywords: Italian bureaucratic language; administrative data; public administration; BureauBERTo; encoder; language model; specialized model; further pre-training; fine-tuning; prompting
Supervisor: Lenci, Alessandro
Collection: Università degli Studi di Pisa

Files in this item:

File	Size	Format
PhD_Tesi_Auriemma_PDFA_2025.pdf (under embargo until 11 July 2028; license: all rights reserved)	3.76 MB	Adobe PDF
Report_attivit_svolte_dottorato_etd_pdfa.pdf (not available; license: all rights reserved)	175.67 kB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/217899

The NBN code of this thesis is URN:NBN:IT:UNIPI-217899