Essays on Data Frameworks and Sustainable AI for Public Health

Priola, Maria Paola

This thesis explores data-centric and computational approaches to public health, integrating methods from data engineering, Artificial Intelligence (AI), and statistical evaluation. The overarching goal is to promote reliability, interpretability, and sustainability in the management and analysis of health data.

The first chapter addresses the challenge of hallucinations in Large Language Models (LLMs). It presents a Retrieval-Augmented Generation (RAG) framework grounded in external knowledge sources and enhanced by domain-specific prompt engineering for healthcare. To evaluate reliability, it introduces the Negative Missing Information Scoring System (NMISS), a system-level score that extends standard metrics with contextual verification. Empirical tests on Italian healthcare-related news articles show how RAG and NMISS together improve the trustworthiness of LLM outputs.

The second chapter introduces the Multimodal hEalth Data lakehouse for ITAly (MEDITA), a lakehouse designed for Italian public health data. By integrating structured and unstructured sources through adaptive pipelines, MEDITA provides a unified environment for statistical analysis, forecasting, and interactive exploration. This proof of concept demonstrates the feasibility of a national-scale infrastructure that bridges the gap between raw data availability and actionable insights.

The third chapter focuses on sustainability in machine learning, framed within the paradigm of Green AI. It presents a comprehensive study of MultiClass Classification (MCC) strategies, systematically comparing accuracy, training time, and environmental impact. A dedicated evaluation pipeline monitors energy consumption and CO2 emissions. The results show that lightweight classifiers achieve competitive accuracy at a fraction of the cost of heavier models, underscoring the importance of balancing predictive performance with environmental responsibility.

2026

Degree programme: Scienze Economiche ed Aziendali (Economics and Business Sciences)
Publication date: 10 March 2026
Language: English
Supervisor, advisor, or tutor: Conversano, Claudio; Ortu, Marco
Publisher: Università degli Studi di Cagliari
Collection: Università degli Studi di Cagliari

Files in this item:

File	Size	Format
mariapaolapriola_tesidottorato_rev.pdf (embargoed until 10 March 2027; license: all rights reserved)	4.5 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/360613

The NBN code of this thesis is URN:NBN:IT:UNICA-360613