SOCIAL MEDIA MINING FOR FOOD SAFETY SURVEILLANCE IN ITALY

Salaris, Silvano

Foodborne diseases are a significant global health challenge, causing substantial morbidity and mortality each year. Conventional surveillance systems, while essential, are limited by delayed reporting and under-detection of outbreaks. This thesis explores an approach to food safety surveillance that uses social media mining and artificial intelligence (AI) for syndromic surveillance, with the aim of improving the early detection and management of foodborne illnesses. The project is structured into three consecutive phases, reflecting a progressive evolution in methodology and technology.

The first phase was a comprehensive literature review of how social media data and machine learning (ML) techniques have been used to detect foodborne events [1]. The review covered more than a decade of research on the application of shallow and deep learning models to social media platforms such as Twitter and Yelp. Twitter was the most commonly used platform, and shallow learning models were the most commonly used methodology. The gaps identified included a lack of standardized methodologies and limited exploration of deep learning approaches. This work highlighted the potential of social media as a valuable source for syndromic surveillance and set the stage for the subsequent phases.

Building on these insights, the second phase developed a system for detecting foodborne events from Twitter data [2]. This retrospective study covered six years of Italian-language tweets and used keywords related to foodborne pathogens to identify potential outbreaks. Using R for data processing, the system extracted and analyzed peaks in tweet frequency and applied text mining techniques to generate word clouds. These visualizations provided information about geographic areas, vehicles of infection, and implicated food industries. The analysis linked 76 word clouds to documented outbreaks, further demonstrating the potential of social media data to complement traditional surveillance methods. However, 55 word clouds yielded unclear or irrelevant results, revealing challenges with data noise and specificity. Despite these limitations, the study confirmed the value of social media in uncovering unreported events and enhancing outbreak response.

In response to policy changes limiting access to Twitter data, the third phase moved to Google Reviews data and to a more advanced analysis approach made possible by the rise of large language models (LLMs) [3]. The SAFEGUARD system was developed to analyze restaurant reviews from Google, using gpt-4o-mini for sentiment analysis and classification. The system retrieves reviews through APIs, stores them in a MongoDB database, and processes them through an AI-powered pipeline to detect potential foodborne illness events. A case study in Padua, Italy, demonstrated the system's efficacy, retrieving approximately 45,000 reviews from 719 establishments. Approximately 2% of reviews were identified as indicative of potential foodborne infections, with 93.9% accuracy, 14.3% precision, 100% recall, and an F1 score of 25%. These findings highlight the system's reliability and scalability and underscore its utility for proactive public health interventions [4].

This thesis illustrates the progressive development of methodologies for foodborne disease detection, from the exploration of the literature to the implementation of advanced AI-driven solutions. It underscores the critical role of social media and AI in addressing the limitations of traditional surveillance systems.
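
As a reading aid for the performance figures above, the following minimal Python sketch checks that the reported precision and recall are consistent with the reported F1 score, using the standard definition of F1 as the harmonic mean of precision and recall. Only the two input values come from the abstract; the function and variable names are illustrative assumptions, and the abstract does not describe the evaluation set, so nothing about it is reproduced here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (14.3% precision, 100% recall).
precision = 0.143
recall = 1.0

f1 = f1_score(precision, recall)

# With perfect recall, precision alone determines how many flagged
# reviews must be checked, on average, to find one true positive.
flagged_per_true_positive = 1 / precision

print(f"F1 = {f1:.3f}")
print(f"about {flagged_per_true_positive:.0f} flagged reviews per true positive")
```

Under these figures the system misses no true cases in the evaluated sample, while roughly one in seven flagged reviews is a true positive, which is the usual trade-off for a screening step that is followed by manual review.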

Course of study: MEDICINA SPECIALISTICA TRASLAZIONALE "G.B. MORGAGNI"
Publication date: 10 June 2025
Language: English
Supervisor, advisor, or tutor: GREGORI, DARIO
Publisher: Università degli studi di Padova
Collection: Università degli Studi di Padova

Files in this item:

File	Access	Size	Format
Tesi_Silvano_Salaris.pdf	open access	2.22 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/220386

The NBN code of this thesis is URN:NBN:IT:UNIPD-220386