SOCIAL MEDIA MINING PER LA SORVEGLIANZA DELLA SICUREZZA ALIMENTARE IN ITALIA (Social Media Mining for Food Safety Surveillance in Italy)
SALARIS, SILVANO
2025
Abstract
Foodborne diseases are a significant global health challenge, causing substantial morbidity and mortality each year. Conventional surveillance systems, while essential, are limited by delayed reporting and the under-detection of outbreaks. This thesis explores an approach to food safety surveillance that uses social media mining and artificial intelligence (AI) to support syndromic surveillance and thereby improve the early detection and management of foodborne illnesses. The project is structured in three successive phases, reflecting a progressive evolution in methodology and technology.

The first phase was a comprehensive literature review of how social media data and machine learning (ML) techniques have been used to detect foodborne events [1]. The review covered more than a decade of research, focusing on shallow and deep learning models applied to social media platforms such as Twitter and Yelp. Twitter was the most commonly used platform and shallow learning models were the most commonly used methodology. The gaps identified included a lack of standardized methodologies and limited exploration of deep learning approaches. This work highlighted the potential of social media as a source for syndromic surveillance and set the stage for the subsequent phases.

Building on these insights, the second phase developed a system for detecting foodborne events from Twitter data [2]. This retrospective study covered six years of Italian-language tweets and used keywords related to foodborne pathogens to identify potential outbreaks. With data processing done in R, the system extracted and analyzed peaks in tweet frequency and applied text mining techniques to generate word clouds. These visualizations provided information about geographic areas, vehicles of infection, and implicated food industries. The analysis linked 76 word clouds to documented outbreaks, further demonstrating that social media data can complement traditional surveillance methods. However, 55 word clouds yielded unclear or irrelevant results, revealing problems of data noise and specificity. Despite these limitations, the study confirmed the value of social media for uncovering unreported events and enhancing outbreak response.

After policy changes limited access to Twitter data, the third phase moved to Google Reviews data and to a more advanced analysis approach made possible by the rise of large language models (LLMs) [3]. The SAFEGUARD system was developed to analyze Google restaurant reviews, using gpt-4o-mini for sentiment analysis and classification. The system retrieves reviews through APIs, stores them in a MongoDB database, and processes them through an AI-powered pipeline to detect potential foodborne illness events. A case study in Padua, Italy, demonstrated the system's efficacy, retrieving approximately 45,000 reviews from 719 establishments. Approximately 2% of reviews were identified as indicative of potential foodborne infections, with 93.9% accuracy, 14.3% precision, 100% recall, and an F1 score of 25%. These findings highlight the system's reliability and scalability and underscore its utility for proactive public health interventions [4].

This thesis illustrates the progressive development of methodologies for foodborne disease detection, from the literature review to the implementation of AI-driven solutions. It underscores the critical role of social media and AI in addressing the limitations of traditional surveillance systems.
File | Size | Format
---|---|---
Tesi_Silvano_Salaris.pdf (open access) | 2.22 MB | Adobe PDF
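As a quick consistency check on the performance figures reported in the abstract for the Padua case study, the F1 score can be recomputed from the stated precision and recall. The sketch below is illustrative only: the function name `f1_score` is hypothetical and not taken from the thesis, and the inputs are the rounded percentages from the abstract rather than the underlying counts.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract (assumed inputs, not raw counts).
precision = 0.143  # 14.3% of flagged reviews were true positives
recall = 1.0       # 100% of true positives were flagged

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.250

# Reciprocal of precision: flagged reviews per true positive.
flagged_per_true_positive = 1 / precision
print(f"Flagged reviews per true positive: {flagged_per_true_positive:.1f}")  # 7.0
```

The recomputed value matches the reported F1 score of 25%. A precision of 14.3% corresponds to roughly one true positive for every seven flagged reviews.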
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/220386
URN:NBN:IT:UNIPD-220386