Toward AI-Assisted Laryngology: Real-Time Lesion Screening, Risk Classification, and Tumor Margin Assessment
BALDINI, CHIARA
2026
Abstract
Objective: Disorders of the upper aerodigestive tract, especially heterogeneous laryngeal lesions, present key clinical challenges for early diagnosis and therapeutic planning. Endoscopic examination is the current gold standard; however, operator-related factors and image quality have a significant impact on diagnostic accuracy. Moreover, intraoperative identification of tumor margins is often imprecise, leading to incomplete resections and increased recurrence risk. This research introduces a comprehensive Artificial Intelligence (AI)-assisted vision system designed to support clinicians in laryngeal lesion screening, diagnosis, and surgical margin assessment. The ultimate goal is to enhance the accuracy, speed, and objectivity of laryngeal cancer diagnosis and treatment planning, while reducing operator dependence and improving clinical workflow efficiency. Approach: The proposed framework combines multiple AI modules to autonomously process and interpret endoscopic data collected in standard clinical settings during laryngeal examinations, both in-office and intraoperatively. Deep learning approaches were used to address four main tasks: I. Informative frame selection to identify diagnostically relevant frames in endoscopic videos; II. Lesion detection using YOLO-based architectures enhanced with a super-resolution branch; III. Binary low-/high-risk classification and multi-class lesion discrimination through convolutional and transformer-based networks; IV. Laryngeal tumor margin segmentation exploiting foundation models such as the Segment Anything Model (SAM). A Latent Diffusion Model (LDM) was also designed to generate clinically guided synthetic laryngeal data, while a custom user-friendly graphical user interface (GUI) was implemented to integrate all AI modules for real-time clinical use and validation at the Unit of Otolaryngology and Head and Neck Surgery of the IRCCS San Martino Hospital (Genoa, Italy).
A large multicenter White-Light (WL) and Narrow Band Imaging (NBI) endoscopic dataset was used to train the models, and external validation was carried out with data from additional international institutions. Main results: The framework achieved high and robust performance across the four tasks: I. The informative frame classifier reached an F1-score of 96% with real-time performance (<0.03 seconds/frame); II. The proposed detection model (SRE-YOLO) achieved a mean Average Precision at Intersection over Union = 0.5 (mAP@0.5) above 80% both internally and externally, with significant improvements from synthetic data augmentation; III. For lesion classification, the transformer-based model reached F1-scores of 85–89% for low-risk/high-risk differentiation and up to 76% for multi-class lesion type classification; IV. Segmentation experiments with fine-tuned SAM obtained a Dice Similarity Coefficient (DSC) above 90%, closely matching expert annotations. Integration into the VERA prototype enabled validation on more than 110 patients, achieving an 85% F1-score and strong concordance with clinicians in patient-level lesion risk prediction. Significance: This work introduces one of the first comprehensive AI pipelines for real-time laryngeal assessment, spanning from image quality evaluation to intraoperative tumor margin definition. Its clinical validation demonstrated that AI can provide objective, reproducible support to clinicians with different levels of expertise in both outpatient and surgical settings. Future directions include releasing the collected anonymized datasets to the public to ensure reproducibility and further research, incorporating patient metadata for personalized treatment suggestions, optimizing the models, and conducting large-scale clinical validation to accelerate the adoption of AI in laryngology.
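The abstract reports results in terms of two standard metrics: the Dice Similarity Coefficient (DSC) for segmentation quality and Intersection over Union (IoU) for detection overlap (mAP@0.5 counts a detection as correct when IoU ≥ 0.5). The following minimal sketch shows the textbook formulas for both; it is illustrative only and does not reproduce the thesis's actual evaluation code.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0

def iou(box_a, box_b) -> float:
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Toy example: a predicted mask covering 8 pixels vs. a ground truth of 12,
# and two partially overlapping bounding boxes.
mask_pred = np.zeros((4, 4), dtype=int); mask_pred[:2, :] = 1  # 8 pixels
mask_gt = np.zeros((4, 4), dtype=int); mask_gt[:3, :] = 1      # 12 pixels
print(round(dice_coefficient(mask_pred, mask_gt), 2))  # 2*8/(8+12) = 0.8
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))       # 1/7 ≈ 0.143
```

A reported DSC above 90% thus means the predicted tumor margins and the expert annotations overlap almost completely relative to their combined area.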
| File | License | Size | Format |
|---|---|---|---|
| phdunige_5549612.pdf (open access) | All rights reserved | 41.88 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/361667
URN:NBN:IT:UNIGE-361667