Toward AI-Assisted Laryngology: Real-Time Lesion Screening, Risk Classification, and Tumor Margin Assessment
BALDINI, CHIARA
2026
Abstract
Objective: Disorders of the upper aerodigestive tract, especially heterogeneous laryngeal lesions, present key clinical challenges for early diagnosis and therapeutic planning. Endoscopic examination is the current gold standard; however, operator-related factors and image quality have a significant impact on diagnostic accuracy. Moreover, intraoperative identification of tumor margins is often imprecise, leading to incomplete resections and increased recurrence risk. This research introduces a comprehensive Artificial Intelligence (AI)-assisted vision system designed to support clinicians in laryngeal lesion screening, diagnosis, and surgical margin assessment. The ultimate goal is to enhance the accuracy, speed, and objectivity of laryngeal cancer diagnosis and treatment planning, while reducing operator dependence and improving clinical workflow efficiency. Approach: The proposed framework combines multiple AI modules to autonomously process and interpret endoscopic data collected in standard clinical settings during laryngeal examinations, both in-office and intraoperatively. Deep learning approaches were used to address four main tasks: I. Informative frame selection to identify diagnostically relevant frames in endoscopic videos; II. Lesion detection using YOLO-based architectures enhanced with a super-resolution branch; III. Binary low-/high-risk classification and multi-class lesion discrimination through convolutional and transformer-based networks; IV. Laryngeal tumor margin segmentation exploiting foundation models such as the Segment Anything Model (SAM). A Latent Diffusion Model (LDM) was also designed to generate clinically guided synthetic laryngeal data, while a custom user-friendly graphical user interface (GUI) was implemented to integrate all AI modules for real-time clinical use and validation at the Unit of Otolaryngology and Head and Neck Surgery of the IRCCS San Martino Hospital (Genoa, Italy).
A large multicenter White-Light (WL) and Narrow Band Imaging (NBI) endoscopic dataset was used to train the models, and external validation was carried out with data from additional international institutions. Main results: The framework achieved high and robust performance across the four tasks: I. The informative frame classifier reached an F1-score of 96% with real-time performance (<0.03 seconds/frame); II. The proposed detection model (SRE-YOLO) achieved a mean Average Precision at Intersection over Union = 0.5 (mAP@0.5) above 80% both internally and externally, with significant improvements from synthetic data augmentation; III. For lesion classification, the transformer-based model reached F1-scores of 85–89% for low-risk/high-risk differentiation and up to 76% for multi-class lesion type classification; IV. Segmentation experiments with fine-tuned SAM obtained a Dice Similarity Coefficient (DSC) above 90%, closely matching expert annotations. Integration into the VERA prototype enabled validation on more than 110 patients, achieving an 85% F1-score and strong concordance with clinicians in patient-level lesion risk prediction. Significance: This work introduces one of the first comprehensive AI pipelines for real-time laryngeal assessment, spanning from image quality evaluation to intraoperative tumor margin definition. Its clinical validation demonstrated that AI can provide objective, reproducible support to clinicians with different levels of expertise in both outpatient and surgical settings. Future directions include releasing the collected anonymized datasets to the public to ensure reproducibility and further research, incorporating patient metadata for personalized treatment suggestions, optimizing the models, and conducting large-scale clinical validation to accelerate the adoption of AI in laryngology.
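The abstract reports results in terms of two standard metrics: the Dice Similarity Coefficient (DSC) for segmentation quality and Intersection over Union (IoU) for detection overlap (mAP@0.5 counts a detection as correct when IoU ≥ 0.5). The following minimal sketch shows the textbook formulas for both; it is illustrative only and does not reproduce the thesis's actual evaluation code.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A|+|B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0

def iou(box_a, box_b) -> float:
    """Intersection over Union for two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Toy example: a predicted mask covering 8 pixels vs. a ground truth of 12,
# and two partially overlapping bounding boxes.
mask_pred = np.zeros((4, 4), dtype=int); mask_pred[:2, :] = 1  # 8 pixels
mask_gt = np.zeros((4, 4), dtype=int); mask_gt[:3, :] = 1      # 12 pixels
print(round(dice_coefficient(mask_pred, mask_gt), 2))  # 2*8/(8+12) = 0.8
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))       # 1/7 ≈ 0.143
```

A reported DSC above 90% thus means the predicted tumor margins and the expert annotations overlap almost completely relative to their combined area.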
| File | License | Size | Format |
|---|---|---|---|
| phdunige_5549612.pdf (open access) | All rights reserved | 41.88 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/361667
URN:NBN:IT:UNIGE-361667