The role of artificial intelligence in predicting treatment and follow up of thyroid disorders compared with fine needle aspiration

HAXHIU, ASFLORAL
2026

Abstract

Introduction: Thyroid nodules categorized as Bethesda III and IV remain diagnostically challenging due to indeterminate cytology, frequently leading to unnecessary surgeries or delayed therapeutic decisions. Fine Needle Aspiration Biopsy (FNAB) yields inconclusive results in these categories, highlighting the need for alternative diagnostic and risk-stratification tools.

Aim: To evaluate the diagnostic accuracy and clinical utility of an Artificial Intelligence (AI)–based classification system for thyroid nodules with indeterminate cytology (Bethesda III and IV), and to compare its performance with histopathological outcomes and observed clinical management.

Methods: This diagnostic accuracy study was conducted at the University Hospital Center “Mother Teresa” in Tirana and Memorial Hospital in Fier, Albania, between January 2023 and September 2025. A total of 138 adult patients with thyroid nodules who underwent FNAB with digitized cytology images and had definitive surgical histopathology were included. The prespecified indeterminate subgroup comprised Bethesda III–IV nodules (n = 53). A pre-trained AI system was applied retrospectively and offline to FNAB cytology images to generate benign/malignant classifications and malignancy risk scores, which were compared against histopathology as the reference standard. AI outputs were not available to clinicians and did not influence clinical decision-making. Post-hoc analyses examined associations between AI classifications and observed initial management in Bethesda III–IV nodules.

Results: Across the entire cohort (n = 138), the AI classifier demonstrated balanced diagnostic performance, with an overall accuracy of approximately 87% and strong rule-out capability (negative predictive value ~90%). FNAB showed excellent performance in evaluable categories (Bethesda II, V, and VI), with near-perfect specificity, but did not adequately address indeterminate nodules. Within the Bethesda III–IV subgroup, AI maintained comparable accuracy (~87%), with high negative predictive value (95%) and moderate positive predictive value (62%). Clinical management patterns were significantly associated with AI classifications: AI-benign nodules were more frequently managed conservatively (60%), whereas AI-malignant nodules more often underwent surgery (85%) (p = 0.009; OR ≈ 8). In multivariable logistic regression analysis, an AI-malignant classification independently predicted surgical intervention (OR 7.06, 95% CI 1.18–42.42; p = 0.032), after adjustment for age, sex, nodule laterality, and structural characteristics. Mixed versus cystic nodule structure was associated with lower odds of surgery (OR 0.18, 95% CI 0.03–0.97; p = 0.046).

Conclusions: Prediction of treatment and follow-up in this study refers to post-hoc associations between AI-based classifications and observed clinical management, without AI-guided decision-making. The findings demonstrate that AI-based analysis of digitized FNAB slides provides accurate, non-invasive diagnostic and risk stratification capabilities, particularly in indeterminate thyroid nodules where conventional cytology has known limitations. By reliably classifying cases and showing alignment with histopathological outcomes and real-world management patterns, AI shows strong potential to inform future diagnostic pathways and to support clinical decision-making following prospective interventional validation.
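For readers less familiar with the metrics quoted in the abstract (accuracy, positive and negative predictive value, odds ratio), the following minimal Python sketch shows how each is derived from a 2×2 table of classifier output against a histopathology reference. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the names `ConfusionTable`, `diagnostic_metrics`, and `diagnostic_odds_ratio` are assumptions introduced here, not part of the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionTable:
    """2x2 table of classifier output versus the reference standard."""
    tp: int  # classifier malignant, reference malignant
    fp: int  # classifier malignant, reference benign
    fn: int  # classifier benign, reference malignant
    tn: int  # classifier benign, reference benign


def diagnostic_metrics(t: ConfusionTable) -> dict:
    """Return the standard diagnostic accuracy metrics as fractions."""
    total = t.tp + t.fp + t.fn + t.tn
    return {
        "accuracy": (t.tp + t.tn) / total,
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),  # positive predictive value
        "npv": t.tn / (t.tn + t.fn),  # negative predictive value
    }


def diagnostic_odds_ratio(t: ConfusionTable) -> float:
    """Cross-product odds ratio of a 2x2 table (undefined if fp or fn is 0)."""
    return (t.tp * t.tn) / (t.fp * t.fn)


# Hypothetical counts for illustration only (NOT taken from the thesis).
example = ConfusionTable(tp=40, fp=10, fn=5, tn=45)
metrics = diagnostic_metrics(example)
odds_ratio = diagnostic_odds_ratio(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"odds ratio: {odds_ratio:.1f}")
```

A high negative predictive value, as reported for the Bethesda III–IV subgroup, means that few nodules classified as benign turn out to be malignant, which is why the abstract describes it as rule-out capability. Predictive values depend on the prevalence of malignancy in the cohort, so they do not transfer directly to populations with a different case mix.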
20 January 2026
English
D'ANDREA, Vito
MINGOLI, Andrea
Università degli Studi di Roma "La Sapienza"
93
Files in this item:
File: Tesi_dottorato_Haxhiu.pdf
Access: open access
License: Creative Commons
Size: 1.2 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/358654
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-358654