Personalized Medicine and Process Optimization: Analysis and Implementation of Intelligent Tools to Support the Clinical Process
WEHBE, ALAA
2026
Abstract
Artificial Intelligence (AI) has profoundly transformed medical imaging, enabling new frontiers in disease detection, staging, and diagnostic decision support. Yet, despite remarkable progress, most AI tools remain siloed, disconnected from Picture Archiving and Communication Systems (PACS) and Radiology Information Systems (RIS), and often constrained by closed vocabularies that limit interpretability and clinical integration. This thesis addresses these challenges by designing a sequence of retrieval-driven, explainable AI frameworks that progressively bridge detection, retrieval, and multimodal reasoning for medical imaging, with a specific focus on lung cancer and thoracic radiography.

In the first stage, lung cancer is adopted as a case study for closed-vocabulary analysis. A 3D volumetric pipeline was developed using the YOLOv8 architecture for simultaneous detection and subtype classification of Adenocarcinoma (ADC), Squamous Cell Carcinoma (SCC), and Small Cell Lung Cancer (SCLC) on CT scans. Experimental results demonstrated a mean Average Precision (mAP) of 97.1%, with the YOLOv8-Small variant achieving 96.1% precision, a recall of 0.91, and a detection speed of 0.22 seconds, surpassing both the two-stage Faster R-CNN and the single-stage YOLOv7. The extracted features were subsequently used to train a custom TNMClassifier, achieving an overall accuracy of 98% in staging classification. This integration of subtype and stage detection established a strong foundation for clinically relevant representation learning.

Building upon these embeddings, the second stage introduced a Content-Based Image Medical Retrieval (CBIMR) system that leverages YOLOv8-derived features to retrieve clinically similar cases across cancer groups and TNM stages. The retrieval framework achieved a precision of 0.961, a recall of 0.945, and an mAP@0.5 of 0.971, effectively linking the detection and retrieval pipelines.
This detection-driven CBIR model demonstrated how region-aware embeddings can enhance interpretability, consistency, and case-based reasoning, which are key prerequisites for integration into PACS/RIS environments.

The final stage transitions toward open-vocabulary and multimodal learning through the introduction of MedFL (Medical Florence), a unified framework for medical image retrieval and report generation. Built upon the Florence-2 Vision–Language Model (VLM), MedFL employs parameter-efficient fine-tuning via LoRA on the MIMIC-CXR dataset and introduces a novel prompt-conditioned feature fusion strategy combining three complementary representations: CAPTION, DETAILED_CAPTION, and OD. The fused 2304-dimensional embeddings enable cross-modal retrieval and automated report generation within a unified space. Extensive experiments demonstrate that MedFL achieves Recall@10 = 0.930 for retrieval and BLEU-4 = 0.3363 for report generation, outperforming strong baselines such as CLIP, BioMedCLIP, and GatorTron-CLIP. Furthermore, MedFL attains mAP = 0.35 at 0.5 IoU on VinDr-CXR, confirming robust localization and visual–textual alignment.

Collectively, this thesis advances the paradigm of AI in medical imaging from detection to understanding, evolving from closed-vocabulary CT-based detection to open-vocabulary, multimodal reasoning in radiology. By integrating visual and textual representations into a unified, explainable, and retrieval-driven architecture, the proposed frameworks lay the groundwork for clinically deployable systems that can assist physicians in case retrieval, diagnostic interpretation, and automated reporting, ultimately supporting more transparent, personalized, and efficient medical decision-making.

Keywords: Medical imaging, Content-Based Image Retrieval (CBIR), YOLOv8, TNM staging, Vision–Language Models, MedFL, Multimodal learning, Florence-2, Explainable AI, Radiology
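The 2304-dimensional fused embedding described in the abstract is consistent with concatenating three 768-dimensional prompt-conditioned feature vectors (one per prompt: CAPTION, DETAILED_CAPTION, OD). A minimal sketch of such fusion and cosine-similarity retrieval follows; the per-view dimensionality, the function names, and the normalize-then-concatenate design are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def fuse_prompt_features(f_caption, f_detailed, f_od):
    """Fuse three prompt-conditioned feature vectors into one embedding.

    Each view is L2-normalized so no single prompt dominates, then the
    views are concatenated. With 768-dim inputs (an assumption), the
    fused vector is 3 * 768 = 2304-dimensional, matching the abstract.
    """
    parts = []
    for f in (f_caption, f_detailed, f_od):
        f = np.asarray(f, dtype=np.float64)
        parts.append(f / (np.linalg.norm(f) + 1e-12))
    return np.concatenate(parts)

def retrieve_top_k(query, gallery, k=10):
    """Rank gallery embeddings by cosine similarity to the query.

    Returns the indices of the k most similar cases, e.g. for
    computing Recall@10 over a retrieval set.
    """
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-12)
    sims = g @ q  # cosine similarity per gallery row
    return np.argsort(-sims)[:k]
```

In this shared-space view, both images and generated report text would be embedded with the same fusion step, so the same `retrieve_top_k` ranking serves image-to-image and cross-modal queries.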
| File | Size | Format | |
|---|---|---|---|
| phdunige_5509991.pdf (open access; License: All rights reserved) | 7.01 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/362460
URN:NBN:IT:UNIGE-362460