Towards Integrated Trustworthy AI in Medical Imaging: Applications Across Modalities and Clinical Domains
BARGAGNA, FILIPPO
2026
Abstract
The performance of Artificial Intelligence in Medical Imaging has reached a level that, just a few years ago, could be expected only of human experts. Despite this remarkable achievement, the clinical adoption of these powerful systems remains limited. Deficiencies in reliability, transparency and robustness hinder successful clinical deployment: models that perform well in controlled settings frequently produce overconfident, opaque and brittle predictions once introduced into clinical practice. Research on these aspects is frequently conducted in isolation, typically covering only specific components. This fragmented effort limits the effectiveness of the proposed solutions in addressing the trustworthiness gap in Medical Imaging AI. This dissertation follows a framework built on five interdependent pillars, namely uncertainty quantification, robustness to domain shift, explainability and interpretability, data authenticity and representativeness, and clinical validation, integrating and operationalizing them across diverse imaging modalities and clinical domains. This contribution is presented in four studies, each guided by a specific clinical need and integrating multiple pillars. Bayesian and approximate Bayesian deep learning was first applied to the classification of cardiac amyloidosis from PET data in a data-scarce setting, where uncertainty quantification and manifold learning were integrated to jointly assess predictive confidence and representation quality, as well as how the two agree in the model’s internals. In resting-state fMRI for the classification of autism spectrum disorder, the probabilistic framework was extended to combine Bayesian inference with attribution-based explainability, allowing for the evaluation of attribution stability and the construction of global representations of model reasoning.
The focus then moved from global classification to segmentation of the choroid plexus in the brain, with the development of an uncertainty-aware voxel-wise prediction pipeline for multi-site structural MRI, designed to investigate how epistemic and aleatoric uncertainty signals behave across sites and populations. Finally, we moved from discriminative inference to data generation, developing a 3D conditional generative model for longitudinal FDG-PET in Alzheimer’s disease and validating it via clinically grounded metabolic analysis.

Results across the studies converge into a series of integrative findings. First, there is evidence that approximate Bayesian inference can provide a scalable path to confidence estimation, preserving the core benefits of probabilistic reasoning under the typical computational constraints of real-world clinical practice. These confidence signals also enable a synergy with explainability and representation diagnostics: Bayesian sampling allows for the study of explanation stability, while attribution methods and manifold learning add a spatial component to otherwise abstract uncertainty measures. The result is a layered approach that exposes information that would remain hidden in a deterministic pipeline. The synergy with probabilistic approaches also extends to the problem of domain shift, where epistemic and aleatoric uncertainty proxies are shown to reliably track distribution mismatch in an unsupervised fashion, qualifying uncertainty as an active mechanism for pre-deployment validation and post-deployment monitoring of AI systems’ robustness. Finally, the same integrative logic applies on the data side: clinically grounded validation of generative models ensures that synthetic data reproduce valuable disease-specific signatures, strengthening trust in generative modeling for data representativeness and authenticity.
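As an illustration of the epistemic and aleatoric signals mentioned above, the following is a minimal sketch of one common entropy-based decomposition computed from repeated stochastic predictions (for example, Monte Carlo dropout passes or an ensemble). The abstract does not state which uncertainty proxies the thesis actually uses, so the decomposition, the function names `entropy` and `decompose_uncertainty`, and the toy inputs are all assumptions made for illustration only.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p + eps) for p in probs)


def decompose_uncertainty(samples):
    """Split predictive uncertainty for ONE input (e.g. one voxel or one scan).

    samples: list of T class-probability vectors, one per stochastic
    forward pass (hypothetical input format, assumed for this sketch).
    Returns (total, aleatoric, epistemic):
      total     = entropy of the mean prediction
      aleatoric = mean entropy of the individual predictions
      epistemic = total - aleatoric (mutual information)
    """
    n_samples = len(samples)
    n_classes = len(samples[0])
    mean_probs = [
        sum(s[c] for s in samples) / n_samples for c in range(n_classes)
    ]
    total = entropy(mean_probs)
    aleatoric = sum(entropy(s) for s in samples) / n_samples
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Toy usage: passes that agree on an ambiguous prediction versus
# passes that are individually confident but contradict each other.
agreeing = [[0.5, 0.5], [0.5, 0.5]]
disagreeing = [[1.0, 0.0], [0.0, 1.0]]
print(decompose_uncertainty(agreeing))
print(decompose_uncertainty(disagreeing))
```

In this sketch, passes that agree on an ambiguous output yield mostly aleatoric uncertainty, while confident but contradictory passes yield mostly epistemic uncertainty. The epistemic term is the kind of signal one might monitor, without labels, as a proxy for distribution mismatch across sites.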
Across these findings, this dissertation argues that trustworthiness is a systemic quality that arises from, and is strengthened by, the integration and interaction of complementary mechanisms and techniques embedded throughout the entire imaging pipeline, from data acquisition and generation through to post-deployment monitoring.

| File | Size | Format | |
|---|---|---|---|
| PhD_Thesis_Filippo_Bargagna.pdf (embargoed until 09/06/2029; license: all rights reserved) | 8.61 MB | Adobe PDF | |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/372430
URN:NBN:IT:UNIPI-372430