Deep Generative Models for Healthcare: Improved Generalisation and Interpretability
TRONCHIN, LORENZO
2024
Abstract
In the realm of Healthcare 4.0, the integration of information and communication technologies and Artificial Intelligence (AI) has opened new avenues for enhancing patient care, especially for patients suffering from complex conditions such as cancer and COVID-19. These advancements have great potential to support healthcare professionals if introduced into clinical routines; for instance, they can help predict patient outcomes and facilitate early and personalised interventions. However, they face significant challenges: among them, data scarcity, privacy concerns, and the need for interpretability hinder the translation of AI models from research to clinical settings. Indeed, the robustness of AI models hinges on the availability of diverse, high-quality and privacy-preserving medical data. Additionally, the increasing complexity of machine learning methods, especially those processing multimodal data that integrate multiple data sources like imaging and clinical records, can make them appear as “black boxes” to healthcare practitioners. This thesis explores generative approaches to address these challenges, offering two main contributions to the field of AI in healthcare: the first tackles data scarcity and privacy concerns, whilst the second focuses on the need to interpret models working on multimodal data. Synthetic data generated by Generative Adversarial Networks (GANs) emerges as a viable answer to the pressing challenges of data scarcity and privacy in healthcare AI. However, although GANs excel at rapidly generating high-quality samples, they struggle to represent the full variability of the training data, falling short of comprehensive mode coverage. Indeed, employing synthetic data from GANs in downstream tasks requires capturing the full variability of the data distribution. This thesis addresses this critical limitation by introducing two approaches: LatentAugment and GAN Ensembles.
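The notion of mode coverage can be made concrete with a toy check. The sketch below is not from the thesis: the `coverage` function and the two-mode data are illustrative assumptions, in the spirit of nearest-neighbour coverage metrics, showing how a mode-collapsed generator scores lower than one that covers both modes of the real distribution.

```python
import numpy as np

def coverage(real, fake, k=3):
    """Fraction of real points whose k-NN ball contains at least one fake point."""
    # pairwise distances among real points
    d_rr = np.linalg.norm(real[:, None, :] - real[None, :, :], axis=-1)
    # radius of each real point = distance to its k-th nearest real neighbour
    # (row-wise sort; column 0 is the point itself at distance 0)
    radii = np.sort(d_rr, axis=1)[:, k]
    # distance from each real point to its nearest fake point
    d_rf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    return (d_rf.min(axis=1) <= radii).mean()

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 2))                      # real data with two modes
real[100:] += 6.0
fake_collapsed = rng.normal(size=(200, 2))            # generator stuck on one mode
fake_full = np.concatenate([rng.normal(size=(100, 2)),
                            rng.normal(size=(100, 2)) + 6.0])
print(coverage(real, fake_collapsed))  # low: the second mode is missed
print(coverage(real, fake_full))       # high: both modes are covered
```

The collapsed generator leaves half the real distribution unreachable, which is exactly the failure the thesis's two approaches target.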
LatentAugment is a novel data augmentation method that enhances the diversity and fidelity of synthetic data generated by GANs. It does so by manipulating the GAN’s latent space to force synthetic samples to better reproduce the variability of real-world medical datasets. LatentAugment allows researchers to fully realise the potential of synthetic data from a single GAN, enabling its use in downstream tasks where data availability is limited. GAN Ensembles shift the focus from a single GAN to multiple GANs. This approach stems from the hypothesis that no single GAN can fully encompass the diversity of real-world data. By solving a multi-objective optimisation problem, GAN Ensembles select the combination of GANs that yields high-quality and diverse synthetic data with minimal redundancy, ensuring that each model contributes uniquely to the ensemble. GAN Ensembles also relieve practitioners and researchers of the burden of choosing which GAN to use and determining the ideal sampling point during training, proving pivotal in applications requiring data privacy. Together, LatentAugment and GAN Ensembles represent a significant advance in overcoming the limited mode coverage of synthetic data from GANs, paving the way for wider adoption of synthetic data in scenarios plagued by data scarcity and privacy concerns. We now turn to the second contribution of this thesis. Interpreting medical findings often involves data from multiple exams or modalities, such as images and health records. Each modality provides unique insights into patient health, capturing different aspects of medical conditions. Thus, for an AI system to understand the complex mechanisms underlying disease, it must be able to interpret multimodal medical data. Hence, there is a need for multimodal networks that can effectively integrate these diverse medical data streams.
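As a rough illustration of the ensemble idea, the greedy, scalarised selection below is a hypothetical stand-in for the thesis's multi-objective optimisation. The `quality` and `overlap` scores are made up; a real system would derive them from metrics computed on each GAN's generated samples. The sketch rewards per-model quality while penalising redundancy with models already in the ensemble:

```python
import numpy as np

def greedy_ensemble(quality, pairwise_overlap, budget, alpha=1.0, beta=1.0):
    """Pick up to `budget` models maximising quality minus a redundancy penalty.

    quality          : (n,) score per candidate GAN (higher is better)
    pairwise_overlap : (n, n) redundancy between candidates' outputs
    """
    n = len(quality)
    chosen = []
    for _ in range(budget):
        best, best_gain = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            # scalarised objective: reward quality, penalise overlap with the ensemble
            penalty = sum(pairwise_overlap[i, j] for j in chosen)
            gain = alpha * quality[i] - beta * penalty
            if gain > best_gain:
                best, best_gain = i, gain
        if chosen and best_gain <= 0:   # stop once no candidate still helps
            break
        chosen.append(best)
    return chosen

quality = np.array([0.9, 0.85, 0.4, 0.8])
overlap = np.array([[0.0, 0.9, 0.1, 0.2],   # models 0 and 1 are near-duplicates
                    [0.9, 0.0, 0.1, 0.2],
                    [0.1, 0.1, 0.0, 0.1],
                    [0.2, 0.2, 0.1, 0.0]])
print(greedy_ensemble(quality, overlap, budget=2))  # → [0, 3]
```

Note how the second-best model (index 1) is skipped because it is nearly redundant with the first pick: each member must contribute uniquely.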
However, the efficacy of these multimodal networks in a clinical setting hinges not only on their capability to process input from various sources but also on their interpretability. This thesis leverages deep generative models to explain the decisions made by multimodal networks. We develop a deep architecture that is explainable by design, jointly learning modality reconstructions and sample classifications from tabular and imaging data. It first creates a multimodal embedded representation of the input modalities. Then, by applying a latent shift mechanism that simulates a counterfactual prediction on the embedded representation, it reveals the features of each modality that contribute most to the decision and computes a quantitative score indicating each modality’s importance. In summary, this thesis proposes methods for effectively translating AI advancements into clinical practice, addressing critical challenges of data scarcity, privacy, and interpretability in healthcare.
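The latent shift mechanism can be sketched on a toy linear model. Everything here is an assumption for illustration (random weights, linear encoders/decoders and classifier head), not the thesis architecture: we shift the joint embedding against the classifier gradient to simulate a counterfactual, decode each modality before and after, and score each modality by how much its reconstruction changes.

```python
import numpy as np

rng = np.random.default_rng(1)
d_img, d_tab, d_z = 16, 8, 4

# toy "trained" weights: per-modality encoders/decoders plus a linear classifier
W_enc_img = rng.normal(size=(d_z, d_img)) * 0.1
W_enc_tab = rng.normal(size=(d_z, d_tab)) * 0.1
W_dec_img = rng.normal(size=(d_img, d_z)) * 0.1
W_dec_tab = rng.normal(size=(d_tab, d_z)) * 0.1
w_clf = rng.normal(size=d_z)

x_img = rng.normal(size=d_img)                 # imaging modality (toy)
x_tab = rng.normal(size=d_tab)                 # tabular modality (toy)

z = W_enc_img @ x_img + W_enc_tab @ x_tab      # joint multimodal embedding
grad = w_clf                                   # d(logit)/dz for a linear head
lam = -1.0                                     # shift against the prediction
z_cf = z + lam * grad                          # counterfactual embedding

# modality importance: how much each decoded modality changes under the shift
delta_img = np.linalg.norm(W_dec_img @ z_cf - W_dec_img @ z)
delta_tab = np.linalg.norm(W_dec_tab @ z_cf - W_dec_tab @ z)
score_img = delta_img / (delta_img + delta_tab)
score_tab = delta_tab / (delta_img + delta_tab)
print(f"imaging importance {score_img:.2f}, tabular importance {score_tab:.2f}")
```

The two normalised scores sum to one, giving a quantitative per-modality importance; in the thesis the decoded differences additionally localise which features of each modality drive the decision.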
File: PhD_Tronchin_Lorenzo.pdf (open access) | Size: 8.52 MB | Format: Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/122871
URN:NBN:IT:UNICAMPUS-122871