AI for Omics and Imaging Models in Precision Medicine and Toxicology
Bussola, Nicole
2022
Abstract
This thesis develops an Artificial Intelligence (AI) approach intended for accurate patient stratification and precise diagnostics and prognostics in clinical and preclinical applications. Despite rapid advances in high-throughput technologies and bioinformatics tools, genome-phenotype interactions are still far from being precisely linked to the biological mechanisms that underlie pathophysiological conditions. In practice, incomplete knowledge of individual heterogeneity in complex diseases keeps forcing clinicians to settle for surrogate endpoints and for therapies based on a generic one-size-fits-all approach.

The working hypothesis is that AI can provide new tools to process the rich information now available from high-throughput omics and bioimaging data and integrate it into new features or structures, and that such restructured information can be applied through predictive models within the precision medicine paradigm, thus favoring the creation of safer, tailored treatments for specific patient subgroups. The computational techniques in this thesis combine dimensionality reduction methods with Deep Learning (DL) architectures to learn meaningful transformations between the input space and the predictive endpoint space. The rationale is that such transformations can introduce intermediate spaces offering more succinct representations, in which data from different sources are summarized.

The research goal was addressed at increasing levels of complexity, starting from single input modalities (omics and bioimaging of different types and scales) and moving to their multimodal integration. The approach also deals with the key challenges for machine learning (ML) on biomedical data, i.e. reproducibility, stability, and interpretability of the models.

Along this path, the contribution of the thesis is the development of a set of specialized AI models and a core framework of three tools of general applicability:

i. A Data Analysis Plan (DAP) for model selection and evaluation of classifiers on omics and imaging data, designed to avoid selection bias.
ii. The histolab Python package, which standardizes the reproducible pre-processing of Whole Slide Images (WSIs), is supported by automated testing, and can be easily integrated in DL pipelines for Digital Pathology.
iii. Unsupervised and dimensionality reduction techniques based on the UMAP and TDA frameworks for patient subtyping.

The framework has been successfully applied to public as well as original data in precision oncology and predictive toxicology.

In the clinical setting, this thesis has developed:

1. (DAPPER) A deep learning framework for the evaluation of predictive models in Digital Pathology that controls for selection bias through properly designed data partitioning schemes.
2. (RADLER) A unified deep learning framework that combines radiomics features and imaging on PET-CT images for prognostic biomarker development in head and neck squamous cell carcinoma. The mixed deep learning/radiomics approach is more accurate than using only one feature type.
3. An ML framework for the automated quantification of tumor-infiltrating lymphocytes (TILs) in onco-immunology, validated on original Neuroblastoma pathology data from the Bambino Gesù Children's Hospital, with high agreement with trained pathologists.
4. The network-based INF pipeline, which applies machine learning models over the combination of multiple omics layers and also provides compact biomarker signatures. INF was validated on three TCGA oncogenomic datasets.

In the preclinical setting, the framework has been applied to:

1. Deep and machine learning algorithms to predict drug-induced liver injury (DILI) status from gene expression (GE) data derived from cancer cell lines in the CMap Drug Safety dataset.
2. (ML4TOX) Deep Learning and Support Vector Machine models to predict potential endocrine disruption by environmental chemicals on the CERAPP dataset.
3. (PathologAI) A deep learning pipeline combining generative and convolutional models for preclinical digital pathology. Developed as an internal project within the FDA/NCTR AIRForce initiative and applied to predict necrosis on images from the TG-GATEs project, PathologAI aims to improve accuracy and reduce labor in the identification of lesions in predictive toxicology. Furthermore, GE microarray data were integrated with histology features in a unified multimodal scheme combining imaging and omics data.

The solutions were developed in collaboration with domain experts and were considered promising for application.

File | Size | Format | |
---|---|---|---|
NBussola_thesis_PhD.pdf (Open Access from 01/01/2023) | 84.85 MB | Adobe PDF | |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/176618
URN:NBN:IT:UNITN-176618