Machine learning in oncogenomics: a key to dissecting cancer inner heterogeneity
Silvia Cascianelli
2024
Abstract
Computational oncogenomics plays a pivotal role in cancer research because the inherent complexity and heterogeneity of cancer often pose an insurmountable barrier to traditional research approaches. Comprehensive omics-based investigations and advanced computational methods are required to understand the molecular intricacies underlying tumors, to answer open biological and clinical questions, and to contribute to precision medicine. This PhD research belongs to the field of computational oncogenomics and emphasises the synergistic use of omics data processing and Data Science techniques to tackle complex clinical challenges in cancer research. It focused on developing original workflows that provide clinically relevant insights and stratifications for different types of cancer patients. Indeed, computational oncogenomics aims not only to decipher the omics landscapes of cancer but also to give clinicians valuable indications about patient subgroups that have distinct characteristics and require personalized therapeutic strategies. In this context, Machine Learning-based solutions have emerged as a powerful key to dissecting the inner heterogeneity of cancer and to delivering robust, clinically relevant, patient-centric predictions. All the methodologies and applications of this PhD project were developed to address pressing topics in biomedical research, following the typical Data Science life cycle. Great attention was therefore devoted to omics data integration and exploration, as well as to feature engineering and selection. Predictive modelling and result evaluation were tailored to the specific issues of oncogenomics research scenarios and designed to provide interpretation and validation from both computational and clinical-biological perspectives.
To obtain robust results, it was decisive to design, implement and carefully assess suitable and methodologically sound Machine Learning workflows, while taking particular care to validate the findings clinically and biologically. To this end, my research relied on close collaborations with experts in Medicine and Biology, who offered me a clearer view of the needs and goals to meet in each task. This PhD research was first directed at enhancing an R/Bioconductor package for the efficient investigation and integration of omics data. It then delved into robust Machine Learning-based cancer subtyping for reliable patient predictions, moving towards multi-label classification to also represent the inner heterogeneity of many patients. Lastly, it focused on mutation-based stratifications to identify variants with therapeutic or prognostic roles in patient groups whose clinical management is critical. Applications to breast and colorectal cancer demonstrated the computational efficacy and the clinical and biological relevance of the results. This work achieved notable advances in cancer subtyping, including: the design of a new feature selection method for imbalanced classification scenarios; the investigation of multi-omics, deep and semi-supervised solutions in light of the increasing availability of omics data; and the introduction of multi-label subtyping strategies, which reflect the underlying cellular heterogeneity, enhance patient molecular characterization and improve the clinical value of the predictions by considering both primary and secondary assignments. In addition, integrating class discovery, transfer learning and multi-label prediction proved effective in finding a more exhaustive colorectal cancer stratification, named EXA-CRIS. The EXA-CRIS classes were also traced on a different type of dataset, using an adaptation strategy designed to optimize the choice of the features most suitable for the predictive task.
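The idea of primary and secondary subtype assignments can be illustrated with a minimal sketch. It assumes, purely for illustration, that a trained classifier already outputs one probability per subtype for each patient; the function names `assign_labels` and `assign_cohort`, the threshold `secondary_min` and the subtype labels `S1`, `S2`, `S3` are hypothetical and do not reproduce the decision rule used in the thesis. The most probable subtype is taken as the primary label, and the runner-up is kept as a secondary label only when its probability reaches the threshold.

```python
from typing import Dict, Optional, Tuple


def assign_labels(
    probs: Dict[str, float],
    secondary_min: float = 0.3,  # hypothetical threshold, not taken from the thesis
) -> Tuple[str, Optional[str]]:
    """Return (primary, secondary) subtype labels from per-class probabilities.

    The primary label is the most probable class; the runner-up is kept as a
    secondary label only if its probability is at least `secondary_min`.
    """
    if not probs:
        raise ValueError("probs must contain at least one class")
    # Rank classes by decreasing probability; ties are broken by class name.
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    primary = ranked[0][0]
    secondary = None
    if len(ranked) > 1 and ranked[1][1] >= secondary_min:
        secondary = ranked[1][0]
    return primary, secondary


def assign_cohort(
    cohort: Dict[str, Dict[str, float]], secondary_min: float = 0.3
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Apply assign_labels to every sample in a {sample_id: probs} mapping."""
    return {sid: assign_labels(p, secondary_min) for sid, p in cohort.items()}


if __name__ == "__main__":
    # Toy probabilities for illustration only (not data from the thesis).
    cohort = {
        "patient_A": {"S1": 0.82, "S2": 0.12, "S3": 0.06},
        "patient_B": {"S1": 0.48, "S2": 0.41, "S3": 0.11},
    }
    for sid, (primary, secondary) in assign_cohort(cohort).items():
        print(sid, primary, secondary)
```

With these toy values, `patient_A` receives only a primary label (`S1`), while `patient_B`, whose two top probabilities are close, also receives `S2` as a secondary label. A fixed threshold is the simplest possible rule; a real workflow would tune it, or replace it, on validation data.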
Finally, innovative mutation-based feature engineering and supervised frameworks, used to recognize critical subgroups of cancer patients, were combined with variant prioritisation approaches and a search for actionable genes to find new potential therapeutic targets. Overall, this research prioritised implementing and meticulously assessing comprehensive Omics Data Science workflows to dissect the inner heterogeneity of cancer from different omics perspectives, all contributing to shaping new trajectories towards precision medicine.

| File | Access | License | Size | Format |
|---|---|---|---|---|
| Final_PhD_Thesis_Cascianelli_Silvia.pdf | Only from BNCF and BNCR | All rights reserved | 21.76 MB | Adobe PDF |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/206418
URN:NBN:IT:POLIMI-206418