ALLSTAR: a new bioinformatics algorithm for inferring causal rules between somatic mutations and tumor phenotypes
COLLESEI, ANTONIO
2024
Abstract
Motivation: Recent advances in DNA sequencing technologies have allowed the detailed characterization of genomes in large cohorts of tumors, highlighting their extreme heterogeneity: no two tumors share the same complement of somatic mutations. Such heterogeneity hinders our ability to identify the somatic mutations that are important for the disease, including those that determine clinically relevant phenotypes (e.g., cancer subtypes). Several tools have been developed to identify somatic mutations related to cancer phenotypes. However, such tools identify correlations between somatic mutations and cancer phenotypes, with no guarantee of highlighting causal relations.

Results: This thesis is centered on ALLSTAR, a novel tool I developed in a collaboration between the Veneto Institute of Oncology and the Department of Information Engineering at the University of Padova. The tool infers reliable causal relations between combinations of somatic mutations and cancer phenotypes. ALLSTAR ranks causal rules by their impact, measured as the average effect on the phenotype. Having proved that the underlying computational problem is NP-hard, I developed a branch-and-bound approach that employs protein-protein interaction networks and novel bounds to prune the search space, while properly correcting for multiple hypothesis testing. An extensive experimental evaluation on synthetic data shows that ALLSTAR identifies reliable causal relations in large cancer cohorts. Moreover, the reliable causal rules identified in cancer data show that my approach retrieves several somatic mutations known to be relevant for cancer phenotypes, as well as novel, biologically meaningful relations.

Availability and Implementation: Code, data, and scripts to reproduce the experiments are available at https://github.com/VandinLab/ALLSTAR.

File | Access | Size | Format
---|---|---|---
tesi_definitiva_Antonio_Collesei.pdf | Open access | 3.25 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/160856
URN:NBN:IT:UNIPD-160856