NOVEL APPROACHES IN GENETIC VARIANT DISCOVERY AND CLASSIFICATION TACKLING UNSOLVED PROBLEMS IN CANCER CLINICAL GENOMICS

BONETTI, EMANUELE
2025

Abstract

Accurate interpretation of germline and somatic variants plays a pivotal role in clinical practice, providing the foundation for correct diagnosis and targeted therapy in personalized medicine. However, variant classification is a complex process that requires integrating multiple, often incomplete sources of information, which has prompted the development of bioinformatic tools that help clinicians prioritize mutations for validation and patient counseling. The primary objective of my PhD project is to identify and evaluate consistent methods for discovering mutations of scientific and clinical significance. To this end, in 2020 we designed RENOVO, a machine learning algorithm trained on the ClinVar database and validated on multiple external databases. RENOVO classified known pathogenic and benign variants with high accuracy. It provides a pathogenicity likelihood score that aids the interpretation of de novo variants and variants of unknown significance. Our evaluation of RENOVO's predictions over a four-year period showed that RENOVO correctly classified 82.6% of the variants that underwent reclassification. This validation confirms RENOVO’s potential to predict the future reclassification of variants whose clinical significance is currently deemed uncertain or conflicting. We applied variant interpretation tools, including RENOVO, to observational trials addressing needs in clinical cancer genomics: in ovarian and breast cancers, and in patients with myeloproliferative neoplasms (MPN) who had not previously tested positive for known MPN genetic drivers (“triple negative disease”). In the latter study, we found a high prevalence of mutations in a known cancer-associated gene, KMT2C. However, we noticed that KMT2C variant calls were highly dependent on the alignment algorithm used, owing to high homology with a known pseudogene.
Further investigation led to the discovery of a phenomenon that we named Genotype-Dependent Mismapping (GDM). Regions subject to GDM are highly homologous genomic regions that differ by one or a few highly prevalent polymorphisms. As a consequence, the mappability of the region depends entirely on the polymorphic allele, which acts as a binary “switch”: one allele allows unambiguous mapping, whereas the other renders the region unmappable because the two regions become completely homologous. We developed a tool to measure the probability of GDM across large genomic regions, and we are currently applying it to existing studies to reassess the prevalence of known oncogenic variants. Overall, this project presents an effective strategy for addressing the genomic challenges of variant detection, validation, and classification, with an emphasis on a patient-centric methodology.
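
The allele “switch” behind GDM can be illustrated with a toy sketch. This is not the tool developed in the thesis: the sequences, the function names (`reads_over_site`, `matching_loci`, `fraction_unique`), and the use of exact substring matching in place of a real aligner are all assumptions made purely for illustration. The sketch builds a gene locus and a paralogous locus that differ only at one polymorphic site, then asks what fraction of reads covering that site can be traced back to the gene alone.

```python
# Toy illustration of Genotype-Dependent Mismapping (GDM).
# All sequences and names here are hypothetical; exact substring
# matching stands in for a real read aligner.

LEFT_FLANK = "ACGTTGC"    # sequence shared by gene and paralog (assumed)
RIGHT_FLANK = "AGGCTTAC"  # sequence shared by gene and paralog (assumed)


def reads_over_site(seq: str, site: int, length: int) -> list[str]:
    """Return every substring of `length` bases that covers index `site`."""
    first_start = max(0, site - length + 1)
    last_start = min(site, len(seq) - length)
    return [seq[s:s + length] for s in range(first_start, last_start + 1)]


def matching_loci(read: str, loci: dict[str, str]) -> list[str]:
    """Return the sorted names of all loci that contain the read exactly."""
    return sorted(name for name, seq in loci.items() if read in seq)


def fraction_unique(gene_allele: str, paralog_allele: str = "G",
                    read_len: int = 6) -> float:
    """Fraction of gene reads over the polymorphic site that match only the gene."""
    loci = {
        "gene": LEFT_FLANK + gene_allele + RIGHT_FLANK,
        "paralog": LEFT_FLANK + paralog_allele + RIGHT_FLANK,
    }
    site = len(LEFT_FLANK)  # index of the polymorphic base
    reads = reads_over_site(loci["gene"], site, read_len)
    unique = sum(1 for read in reads if matching_loci(read, loci) == ["gene"])
    return unique / len(reads)


if __name__ == "__main__":
    # Allele 'A' distinguishes the gene from the paralog.
    print("gene allele A:", fraction_unique("A"))
    # Allele 'G' makes the gene identical to the paralog.
    print("gene allele G:", fraction_unique("G"))
```

In this toy setting, every read over the site is uniquely assignable when the gene carries the distinguishing allele, and none is when it carries the allele shared with the paralog. A realistic estimate would need actual aligner behavior, mismatch tolerance, and population allele frequencies, none of which are modeled here.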
21 January 2025
English
SORANZO, NICOLE
PASINI, DIEGO
Università degli Studi di Milano
104
Files in this item:
phd_unimi_R13132.pdf (open access, 6.24 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/189835
The NBN code of this thesis is URN:NBN:IT:UNIMI-189835