NOVEL APPROACHES IN GENETIC VARIANT DISCOVERY AND CLASSIFICATION TACKLING UNSOLVED PROBLEMS IN CANCER CLINICAL GENOMICS
BONETTI, EMANUELE
2025
Abstract
Accurate interpretation of germline and somatic variants plays a pivotal role in clinical practice, providing the foundation for correct diagnosis and targeted therapy in personalized medicine. However, variant classification is a complex process that requires the integration of multiple, often incomplete sources of information, prompting the development of bioinformatic tools that enable clinicians to prioritize mutations for validation and patient counseling. The primary objective of my PhD project is to identify and evaluate consistent methods for discovering mutations of scientific and clinical significance. To this end, in 2020 we designed RENOVO, a machine learning algorithm trained on the ClinVar database and validated across multiple external databases. RENOVO exhibited high accuracy in classifying known pathogenic and benign variants. It provides a pathogenicity likelihood score, aiding in the interpretation of de novo variants and variants of unknown significance. Our evaluation of RENOVO’s predictions over a four-year period demonstrated its efficacy: RENOVO correctly classified 82.6% of variants that underwent reclassification. This validation confirms RENOVO’s potential to predict future reclassification of variants currently deemed uncertain or conflicting in clinical significance. We applied variant interpretation tools, including RENOVO, to observational trials addressing clinical needs in cancer genomics, covering ovarian and breast cancers and patients with myeloproliferative neoplasms (MPN) who had not previously tested positive for known MPN genetic drivers (“triple negative disease”). In the latter study, we found a high prevalence of mutations in a known cancer-associated gene, KMT2C. However, we noticed that KMT2C variant calls were highly dependent on the alignment algorithm used, due to high homology with a known pseudogene.
Further investigation led to the discovery of a phenomenon that we named Genotype-Dependent Mismapping (GDM). Genomic regions subject to GDM are highly homologous regions that differ by one or a few highly prevalent polymorphisms. As a consequence, the mappability of the region depends entirely on the polymorphic allele, which acts as a binary “switch”: one allele allows unambiguous mapping, whereas the other renders the region unmappable due to complete homology. We developed a tool to measure the probability of GDM for large genomic regions, and we are currently applying it to existing studies to reassess the prevalence of known oncogenic variants. Overall, this project presents an effective strategy for addressing genomic challenges associated with variant detection, validation, and classification, emphasizing a patient-centric methodology.
File | Size | Format | |
---|---|---|---|
phd_unimi_R13132.pdf (open access) | 6.24 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/189835
URN:NBN:IT:UNIMI-189835