NOVEL APPROACHES IN GENETIC VARIANT DISCOVERY AND CLASSIFICATION TACKLING UNSOLVED PROBLEMS IN CANCER CLINICAL GENOMICS

Bonetti, Emanuele

Accurate interpretation of germline and somatic variants plays a pivotal role in clinical practice, providing the foundation for correct diagnosis and targeted therapy in personalized medicine. However, variant classification is a complex process that requires integrating multiple, often incomplete sources of information. This has prompted the development of bioinformatic tools that help clinicians prioritize mutations for validation and patient counseling. The primary objective of my PhD project is to identify and evaluate consistent methods for discovering mutations of scientific and clinical significance. To this end, in 2020 we designed RENOVO, a machine-learning algorithm trained on the ClinVar database and validated on multiple external databases. RENOVO classified known pathogenic and benign variants with high accuracy. It provides a pathogenicity likelihood score that aids the interpretation of de novo variants and variants of unknown significance. In an evaluation of RENOVO's predictions spanning a four-year period, RENOVO correctly classified 82.6% of the variants that underwent reclassification. This validation confirms RENOVO’s potential to predict the future reclassification of variants currently deemed uncertain or conflicting in clinical significance. We applied variant interpretation tools, including RENOVO, to observational trials addressing clinical needs in cancer genomics: in ovarian and breast cancers, and in patients with myeloproliferative neoplasms (MPN) who had not previously tested positive for known MPN genetic drivers (“triple negative disease”). In the latter study, we found a high prevalence of mutations in a known cancer-associated gene, KMT2C. However, we noticed that KMT2C variant calls depended strongly on the alignment algorithm used, owing to high homology with a known pseudogene.
Further investigation led to the discovery of a phenomenon that we named Genotype-Dependent Mismapping (GDM). Genomic regions subject to GDM are highly homologous regions that differ by one or a few highly prevalent polymorphisms. As a consequence, the mappability of the region depends entirely on the polymorphic allele, which acts as a binary “switch”: one allele allows unambiguous mapping, whereas the other renders the region unmappable because the two regions become completely homologous. We developed a tool to measure the probability of GDM across large genomic regions, and we are currently applying it to existing studies to reassess the prevalence of known oncogenic variants. Overall, this project presents an effective strategy for addressing genomic challenges in variant detection, validation, and classification, with an emphasis on a patient-centric methodology.
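
The allelic “switch” behind GDM can be illustrated with a toy calculation. The sketch below is not the tool developed in the thesis: the sequences, the function names (`unique_kmer_fraction`, `personal_genome`) and the k-mer length are all hypothetical, chosen only to show how a single polymorphic base can decide whether a region is distinguishable from its paralog. It assumes a gene segment and a paralog (for example a pseudogene) that are identical except at one site, and measures the fraction of k-mers from the gene segment that occur exactly once in an individual's own sequence.

```python
def unique_kmer_fraction(region, genome, k):
    """Fraction of k-mers drawn from `region` that occur exactly once in `genome`."""
    kmers = [region[i:i + k] for i in range(len(region) - k + 1)]

    def occurrences(kmer):
        # Count all (possibly overlapping) exact occurrences of kmer in genome.
        return sum(genome.startswith(kmer, j) for j in range(len(genome) - k + 1))

    return sum(occurrences(kmer) == 1 for kmer in kmers) / len(kmers)


def personal_genome(gene_allele):
    """Build a hypothetical individual's sequence: gene segment, spacer, paralog.

    The two copies share identical flanks (toy sequences, assumed for
    illustration). The paralog always carries 'T' at the polymorphic site;
    the gene carries whichever allele the individual has.
    """
    left, right = "ACGTTGCAAG", "TTAGGCTAAC"
    gene = left + gene_allele + right
    paralog = left + "T" + right
    return gene, gene + "N" * 10 + paralog


for allele in ("C", "T"):
    gene, genome = personal_genome(allele)
    fraction = unique_kmer_fraction(gene, genome, k=8)
    print(f"gene allele {allele}: {fraction:.2f} of 8-mers map uniquely")
```

With the `C` allele, every k-mer overlapping the polymorphic site is unique to the gene copy, so reads covering that site can be placed unambiguously. With the `T` allele, the two copies are identical and no k-mer from the gene segment is unique. A realistic assessment would work on aligner output and population allele frequencies rather than exact k-mer matching.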

2025



Faculty/Department: Dipartimento di Oncologia ed Emato-Oncologia
Degree programme: MEDICINA DEI SISTEMI
Publication date: 21 Jan 2025
Language: English
Co-advisor, Examiner, Co-Supervisor, Co-Tutor or Coordinators: SORANZO, NICOLE; PASINI, DIEGO
Publisher: Università degli Studi di Milano
Number of pages: 104
Collection: Università degli Studi di Milano

Files in this item:

File	Access	Size	Format
phd_unimi_R13132.pdf	open access	6.24 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/189835

The NBN code of this thesis is URN:NBN:IT:UNIMI-189835