Functional characterization of disease-associated genetic variants: insights from protein bioinformatics

Marin Vargas, Sergio Paul
2016

Abstract

In 2003, after more than a decade of research, the Human Genome Project (HGP) was completed. Its goals were to determine the sequence of the 3 billion units of DNA that make up a human genome and to identify all of the genes contained in this vast amount of data. The individual genes within the long strands of DNA, and the elements that control them, have still not been completely identified. One of the early hopes of the project was to pinpoint specific genes that cause genetic diseases. We now know that the answer is more complex: most genetic diseases are caused by a combination of genetic, environmental and lifestyle factors. Nevertheless, the information gained from the HGP and from basic research in recent years has the potential to transform healthcare. We have entered an era in which complete human genomes (through re-sequencing) or targeted sequencing tests can be analyzed within reasonable time frames and at reasonable cost. U.S. President Obama recently launched the Precision Medicine Initiative (PMI), which seeks to identify genetically based drivers of disease in order to develop new, more effective treatments. Precision medicine consists of using new methods of molecular analysis to better manage a patient’s disease or to predict predisposition toward it. This involves the introduction of new diagnostic tests, many of which are derived from Next-Generation Sequencing (NGS) technologies. Researchers are combing through segments of these data to look for genetic variants, potentially meaningful differences that might eventually lead to a treatment. Currently, medical doctors are focusing on personalized medicine, which uses personal information such as clinical, genetic, genomic and environmental data.
Because these factors differ for every person, so does the basis of their diseases, including their onset, their course, and how they might respond to drugs or other interventions. NGS technologies have been remarkably successful in finding the causes of Mendelian and rare diseases. This represents a huge advance in our ability to provide correct diagnoses for patients with rare inherited disorders and for their families. Not only can rapid and safe diagnosis of virtually all known single-gene defects now be established, but novel causes of disease in previously unsolved cases can also be identified. All of this is leading to the increasing use of NGS technologies in medical diagnostics. This thesis aims to assess the main protocols used to search for disease-associated genetic variants through NGS, as well as the reliability of the genetic information acquired. To assess the overall procedure needed to detect disease-associated genetic variants, several case studies were carried out so as to evaluate individually the sub-procedures of the variant calling, variant annotation and variant prioritization pipelines. In addition to knowing the disease-associated genetic variants, it is also important to understand how they affect the encoded proteins. A deep characterization of the structure/function relationships of the wild-type and mutated protein is thus needed for a complete assessment of the putative effect of a variant. Further, the PMI is introducing diagnostic testing that will be used to select appropriate and optimal therapies based on the genetic context of a patient (i.e., pharmacogenomics), and thus to introduce new personalized drugs and antibodies designed to counter the influence of specific molecular drivers. For example, the drug imatinib was designed to inhibit an altered enzyme produced by a fused version of two genes found in chronic myelogenous leukemia (CML).
All of this is accomplished by analyzing and characterizing proteins, which carry out the functions encoded by genes. However, applying experimental techniques to a protein poses practical difficulties and is very expensive in terms of time and money. This calls for computational biology techniques, which can be used to study the effects of a variant on a protein. Protein bioinformatics addresses many aspects of proteins, including sequence and structure analysis, prediction of protein structures, protein folding, protein stability, and protein interactions, through the bioinformatics tools available in the literature for protein analysis, characterization and prediction. This thesis also aims to develop an efficient computational protocol that uses protein bioinformatics tools to give insights into the structural and functional effects of disease-associated genetic variants at the protein level. To develop this protocol, several case studies were performed with protein bioinformatics tools, each one involving several mutations in proteins associated with different genetic diseases.
2016
English
mendelian disease, genetic variant, mutation, protein bioinformatics, bioinformatics
84
Files in this record:
File: Thesis.pdf
Access: only from BNCF and BNCR
Size: 9.8 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/113535
The NBN code of this thesis is URN:NBN:IT:UNIVR-113535