Building the bean pan-genome: how to build it and exploit its potential

VINCENZI, LEONARDO
2025

Abstract

The study of pan-genomes has become crucial in genetic analysis. A single reference genome is no longer sufficient to represent all the variability within a species, because genetic content varies from individual to individual within the same population. A pan-genome is, therefore, a catalogue of all the genetic variation found within a population, species, or clade. Over time, different types of pan-genomes have been defined and different approaches for their construction have been proposed. However, there is no single correct method: the biological characteristics of the species analysed and the type of data available must guide the researcher toward the most appropriate approach. This thesis aims to build the pan-genome of the common bean (Phaseolus vulgaris), a species with a wide spectrum of variability due to the coexistence of two distinct gene pools (Mesoamerican and Andean), each of which underwent its own domestication process. The pan-genome was built by integrating two methods: one based on comparing five high-quality genomes (including the current reference genome), and one that adds information from whole-genome sequencing (WGS) data of 339 bean accessions. During the analyses, the construction pipeline was tuned through an in-depth study of the parameters needed to capture the full variability of the species while avoiding redundancy in the added sequences. Homology between accessions and the identification of orthologous and paralogous genes are among the metrics considered in this pipeline. A crucial point of the work was defining an adequate coverage threshold for calling the presence or absence of each gene in each accession, given the variable coverage of the WGS data.
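The abstract does not state the exact presence/absence rule, so the following is only a minimal sketch of one way a coverage threshold can account for accessions sequenced at different depths. It assumes the input is, for each accession, the breadth of coverage of each gene (the fraction of the gene's bases covered by at least one WGS read). The function `call_pav` and its parameters `rel_threshold` and `min_breadth` are hypothetical placeholders, not the values or method used in the thesis.

```python
from statistics import median

# Hypothetical input: for each accession, the breadth of coverage of each gene,
# i.e. the fraction of the gene's bases covered by at least one WGS read.
EXAMPLE_BREADTH = {
    "acc1": {"g1": 0.98, "g2": 0.95, "g3": 0.10},
    "acc2": {"g1": 0.60, "g2": 0.05, "g3": 0.40},
}


def call_pav(breadth, rel_threshold=0.5, min_breadth=0.2):
    """Call gene presence (1) or absence (0) in each accession.

    The cut-off adapts to each accession: it is rel_threshold times the
    accession's median gene breadth, but never lower than min_breadth.
    Parameter names and defaults are hypothetical placeholders.
    """
    pav = {}
    for accession, genes in breadth.items():
        # Low-coverage accessions have lower breadth overall, so their
        # median breadth (and hence their cut-off) is lower too.
        cutoff = max(min_breadth, rel_threshold * median(genes.values()))
        pav[accession] = {gene: int(b >= cutoff) for gene, b in genes.items()}
    return pav


if __name__ == "__main__":
    print(call_pav(EXAMPLE_BREADTH))
```

In this toy example, a fixed cut-off of 0.5 would call `g3` absent in `acc2`; the per-accession cut-off (0.2 here, because `acc2` is covered poorly overall) calls it present. Whether such a relative rule is appropriate depends on the real coverage distribution of the data.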
The result of this work is a pan-genome containing more genes than the current reference genome of the bean. Furthermore, Presence/Absence Variation (PAV) calling shows that less than 60% of the pan-genome's gene content is shared by all varieties of the species, while the remaining genes (over 40%) are present only in some of them. This demonstrates that a single reference genome is an oversimplification of the great complexity of the bean species.
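The core/variable split reported above is derived from a binary PAV matrix. The sketch below shows the arithmetic under the usual definitions (core: present in every accession; dispensable: present in some but not all). The dictionary layout and the names `classify_genes` and `core_fraction` are hypothetical; this illustrates the calculation, not the thesis pipeline.

```python
# Hypothetical PAV matrix: {accession: {gene: 1 if present else 0}}.
EXAMPLE_PAV = {
    "acc1": {"g1": 1, "g2": 1, "g3": 0, "g4": 1},
    "acc2": {"g1": 1, "g2": 0, "g3": 1, "g4": 1},
    "acc3": {"g1": 1, "g2": 1, "g3": 0, "g4": 1},
}


def classify_genes(pav):
    """Split genes into core (present in every accession), dispensable
    (present in at least one accession but not all), and never called."""
    accessions = list(pav)
    all_genes = set()
    for calls in pav.values():
        all_genes.update(calls)
    core, dispensable, never_called = [], [], []
    for gene in sorted(all_genes):
        # A gene missing from an accession's dict is treated as absent there.
        n_present = sum(pav[acc].get(gene, 0) for acc in accessions)
        if n_present == len(accessions):
            core.append(gene)
        elif n_present > 0:
            dispensable.append(gene)
        else:
            never_called.append(gene)
    return core, dispensable, never_called


def core_fraction(pav):
    """Fraction of called genes (core + dispensable) that are core."""
    core, dispensable, _ = classify_genes(pav)
    total = len(core) + len(dispensable)
    return len(core) / total if total else 0.0


if __name__ == "__main__":
    print(classify_genes(EXAMPLE_PAV))
    print(core_fraction(EXAMPLE_PAV))
```

With real data the same calculation runs over all genes and accessions; a core fraction below 0.6 corresponds to the result summarised in the abstract.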
Language: English
61
File in this record: PhD_thesis_Vincenzi_Leonardo_.pdf (open access, Adobe PDF, 4.04 MB)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/202399
The NBN code of this thesis is URN:NBN:IT:UNIVR-202399