ASTER-REP, a database of Asteraceae sequences for studying structure and function of transposable elements

VENTIMIGLIA, MARIA
2022

Abstract

Since their discovery, transposable elements (TEs) have been found to be ubiquitous in eukaryotic genomes. These elements, which make up a major part of the repetitive component of plant genomes, can change their location within the genome, generating genomic plasticity by inducing chromosomal mutations and allelic diversity, and thus contributing to the evolution of their host. The Asteraceae is one of the largest and most economically important families of flowering plants and includes major crop species such as sunflower, globe artichoke, and lettuce. Despite this economic importance, only partial data are available on the genome composition and organization of this family. Taking advantage of the increasing availability of plant genomic sequences, the principal aim of my Ph.D. project was to create ASTER-REP: a comprehensive database of sequences isolated from species belonging to the Asteraceae family, for studying the structure and function of TEs. The six Asteraceae species whose fully sequenced genome assemblies are available in the National Center for Biotechnology Information (NCBI) GenBank database, and which were selected for TE discovery, are: Helianthus annuus, Lactuca sativa, Cynara cardunculus var. scolymus, Artemisia annua, Carthamus tinctorius, and Chrysanthemum seticuspe. Based on the most current classification system, TEs were identified for the following five orders: LTR-RE, SINE, TIR, MITE, and Helitron. A total of 334,747 full-length TEs were identified and included in ASTER-REP. The database runs on a Linux-Apache-MySQL-PHP (LAMP) stack, and its interface lets the user choose the desired sequences by selecting the species, TE class, and order and, where possible, TE superfamily and lineage. Search results can be visualized and downloaded as FASTA and GFF files.
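As a rough illustration of how a downloaded GFF file might be summarised, the sketch below tallies annotation records by the value of the third (feature type) column, which is part of the standard nine-column GFF layout. Whether ASTER-REP stores the TE order in that column is an assumption made here for illustration; the function name and the sample records are hypothetical.

```python
from collections import Counter


def count_features_by_type(gff_lines):
    """Count GFF records per feature type (third tab-separated column).

    Assumption: the TE order (e.g. LTR-RE, Helitron) is stored in the
    type column. The real ASTER-REP export may encode it elsewhere.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed GFF record has exactly nine columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records, not taken from the database.
example = [
    "##gff-version 3",
    "chr1\tASTER-REP\tLTR-RE\t100\t5100\t.\t+\t.\tID=te1",
    "chr1\tASTER-REP\tHelitron\t9000\t12000\t.\t-\t.\tID=te2",
    "chr2\tASTER-REP\tLTR-RE\t300\t6200\t.\t+\t.\tID=te3",
]
summary = count_features_by_type(example)
```

Reading an actual file only requires passing an open file handle instead of the list, since the function iterates line by line.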
ASTER-REP is a useful tool for studies on TE diversity and dynamics, and it could help to decipher genome structure and to infer the evolutionary processes that occurred during the divergence of the Asteraceae. Furthermore, the discovery methods used can be applied to other plant species, supporting studies on genome structure and, in particular, on TEs. The repetitive component of the genomes of the selected species was investigated by comparative analysis, to infer the role that it may have played in evolution and speciation. For each species under consideration, Illumina paired-end reads available in the GenBank database were analysed through a clustering process using the RepeatExplorer2 software, which allowed us to estimate the abundance and variability of the repeat types. The large difference found between species is probably due to the fact that, after the species diverged, individual genomes followed different evolutionary dynamics in terms of composition and abundance of repeat elements. Because long terminal repeat retrotransposons (LTR-REs) are the most represented elements within the repetitive component of plant genomes, attention was subsequently focused on this order of elements. First, a pool of LTR-REs from all six species was subjected to phylogenetic analyses to verify the evolutionary relationships between the species, confirming the annotation previously attributed to these TEs; then, an insertion time analysis of the same elements dated their proliferation from around 15 million years ago onwards and showed that the species have different insertion time profiles, specific to the different LTR-RE lineages. During evolution, the coding regions of TEs may undergo modifications leading to the loss of their self-replicative capability, the acquisition of new functions, and the beginning of their evolution under phenotypic selective pressure: they become novel genes, defined as exapted transposable element genes (ETEs).
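The insertion time analysis of LTR-REs mentioned above is commonly based on the divergence between the two LTRs of one element, which are identical at insertion and then accumulate substitutions independently, giving T = K / (2r). The sketch below illustrates that general approach with a Kimura two-parameter distance; it is not confirmed to be the procedure used in the thesis, and the substitution rate in the usage lines is a placeholder assumption, not a value from the source.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing gaps or ambiguous bases are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_years(distance, rate_per_site_per_year):
    """T = K / (2r): both LTRs diverge independently after insertion."""
    return distance / (2 * rate_per_site_per_year)


# Hypothetical aligned LTR pair with a single transition in ten sites.
k = kimura_2p_distance("ACGTACGTAC", "GCGTACGTAC")
# Placeholder rate (an assumption for illustration only).
age = insertion_time_years(k, 1e-8)
```

The estimate scales inversely with the chosen rate, so any comparison between lineages or species depends on using a rate appropriate to the genomes studied.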
The search for ETEs focused on the sunflower (Helianthus annuus), an important model species whose repetitive component, mostly represented by TEs, amounts to about 80% of its genome. Possible ETEs derived from LTR-REs and TIR elements were searched for within this species. The sunflower genes showing similarity to TEs were investigated for the characteristics that distinguish TEs from genes, namely repetitiveness, similarity to already known TEs, siRNA coverage, and expression. Through this process, 3,530 sunflower genes were selected as validated ETEs. Their functional characterisation showed significant involvement in a wide range of cellular functions, suggesting that ETEs affected several biological processes during sunflower evolution. The identification and characterisation of ETEs in sunflower highlighted the crucial role that exaptation plays in the creation of sequences with new functions, thus contributing to species evolution.
18 Jan 2022
Italian
Asteraceae
database
exaptation
transposable elements
Mascagni, Flavia
Files in this item:
Abstract.pdf (146.19 kB, Adobe PDF), under embargo until 27/01/2062
PhD_activities_report.pdf (180.35 kB, Adobe PDF), under embargo until 27/01/2062
PhD_thesis_AgriFoodEnv_Unipi_Ventimiglia.pdf (12.29 MB, Adobe PDF), under embargo until 27/01/2062

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216567
The NBN code of this thesis is URN:NBN:IT:UNIPI-216567