Regulatory non-coding RNAs: an explorative study from wild grasses to wheat

2019

Abstract

Understanding the complex wheat genome can be facilitated by studying its wild relatives. Triticum urartu and Aegilops tauschii are the diploid donors of the A and D genomes, respectively, of hexaploid bread wheat (Triticum aestivum, AuAuBBDD). The recently sequenced genomes of these two wild relatives are a valuable additional resource for understanding wheat biology, genome structure and evolution. To date, however, studies have mostly focused on the coding component of the genome, even though it is now known that 90% of the eukaryotic genome is transcribed while only a small fraction corresponds to protein-coding RNAs, indicating that a large portion of the transcriptome is non-coding. In addition to the well-known structural non-coding RNAs, the advent of RNA-seq technology has allowed the study of novel classes of RNAs with key regulatory roles in various biological processes. These so-called regulatory non-coding RNAs comprise two classes: small regulatory RNAs (sRNAs), such as microRNAs (miRNAs) and short interfering RNAs (siRNAs), and long non-coding RNAs (lncRNAs). The two classes have distinct characteristics, but both have been shown to play important regulatory roles in animals and plants.

In this thesis, we aimed to explore the regulatory non-coding RNAs in the wheat wild relatives T. urartu and Ae. tauschii, in order to create the first comprehensive atlas of non-coding RNAs in these two species. Specifically, we performed a miRNA analysis in Ae. tauschii and then investigated lncRNAs in both T. urartu and Ae. tauschii.

For the study of Ae. tauschii miRNAs, we retrieved a set of seven public small RNA-seq libraries produced from seedling, spike and seed. Using the ShortStack pipeline (Axtell, 2013), we annotated 151 MIR genes (208 mature sequences) belonging to 98 different families. Of these 151 loci, 87 were classified as putative novel species-specific MIRNA loci, and differential expression analysis revealed that 67 miRNAs were strongly modulated across seedling, spike and seed. Moreover, a sequence similarity search showed that 20 loci are perfectly conserved between Ae. tauschii and hexaploid bread wheat.

To perform a comprehensive annotation of the lncRNAs in T. urartu and Ae. tauschii, we analysed sixty-eight public RNA-seq libraries generated from several organs and conditions: T. urartu root, shoot, leaf (under cold stress and control conditions), seedling and spike, and Ae. tauschii root, shoot, leaf, seedling, seed, spike, pistil, sheath, stem and stamen. A two-step pipeline was applied. First, the transcriptome was reconstructed: short reads were aligned twice to the respective reference genome, so that the second pass could benefit from the splice junction information retrieved in the first mapping iteration; the entire transcriptome was then independently reconstructed de novo using a genome-guided approach. Second, lncRNAs were identified on the basis of their main features: a stringent filtering pipeline was applied to select bona fide lncRNAs from the whole set of reconstructed transcripts.

We predicted 14,515 T. urartu and 20,908 Ae. tauschii bona fide lncRNAs, which show features similar to those of their counterparts in other plants and in animals, such as a reduced transcript length and a lower number of exons. Thousands of lncRNAs were significantly modulated in different organs and exhibited organ-specific expression, with a predominant accumulation in reproductive organs. Interestingly, most of the organ-specific lncRNAs were found to be associated with transposable elements (TEs). Although the majority of T. urartu and Ae. tauschii lncRNAs appear to be species-specific, we found some lncRNAs that are perfectly conserved between the two species. In addition, we verified the conservation of thousands of lncRNA cis-regulatory sequences, and of as many lncRNA transcripts, between pre-domestication and post-domestication species.

Our work provides the first comprehensive atlas of the regulatory non-coding RNAs of wheat wild relatives and sheds new light on their characteristics and conservation across different species.
24 June 2019
Italian
ALBERTINI, EMIDIO
CAVALLINI, ANDREA
PUCCIARIELLO, CHIARA
DELL'ACQUA, MATTEO
Scuola Superiore di Studi Universitari e Perfezionamento "S. Anna" di Pisa
Files in this item:

Alice_Pieri_PhD_thesis.pdf

Open Access from 03/05/2022

Type: Other attached material
Size: 4.78 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/150625
The NBN code of this thesis is URN:NBN:IT:SSSUP-150625