Genome-wide Characterization of the Genomic and Splicing Features of Long Non-coding RNAs Using Bioinformatics Approaches

ABOU ALEZZ, MONAH
2020

Abstract

Long non-coding RNAs (lncRNAs) are recognized as a new class of regulatory molecules associated with organismal complexity, although very little is known about their functions in cellular processes. Because of their generally low expression levels and tissue specificity, the identification and annotation of lncRNA genes remain challenging. LncRNAs show a low level of sequence conservation, but evolutionary constraint on lncRNA sequences is localized at splicing regulatory elements, suggesting that the recognition of intron boundaries and their splicing is a crucial step required for lncRNA function. Using bioinformatics approaches, we exploited recent GENCODE annotations to characterize the splicing features of long non-coding genes, in comparison with protein-coding genes, in the human and mouse genomes. A significant difference in splice site usage was observed between the two gene classes. Non-canonical GC-AG splice junctions account for about 0.8% of all splice sites in protein-coding genes. By contrast, we found a marked enrichment of GC-AG splice sites in long non-coding genes in both human (3.0%) and mouse (1.9%). In addition, we identified distinctive characteristics of GC-AG introns in terms of intron length, a positional bias toward the first intron, donor and acceptor splice site strength, the polypyrimidine tract, and alternative polyadenylation signals. Genes containing GC-AG introns were found to be conserved in many species across large evolutionary distances. A functional analysis pointed toward their enrichment in specific biological processes such as DNA repair. Moreover, GC-AG introns appeared more prone to alternative splicing and were enriched in a particular alternative splicing mechanism termed wobble-splicing.
Wobble-splicing appeared to be a rare mechanism that is subject to tissue-specific regulation. It induces subtle changes in the expressed isoforms and may have a regulatory role. Taken together, our data suggest that GC-AG introns represent new regulatory elements mainly associated with lncRNAs. These elements could contribute to the evolution of complexity by adding a new layer of gene expression regulation.
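The GC-AG frequencies quoted above are essentially tallies of the dinucleotides found at intron boundaries. The Python sketch below shows one way such a tally could be computed. It is not the pipeline used in the thesis. It assumes that introns are supplied as hypothetical 1-based, inclusive `(chrom, start, end, strand)` tuples, as might be derived from GENCODE exon coordinates. It also assumes the genome is held as a plain dict of chromosome sequences. All function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical intron record: (chromosome, start, end, strand), 1-based inclusive.
Intron = Tuple[str, int, int, str]

# Translation table used to complement DNA bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def splice_junction_type(genome: Dict[str, str], chrom: str, start: int,
                         end: int, strand: str) -> str:
    """Return the donor-acceptor dinucleotide pair of one intron, e.g. 'GC-AG'.

    Assumes start/end are 1-based, inclusive coordinates covering the intron
    only (not the flanking exons). Minus-strand introns are reverse-complemented
    so that the pair is read 5'->3' on the transcript strand.
    """
    intron = genome[chrom][start - 1:end].upper()
    if strand == "-":
        intron = reverse_complement(intron)
    return f"{intron[:2]}-{intron[-2:]}"


def junction_frequencies(genome: Dict[str, str],
                         introns: Iterable[Intron]) -> Dict[str, float]:
    """Return the fraction of introns falling into each donor-acceptor class."""
    counts = Counter(splice_junction_type(genome, *intron) for intron in introns)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {jtype: n / total for jtype, n in counts.items()}


# Toy example: two plus-strand introns and one minus-strand intron,
# separated by "CC" spacers standing in for exonic sequence.
toy_genome = {"chr1": "CC" + "GTAAAAAG" + "CC" + "GCAAAAAG" + "CC" + "CTTTTTGC" + "CC"}
toy_introns = [("chr1", 3, 10, "+"), ("chr1", 13, 20, "+"), ("chr1", 23, 30, "-")]
freqs = junction_frequencies(toy_genome, toy_introns)
print({jtype: round(f, 3) for jtype, f in freqs.items()})
# prints {'GT-AG': 0.333, 'GC-AG': 0.667}
```

In a real analysis, the intron list would be built from consecutive exon pairs of each annotated transcript. It would then be split by gene biotype, for example protein-coding versus lncRNA, so that the GC-AG fraction can be compared between the two classes. Whether introns shared by several transcripts are counted once or per transcript is a design choice that this sketch leaves open.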
15 December 2020
Language: English
BIONE, SILVIA
Università degli Studi di Pavia
File: abou_alezz_thesis_final.pdf (open access, Adobe PDF, 14.06 MB)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/84885
The NBN code of this thesis is URN:NBN:IT:UNIPV-84885