Analysis of the role of GC-AG introns in gene expression and of their involvement in alternative splicing

CELLI, LUDOVICA
2023

Abstract

During my three years of PhD, I worked mainly on two projects aimed at characterizing the splicing features of lncRNAs and at understanding the involvement of GC-AG introns in splicing. I was also involved in a third project aimed at understanding the aberrant alternative splicing that occurs in Acute Myeloid Leukemia (AML) patients. The first part of the research described in this thesis began with the characterization of long non-coding RNAs (lncRNAs) in comparison with protein-coding genes (PCGs). LncRNAs are recognized as a new class of regulatory molecules, although very little is known about their functions in cellular processes. Because of their generally low expression levels and tissue specificity, the identification and annotation of lncRNA genes remain challenging. LncRNAs show a low level of sequence conservation, but evolutionary constraint on lncRNA sequences is localized at splicing regulatory elements, suggesting that the recognition of intron boundaries and the splicing of these introns are crucial for lncRNA function. Using recent GENCODE annotations and bioinformatic approaches, we characterized the splicing features of long non-coding genes, compared with those of protein-coding genes, in the human and mouse genomes. We observed a significant difference in splice site usage between the two gene classes. While non-canonical GC-AG splice junctions account for about 0.8% of all splice sites in protein-coding genes, we found a marked enrichment of GC-AG splice sites in long non-coding genes, in both human (3.0%) and mouse (1.9%). In addition, we identified distinctive characteristics of GC-AG introns in terms of intron length, a positional bias toward the first intron, donor and acceptor splice site strength, polypyrimidine tract, and alternative polyadenylation signals.
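The GC-AG percentages above are proportions of splice junctions classified by their terminal dinucleotides. The Python below is a minimal sketch of that counting step, not the pipeline used in the thesis. It assumes that intron sequences have already been extracted on the transcribed strand, for example from a GENCODE annotation and a reference genome. The input format, the function names `splice_class` and `gc_ag_fraction`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def splice_class(intron_seq: str) -> str:
    """Return the donor-acceptor dinucleotide pair of an intron, e.g. 'GT-AG'.

    Assumes the sequence is written 5'->3' on the transcribed strand, so the
    first two bases are the donor site and the last two the acceptor site.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron sequence too short to classify")
    return f"{seq[:2]}-{seq[-2:]}"


def gc_ag_fraction(introns):
    """Fraction of introns with GC-AG boundaries, per gene class.

    `introns` is an iterable of (gene_class, intron_sequence) pairs
    (a hypothetical input format). Returns {gene_class: fraction}.
    """
    totals = Counter()
    gc_ag = Counter()
    for gene_class, seq in introns:
        totals[gene_class] += 1
        if splice_class(seq) == "GC-AG":
            gc_ag[gene_class] += 1
    return {cls: gc_ag[cls] / n for cls, n in totals.items()}


# Toy sequences for illustration only; they are not data from the thesis.
toy = [
    ("protein_coding", "GTAAGTCTTTTCAG"),
    ("protein_coding", "GTGAGTACCCTTAG"),
    ("lncRNA", "GCAAGTTTCTCCAG"),
    ("lncRNA", "GTAAGCTTTTTCAG"),
]
print(gc_ag_fraction(toy))  # {'protein_coding': 0.0, 'lncRNA': 0.5}
```

Counting per gene class keeps the denominators separate, so a class with few introns is not swamped by the much larger protein-coding set.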
Genes containing GC-AG introns were found to be conserved in many species across large evolutionary distances, and a functional analysis pointed to their enrichment in specific biological processes such as DNA repair. The second part of the project built on the first part's finding that GC-AG introns play a particular role in the regulation of gene expression. Specifically, GC-AG introns appeared more prone to alternative 5’ splice site (A5) splicing. By characterizing their features in depth, we found a specific enrichment of GC-AG introns in a subtype of A5 splicing, termed wobble splicing (A5-WS), in both the human and mouse genomes and especially in PCGs. In this context, GC-AG introns showed distinctive features compared with their canonical counterparts (GT-AG introns). Overall, our study highlights the importance of wobble splicing in generating a wider transcriptome, and consequently a wider proteome, rather than dismissing it as mere splicing noise, as had previously been suggested. Indeed, our results suggest that wobble splicing is an evolutionary mechanism for introducing subtle insertions and deletions that increase coding potential, aided in part by the presence of non-canonical GC-AG introns. The third part of my research involved a collaboration with Prof. L. Holmfeldt of the Immunology, Genetics and Pathology laboratory at the Rudbeck Institute (Uppsala, Sweden). Using RNA-seq data from AML patients at Rudbeck, and enlarging the patient cohort with the Leucegene and BEAT AML datasets, we began investigating the splicing defects that occur in AML patients carrying mutations in genes encoding splicing factors, especially SRSF2.
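Wobble splicing is described here as a subtype of A5 splicing, in which alternative donor sites produce subtle insertions and deletions. The sketch below is a hypothetical illustration, not the method used in the thesis. It groups annotated introns that share an acceptor and flags pairs whose donors lie within a user-chosen distance. It also reports whether that distance preserves the reading frame. The coordinate convention (1-based, inclusive genomic intron boundaries), the `Intron` type, the `wobble_pairs` function, and the `max_shift` threshold are assumptions. The value passed below is illustrative only.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class Intron(NamedTuple):
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost genomic coordinate (1-based, inclusive)
    end: int     # rightmost genomic coordinate (1-based, inclusive)

    @property
    def donor(self) -> int:
        # The 5' splice site is the left end on "+" and the right end on "-".
        return self.start if self.strand == "+" else self.end

    @property
    def acceptor(self) -> int:
        # The 3' splice site is the opposite end.
        return self.end if self.strand == "+" else self.start


def wobble_pairs(introns, max_shift):
    """Yield (intron_a, intron_b, shift, in_frame) for A5 pairs whose donors
    lie at most `max_shift` nt apart.

    Two introns form an A5 pair when they share chromosome, strand and
    acceptor but differ in donor position. `max_shift` is a hypothetical,
    user-chosen threshold, not a value taken from the thesis.
    """
    by_acceptor = defaultdict(set)
    for intron in introns:
        by_acceptor[(intron.chrom, intron.strand, intron.acceptor)].add(intron)
    for group in by_acceptor.values():
        # Sorting makes the output order deterministic.
        for a, b in combinations(sorted(group), 2):
            shift = abs(a.donor - b.donor)
            if 0 < shift <= max_shift:
                # A shift that is a multiple of 3 adds or removes whole codons
                # when the junction falls inside a coding sequence.
                yield a, b, shift, shift % 3 == 0


# Toy introns for illustration only; they are not data from the thesis.
introns = [
    Intron("chr1", "+", 1000, 2000),
    Intron("chr1", "+", 1003, 2000),  # donor shifted by 3 nt
    Intron("chr1", "+", 1500, 2000),  # donor shifted by 500 nt: ordinary A5
    Intron("chr1", "-", 5000, 6000),
    Intron("chr1", "-", 5000, 5996),  # minus strand: donor is the right end
]
for a, b, shift, in_frame in wobble_pairs(introns, max_shift=6):
    print(a.start, a.end, b.start, b.end, shift, in_frame)
# 1000 2000 1003 2000 3 True
# 5000 5996 5000 6000 4 False
```

With this grouping, the 500-nt shift is still an A5 event but falls outside the small-shift window, which separates wobble-like pairs from ordinary alternative donors.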
3-Apr-2023
English
SASSERA, DAVIDE
Università degli Studi di Pavia
Files in this record:
Celli_PhDthesis_final.pdf (Adobe PDF, 22.54 MB), Open Access from 13/10/2024

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/84658
The NBN code of this thesis is URN:NBN:IT:UNIPV-84658