Dynamics of DNA methylation variation in the clonally propagated perennial tree Populus nigra cv. ‘italica’

PEREZ-BELLO GIL, PALOMA
2023

Abstract

Epigenetics is the study of heritable changes in gene expression or phenotype that occur without alterations to the DNA sequence. It encompasses modifications to the structure and function of DNA and its associated proteins, which can influence gene activity and regulation. One of the key mechanisms of epigenetic regulation is DNA methylation, the addition of a methyl group to the DNA molecule, generally at cytosine bases. In mammals, methylation occurs mostly at cytosines in the CG context (CpG sites), whereas in plants it may occur in the CG, CHG and CHH contexts, where H is any base except guanine. Most of our understanding of the role of DNA methylation in plants comes from the model plant Arabidopsis thaliana, where substantial DNA methylation is found within pericentromeric regions and repetitive sequences. However, approximately one third of Arabidopsis genes show methylation only in the CG context within their transcribed regions, and this type of DNA methylation is associated with medium to high levels of gene expression (Zhang et al., 2006; Bewick and Schmitz, 2017). Methylation of gene promoter regions, especially near transcription start sites, typically leads to gene silencing or reduced gene expression: it acts as a repressive mark that prevents the transcriptional machinery from accessing the gene to initiate transcription (Paszkowski and Whitham, 2001). DNA methylation in all three cytosine contexts (CG, CHG and CHH) is generally associated with chromatin compaction and thus with transcriptional inactivity. Indeed, this pattern is mostly found within transposable element (TE) sequences, which are usually targeted by the DNA methylation machinery to prevent their mobilization and avoid possible disruption of gene structure or mis-regulation.
Thus, transposable elements (TEs) are both major targets of epigenetic processes and structural components of the genome. However, their proportion in plant genomes is highly variable because of their very dynamic evolutionary history (Chuong, Elde and Feschotte, 2017). TEs represent only 20% of the current genome of A. thaliana (Underwood, Henderson and Martienssen, 2017), but they can account for much greater fractions of other plant genomes. For instance, the TE content rises to about 40% in rice (Matsumoto et al., 2005), 80% in wheat (Charles et al., 2008) and 85% in maize (Vicient, 2010). The arrangement of TEs in the genome is also a critical feature: repetitive elements can be enriched mainly in pericentromeric heterochromatic regions and less frequent in chromosome arms, as in Arabidopsis (Zhang, Lang and Zhu, 2018), or deeply interspersed within the gene space as well, as in maize (Morgante et al., 2005). Finally, DNA methylation patterns can differ depending on the subtype of repetitive element. TEs can be divided into Class I elements, known as retrotransposons, which replicate through RNA and cDNA intermediates, and Class II elements, or DNA transposons, which replicate via a DNA intermediate.
In the first chapter of the thesis, I profiled the genomic DNA methylation landscape of the Lombardy poplar by analysing the methylation patterns of DNA extracted from leaf tissue. Previous epigenetic studies of the Lombardy poplar aligned the bisulfite-converted reads to the reference genome of the related species Populus trichocarpa, mapping on average only 46% of the reads (Broeck et al., 2023). Within the Epidiverse project, however, a de novo reference genome was generated specifically for the consortium. As a first step, I determined the read mapping efficiency and the distribution of methylcytosines across the genome to evaluate the extent to which these patterns overlap with those observed in other studies of the same cultivar.
Analyses similar to those in the first part of the thesis have previously been performed in another clonally propagated perennial, Vitis vinifera (Mirko Celii, doctoral thesis), allowing a direct comparison with the results presented in this chapter. Following these initial steps, I focused more specifically on DNA methylation patterns in genes and TEs, both for general characterization and to verify that gene prediction and TE identification in the new reference genome were corroborated by the DNA methylation profiles expected from most plants investigated so far. Additional analyses were carried out for a deeper characterization of genic structural features and expression patterns, in preparation for the follow-up analyses described in Chapter 2.
In many plant species, a subset of genes is characterized by CG methylation within the gene body and medium to high levels of gene expression. However, the process by which CG methylation accumulates within genic regions remains unclear. In the last decade, the hypothesis has emerged that gene body methylation could be a by-product of the mechanisms that methylate the repetitive DNA present in non-coding regions of genes, caused by a molecular imperfection in the maintenance and removal of methylation marks in genes (Bewick and Schmitz, 2017; Muyle et al., 2022). The idea for the second chapter was to exploit the single genotype shared by a clonal population of the Lombardy poplar to investigate instances of DNA methylation variation that could help explain how gene body methylation is accumulated or maintained. Combining DNA methylation information across all individuals can reveal new patterns by summing the methylation signal from several individuals.
Moreover, every cytosine position can be analyzed on its own, using the information from multiple individuals as replicates. Merging BS-seq data can aid in identifying new genomic elements or regions of interest with differential DNA methylation patterns, or even in discovering subtle methylation signals that are undetectable when the analysis is restricted to a few individuals. Since variation in methylation within genes and their surrounding sequences has been associated with expression differences, recognizing subtle differences and identifying their underlying cause is important for understanding the contribution of methylation to gene expression in plant genomes. I combined the methylation data with gene expression data from the leaf tissue of three poplar replicates to study possible effects of gene expression on the accumulation of methylation in genes. The data collected and analyzed in this study allowed the identification of DNA methylation polymorphisms that can be directly linked to the spreading of DNA methylation from TEs subject to epigenetic silencing, and they offer new insights into how DNA methylation accumulates in genic sequences in connection with their expression and gene body methylation levels.
27-Sep-2023
English
Corzani, Barbara
SISSA
Trieste
Files in this item:
File: Paloma_Perez_Bello_PhD_thesis.pdf
Access: Open Access from 28/09/2024
Size: 10.5 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/169056
The NBN code of this thesis is URN:NBN:IT:SISSA-169056