Genetic and epigenetic changes in the repetitive regions of the human genome

CORDA, LUCA
2026

Abstract

Recent technological advances in long-read sequencing have facilitated the assembly of telomere-to-telomere (T2T) genomes. The currently available T2T assembly, CHM13, provides valuable insights into the complete architecture of the human genome, yet its use in functional experiments is limited by discrepancies between the reference and the native genome of the specific biological system under study. Access to complete reference genomes for stable, experimentally relevant cell lines is essential to advance functional studies and precise manipulation, including multi-omics data analyses and genome editing, particularly in highly variable regions such as the centromeres. Comparative analyses of newly available human genome assemblies have highlighted extensive variation that peaks at centromeres. Reliance on a single reference genome can thus hinder whole-genome analysis of sequencing data derived from laboratory cell lines and limit their accurate genomic manipulation. Centromeres are epigenetically specified by distinct chromatin, whereas their DNA varies between species and individuals. This extensive sequence divergence makes comparative analyses between centromeres challenging. In this study, we 1) present RPE1v1.1, the near-complete diploid genome reference for RPE-1 human retinal epithelial cells, a widely used non-cancer laboratory cell line with a stable karyotype; 2) demonstrate that using an “isogenomic” reference genome, fully matched to the experimental cell line, substantially improves the accuracy of genomic, epigenomic, and transcriptomic analyses compared to a non-matched reference; and 3) identify a chromosome-specific architectural pattern across the human genome, defined by the conserved spacing of a functionally relevant centromeric DNA motif. The distribution of these sites along chromosome arms constitutes the human “centeny map”.
Using a custom Genomic Centromere Profiling (GCP) pipeline, we leveraged the motif’s position, orientation, and organization to construct structural models that enable reclassification of human chromosomal clusters, detection of centromere expansion, and identification of structural variants and misassembled regions. The high-resolution maps derived from this pattern not only provide a framework for comparative analysis of centromeres across evolution and disease but also offer a new dimension for chromosome annotation, assembly, and characterization.
29 Jan 2026
English
Giunta, Simona
SAGGIO, Isabella
Università degli Studi di Roma "La Sapienza"
Files in this item:
File: Tesi_dottorato_Corda.pdf
Access: open access
License: Creative Commons
Size: 6.48 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357331
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-357331