Computational modelling of the interplay between genetic and epigenetic in tumour evolution

VALERIANI, LUCREZIA
2026

Abstract

Cancer arises from complex interactions between genetic and epigenetic alterations that accumulate across cell populations over time. Copy number alterations (CNAs), DNA methylation changes, and clonal heterogeneity are fundamental drivers of tumour progression; understanding how these layers interact is key to decoding tumour evolution. This thesis presents a comprehensive computational framework for the integrative analysis of tumour evolution through the joint modelling of genetic, epigenetic, and clonal variation using whole-genome sequencing data. In the first part, we introduce a Bayesian approach for allele-specific copy number inference that leverages the native properties of long-read sequencing, including haplotype phasing and direct DNA methylation detection. The model jointly analyses read depth, B-allele frequency, and variant allele frequency to estimate tumour purity, ploidy, and allele-specific copy number states, while integrating haplotype-resolved methylation profiles. This formulation enables the identification of genomic regions where structural imbalance and methylation asymmetry co-occur, providing new insights into the interplay between genetic and epigenetic regulation. Benchmarking on simulated datasets spanning multiple sequencing depths and purity levels demonstrates accuracy comparable to that of existing short-read methods. Application to colorectal cancer organoids and a preliminary analysis of a 100-patient Genomics England cohort further reveal consistent relationships between CNAs and allele-specific methylation in key regulatory regions, supporting the hypothesis that copy number variation and methylation jointly shape gene regulation during tumour evolution.
The second part of the thesis introduces a population genetics-informed simulation framework designed to generate realistic synthetic tumour genomes and corresponding sequencing data for benchmarking and methodological development. The simulator models tumour evolution as a stochastic branching process in which each subclone acquires somatic point mutations, CNAs, and epigenetic alterations under user-defined evolutionary parameters such as mutation rate, selection strength, and clonal expansion dynamics. This approach captures the genetic heterogeneity observed in real tumours while maintaining explicit control over the ground-truth evolutionary history. The framework includes a read-level sequencing simulator capable of producing realistic whole-genome sequencing data with customizable coverage, error rate, read length, and tumour purity, thereby allowing systematic evaluation of analytical tools under controlled conditions. To ensure reproducibility and accessibility, the simulation platform is complemented by a standardized Nextflow pipeline, nf-core/tumourevo, which integrates modules for variant and driver annotation, copy number quality control, mutational signature analysis, and subclonal reconstruction. Benchmarking experiments demonstrate the framework’s capacity to reproduce realistic tumour evolutionary scenarios and to quantify the accuracy and limitations of existing inference methods. Together, these contributions establish a unified framework that connects Bayesian modelling, evolutionary simulation, and long-read sequencing technologies. By jointly analysing copy number, methylation, and clonal structure, this work advances our capacity to interpret tumour evolution and provides a robust methodological foundation for integrating multi-layer molecular data in cancer genomics.
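As an illustration of how read depth and B-allele frequency jointly constrain purity, ploidy, and allele-specific copy number, the sketch below computes the signals expected for a single genomic segment. It is not the thesis's Bayesian model: it uses the standard mixture relations for a tumour sample contaminated by diploid normal cells, and the function name `expected_signals` and all of its parameters are hypothetical names chosen for this example.

```python
import math


def expected_signals(n_major, n_minor, purity, tumour_ploidy):
    """Expected BAF and depth ratio for one segment.

    Assumptions (illustrative, not taken from the thesis):
    - normal cells are diploid with one copy of each allele;
    - the B allele is the minor allele in the tumour;
    - depth is normalised by the sample-wide average copy number.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")

    n_total = n_major + n_minor
    # Average number of copies of this segment per cell in the mixed sample.
    mean_copies = purity * n_total + 2.0 * (1.0 - purity)
    # Average number of copies per cell across the whole genome.
    mean_ploidy = purity * tumour_ploidy + 2.0 * (1.0 - purity)

    if mean_copies == 0.0:
        # Homozygous deletion in a pure tumour: no reads, BAF undefined.
        return math.nan, 0.0

    # Fraction of reads carrying the B allele.
    baf = (purity * n_minor + (1.0 - purity)) / mean_copies
    # Segment depth relative to the genome-wide average depth.
    depth_ratio = mean_copies / mean_ploidy
    return baf, depth_ratio


# A single-copy gain (2+1) at 50% purity in an otherwise diploid tumour.
baf, ratio = expected_signals(n_major=2, n_minor=1, purity=0.5, tumour_ploidy=2.0)
print(f"BAF={baf:.2f}, depth ratio={ratio:.2f}")
```

An inference method can compare observed segment-level depth and BAF against these expectations over a grid of candidate purity and ploidy values; depth and BAF are both needed because different combinations of purity and copy number state can produce the same depth ratio.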
5 February 2026
English
Bayesian inference; Hidden Markov Models; Bioinformatics; Copy Number Calling; Simulations
CARAVAGNA, GIULIO
CAZZANIGA, ALBERTO
Università degli Studi di Trieste
Files in this item:
Valeriani_PhDThesis_Final.pdf (Adobe PDF, 18.24 MB). Embargoed until 5 February 2027. Licence: All rights reserved.
Valeriani_PhDThesis_Final_1.pdf (Adobe PDF, 18.24 MB). Embargoed until 5 February 2027. Licence: All rights reserved.

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357309
The NBN code of this thesis is URN:NBN:IT:UNITS-357309