Reproducible Computational Frameworks for Single-Cell and Long-Read Functional Genomics

Ratto, Maria Luisa

Genomic and transcriptomic technologies have advanced rapidly in recent years and have become essential tools for modern biological and medical research. As data volume and complexity continue to grow, there is an increasing demand for computational pipelines that are reproducible, accessible, and rigorously tested, while remaining aligned with the latest experimental methodologies. In this work, I addressed several open challenges in contemporary bioinformatics by developing reproducible computational frameworks spanning diverse techniques and biological questions. To address the need for benchmarking in single-cell RNA sequencing, I systematically compared multiple clustering algorithms using a custom dataset specifically designed to model cancer heterogeneity in a controlled environment. Such benchmarking on labeled data is essential to ensure robust and generalizable tool development. Building on this experience, I contributed to the development of iPS2-seq (iPS-optimized inducible Post-transcriptional Silencing in pool deconvoluted by single-cell sequencing), a novel method enabling clonal, single-cell-resolved gene perturbation screens applicable to human pluripotent stem cell-derived lineages. Within this framework, I designed catcheR (clonality and treatment-controlled shRNA effect findeR), a reproducible and user-friendly data analysis pipeline. Implemented as a Dockerized R package, catcheR performs quality control and filtering to achieve reliable perturbation assignment, followed by dimensionality reduction, clustering, and annotation via Monocle3. It quantifies how gene perturbations affect transcriptional modules, pseudotime trajectories, and population shifts, generating publication-ready plots and statistics. Finally, I explored telomere biology in ALT+ sarcomas, using long-read sequencing to characterize telomeric repeats and telomere insertions, repetitive genomic features that have long posed challenges to standard analyses. By assembling extended consensus sequences containing telomeric repeats, I both quantified repeat content and mapped telomeric insertions to specific genomic regions, revealing significant overlap with structural variants and extrachromosomal DNA. Overall, this thesis demonstrates how reproducible computational frameworks can bridge diverse experimental contexts, from single-cell transcriptomics to long-read genomics, advancing both methodological rigor and functional discovery in modern bioinformatics.

Degree programme: COMPLEX SYSTEMS FOR QUANTITATIVE BIOMEDICINE
Publication date: 3 Feb 2026
Language: English
Supervisor, Advisor, or Tutor: CALOGERO, Raffaele Adolfo
Publisher: Università degli Studi di Torino
Collection: Università degli Studi di Torino

Files in this record:

File	Size	Format
Tesi-Ratto-MariaLuisa.pdf (embargoed until 3 Feb 2027; license: all rights reserved)	17.05 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357245

The NBN code of this thesis is URN:NBN:IT:UNITO-357245