DEVELOPMENT AND APPLICATION OF COMPUTATIONAL APPROACHES FOR SINGLE CELL RNA SEQUENCING DATA ANALYSIS

Traversa, Daniele

During my PhD, I focused on the development and application of bioinformatics methods for the analysis of data produced by single-cell technologies, with a particular focus on single-cell RNA sequencing (scRNA-seq). A critical challenge in this field is the automatic annotation of cell identity, where no standardized strategy currently exists due to inherent limitations such as data sparsity, low transcript detection, and dependence on heterogeneous reference datasets. To address this gap, I developed SCARLET, a novel probabilistic framework that defines cellular identity as the transcriptional program that most likely explains a cell’s gene expression profile. SCARLET integrates bootstrap-based gene selection with a mutual-information approach to derive transcriptional programs and quantify their similarity, and employs likelihood-based modeling to assign cells to one or multiple plausible identities. Unlike existing methods, SCARLET explicitly accounts for the limitations of scRNA-seq. Benchmarking confirmed its competitive or superior performance relative to state-of-the-art tools. Applied to large-scale datasets encompassing more than 5 million cells, SCARLET reduced redundant labels while preserving annotation concordance, demonstrating both scalability and robustness.
Beyond methodological advances, I applied single-cell approaches to two biological contexts. In rice shoot apical meristems, snRNA-seq analysis revealed transcription factor families dynamically regulating floral transition, while underscoring current obstacles in plant single-cell research, including incomplete transcriptome coverage and limited reference resources. In breast cancer, scCROP-seq analysis uncovered mechanisms of tumor heterogeneity and evolvability under estrogen deprivation, identifying both perturbation-specific and shared transcriptional programs that illuminate adaptive survival strategies.
Overall, this work advances computational strategies for cell identity annotation, delivers new biological insights, and highlights the need for improved data quality, multi-omics integration, and broader application of single-cell methods across diverse organisms.

2026

Faculty/Department: Dipartimento di Bioscienze
Course of study: Biologia Molecolare e Cellulare
Publication date: 27 March 2026
Language: English
Supervisor, Advisor or Tutor: CHIARA, MATTEO
Co-Supervisor, Co-Tutor or Coordinators: RICAGNO, STEFANO
Publisher: Università degli Studi di Milano
Number of pages: 180
Collection: Università degli Studi di Milano

Files in this item:

File	Size	Format
phd_unimi_R13894.pdf (open access; license: Creative Commons)	185.96 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/362914

The NBN code of this thesis is URN:NBN:IT:UNIMI-362914