DEVELOPMENT AND APPLICATION OF COMPUTATIONAL APPROACHES FOR SINGLE CELL RNA SEQUENCING DATA ANALYSIS

TRAVERSA, DANIELE
2026

Abstract

During my PhD, I worked on the development and application of bioinformatics methods for the analysis of data produced by single-cell technologies, with a particular emphasis on single-cell RNA sequencing (scRNA-seq). A critical challenge in this field is the automatic annotation of cell identity, for which no standardized strategy currently exists, owing to inherent limitations such as data sparsity, low transcript detection, and dependence on heterogeneous reference datasets. To address this gap, I developed SCARLET, a novel probabilistic framework that defines cellular identity as the transcriptional program that most likely explains a cell’s gene expression profile. SCARLET integrates bootstrap-based gene selection with a mutual-information approach to derive transcriptional programs and quantify their similarity, and employs likelihood-based modeling to assign cells to one or multiple plausible identities. Unlike existing methods, SCARLET explicitly accounts for the limitations of scRNA-seq. Benchmarking confirmed its competitive or superior performance relative to state-of-the-art tools. Applied to large-scale datasets encompassing more than 5 million cells, SCARLET reduced redundant labels while preserving annotation concordance, demonstrating both scalability and robustness. Beyond methodological advances, I applied single-cell approaches to two biological contexts. In rice shoot apical meristems, snRNA-seq analysis revealed transcription factor families dynamically regulating floral transition, while underscoring current obstacles in plant single-cell research, including incomplete transcriptome coverage and limited reference resources. In breast cancer, scCROP-seq analysis uncovered mechanisms of tumor heterogeneity and evolvability under estrogen deprivation, identifying both perturbation-specific and shared transcriptional programs that illuminate adaptive survival strategies.
Overall, this work advances computational strategies for cell identity annotation, delivers new biological insights, and highlights the need for improved data quality, multi-omics integration, and broader application of single-cell methods across diverse organisms.
27-mar-2026
English
CHIARA, MATTEO
RICAGNO, STEFANO
Università degli Studi di Milano
180
Files in this item:
File: phd_unimi_R13894.pdf
Access: open access
License: Creative Commons
Size: 185.96 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/362914
The NBN code of this thesis is URN:NBN:IT:UNIMI-362914