DEVELOPMENT AND APPLICATION OF COMPUTATIONAL APPROACHES FOR SINGLE CELL RNA SEQUENCING DATA ANALYSIS
TRAVERSA, DANIELE
2026
Abstract
During my PhD, I focused on the development and application of bioinformatics methods for the analysis of data produced by single-cell technologies, with particular emphasis on single-cell RNA sequencing (scRNA-seq). A critical challenge in this field is the automatic annotation of cell identity, for which no standardized strategy currently exists due to inherent limitations such as data sparsity, low transcript detection, and dependence on heterogeneous reference datasets. To address this gap, I developed SCARLET, a novel probabilistic framework that defines cellular identity as the transcriptional program that most likely explains a cell’s gene expression profile. SCARLET integrates bootstrap-based gene selection with a mutual-information approach to derive transcriptional programs and quantify their similarity, and employs likelihood-based modeling to assign cells to one or multiple plausible identities. Unlike existing methods, SCARLET explicitly accounts for the limitations of scRNA-seq. Benchmarking confirmed its competitive or superior performance relative to state-of-the-art tools. Applied to large-scale datasets encompassing more than 5 million cells, SCARLET reduced redundant labels while preserving annotation concordance, demonstrating both scalability and robustness. Beyond methodological advances, I applied single-cell approaches to two biological contexts. In rice shoot apical meristems, snRNA-seq analysis revealed transcription factor families dynamically regulating floral transition, while underscoring current obstacles in plant single-cell research, including incomplete transcriptome coverage and limited reference resources. In breast cancer, scCROP-seq analysis uncovered mechanisms of tumor heterogeneity and evolvability under estrogen deprivation, identifying both perturbation-specific and shared transcriptional programs that illuminate adaptive survival strategies.
Overall, this work advances computational strategies for cell identity annotation, delivers new biological insights, and highlights the need for improved data quality, multi-omics integration, and broader application of single-cell methods across diverse organisms.

| File | Size | Format |
|---|---|---|
| phd_unimi_R13894.pdf (open access; license: Creative Commons) | 185.96 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/362914
URN:NBN:IT:UNIMI-362914