From Snapshots to Dynamics: Deep Learning Integration for Uncovering Cellular Processes in Single-Cell Data

KOUADRI BOUDJELTHIA, IDRIS
2025

Abstract

While every cell in our body shares the genetic blueprint encoded in DNA, the human body contains a variety of cell types, each with distinct physiology and function. This diversity arises from regulatory mechanisms that control gene expression and ultimately define cell identity. Understanding how these mechanisms operate and evolve over time is a central question in biology and remains challenging due to the complexity and dynamic nature of gene regulation. Single-cell RNA sequencing (scRNA-seq) technologies measure gene expression at the resolution of individual cells, providing novel insights into cellular heterogeneity and cell state transitions. However, these technologies are inherently destructive, capturing only static snapshots of gene expression, which limits direct observation of temporal cellular dynamics. This thesis introduces deep learning frameworks that integrate biophysical modeling with single-cell data to bridge this gap and uncover the underlying dynamic processes. The first contribution, NeuroVelo, integrates spliced and unspliced RNA counts into a neural ordinary differential equation (Neural ODE) framework constrained by RNA velocity principles. NeuroVelo recovers latent cellular trajectories while simultaneously enabling the inference of time-varying gene regulatory networks. Benchmarking shows that NeuroVelo performs competitively with state-of-the-art RNA velocity approaches while providing mechanistic interpretability through regulatory network inference. The second contribution, Tango, extends splicing kinetics models by explicitly incorporating transcription factor and RNA-binding protein activities into a biophysics-informed variational autoencoder. Applied to human pluripotent stem cell differentiation data, Tango identifies lineage-defining regulators such as SOX17, EOMES, and GATA6, and highlights dynamic modules of transcriptional and post-transcriptional regulation. Moreover, Tango reveals coupling between developmental pseudotime and cell cycle state, suggesting novel mechanisms of lineage commitment. Together, these frameworks demonstrate how combining neural networks with mechanistic constraints enables the reconstruction of dynamic cellular processes from static single-cell data. They provide both predictive power and biological interpretability, bridging machine learning flexibility with biophysical insight, and paving the way for future models that integrate additional molecular modalities and spatial information.
25-Sep-2025
English
Sanguinetti, Guido
SISSA
Trieste
Files in this item:
File: Idris_Phd_thesis_SISSA (7).pdf (embargoed until 25/03/2026)
Size: 25.75 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/295810
The NBN code of this thesis is URN:NBN:IT:SISSA-295810