PROBABILISTIC AND DEEP LEARNING APPROACHES TO MODELING BIOLOGICAL SYSTEMS

MILITE, SALVATORE
2025

Abstract

This thesis investigates probabilistic and deep learning methods for modeling biological systems across scales, with a specific focus on cancer. The aim is to develop models that are both quantitatively rigorous and biologically meaningful. In the first part, I present a hierarchical Bayesian extension of MOBSTER, a model-based clustering approach for subclonal deconvolution. By introducing a hierarchical probabilistic structure, the method accounts for observational noise and for uncertainty shared across karyotypes, and it enables inference on a substantially larger proportion of the genome. I then apply it to a large cohort of whole-genome sequencing data to analyze clonal dynamics, mutation rates, and patterns of driver selection. The second part deals with single-cell and multi-omic data integration, as well as with cancer cell plasticity and resistance to therapy. I present MIDAA, a deep archetypal analysis method that provides interpretable latent representations, and demonstrate its performance on two hematopoiesis multi-omic datasets. Building on this, I present a computational analysis of lentivirally barcoded patient-derived organoids showing that epigenetic heritability plays a key role in drug resistance. These findings support a unified view in which stable epigenetic differences drive diverse cell states and adaptive responses to treatment. The final part focuses on spatial and temporal modeling. I develop the Mixture of Neural Cellular Automata, a stochastic framework for simulating tissue growth and image morphogenesis. I then present preliminary results toward an agent-based deep learning model that captures cell fate decisions from spatial transcriptomics data.
Overall, this work shows how combining probabilistic and deep learning approaches can provide new ways to study complex biological processes, from cancer evolution to tissue organization, and offers general frameworks that can be applied to a broad range of problems in computational biology.
16 December 2025
English
SOTTORIVA, ANDREA
Università degli Studi di Milano
158 pages
Files in this record:
phd_unimi_R13490.pdf (Adobe PDF, 63.38 MB)
Embargoed until 18 November 2026
License: Creative Commons

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/353687
The NBN code of this thesis is URN:NBN:IT:UNIMI-353687