Integrating omics data and prior knowledge to explore complex systems
MILIA, MIKELE
2026
Abstract
Understanding the genetic and molecular basis of complex human diseases requires analytical approaches capable of integrating large-scale and heterogeneous datasets. Advances in whole-genome sequencing (WGS) and single-cell transcriptomics have greatly expanded opportunities for discovery, but have also introduced challenges related to scalability, interpretability, and the incorporation of prior biological knowledge. The aim of this Ph.D. thesis is to develop computational frameworks that integrate omics data with prior knowledge to improve the study of complex traits. This objective is addressed through three main contributions. The first contribution introduces a scalable framework for single nucleotide polymorphism (SNP)-set analysis (NEBULA). By modeling SNP–SNP interactions and jointly testing groups of variants, NEBULA leverages prior biological structure to increase statistical power and interpretability. Simulations and applications to WGS data highlight its robustness and its ability to identify disease-relevant gene sets enriched in specific brain regions. The second contribution explores transformer-based deep learning for single-cell RNA sequencing (scRNA-seq) data (scTransformer). As a preliminary investigation, this framework captures transcriptional dependencies through self-attention mechanisms and achieves competitive performance in cell-type classification. Its application to Parkinson’s disease datasets illustrates the potential of transformer models for transferable single-cell representations and outlines directions for future work. The third contribution presents a framework for multi-omics analysis that integrates genetic variation with transcriptomic data (MAST). By extending SNP-set testing to molecular phenotypes, MAST links genetic risk to regulatory processes and uncovers pathways and functional modules implicated in neurodegenerative disease.
In conclusion, this thesis contributes integrative approaches that combine omics data with prior knowledge, advancing both methodological development and biological insight. The proposed frameworks enhance statistical inference, improve data interpretability, and provide new perspectives on the genetic and molecular architecture of complex diseases.

| File | Size | Format |
|---|---|---|
| tesi_Mikele_Milia.pdf (embargoed until 11/02/2029; license: all rights reserved) | 35.48 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359534
URN:NBN:IT:UNIPD-359534