From the transcriptomic data analysis of the Allen Human Brain Atlas to multi-omic integration via Topic Modeling methods
PIZZINI, LETIZIA
2025
Abstract
The Stochastic Block Model (SBM) is a probabilistic approach that recasts topic modeling as a community detection problem in network theory, providing an effective way to uncover latent structure in complex datasets. Its hierarchical extension identifies patterns at multiple levels of granularity. The model has been used in various domains, including cancer research, to disentangle intricate relationships within large-scale data. In this thesis, we apply this approach to complex biological datasets. Our primary focus is on understanding the genetic basis of brain organization by characterizing specific regions, functions, and neurological disorders. We also aim to identify patterns conserved across individuals, uncovering shared genetic signatures that contribute to the structure and function of the brain. As a test bed for our algorithm, we use a dataset obtained from six independent human brains in the Allen Human Brain Atlas. We show that the proposed method identifies universal patterns and outperforms traditional algorithms such as Latent Dirichlet Allocation and Weighted Correlation Network Analysis. The probabilistic associations we uncover between genes and samples accurately reflect the established anatomical and functional organization of the brain. Moreover, by exploiting the distinctive "fuzzy" structure of the gene sets obtained with our method, we identify examples of transcriptional and post-transcriptional pathways associated with specific brain regions, highlighting the potential of our approach. Finally, we present an extension of our model that integrates different omic layers within a multibranch framework, aimed at improving sample classification and providing insight into the multilayered molecular landscape of biological systems.

File | Size | Format |
---|---|---|
Pizzini_PhD_Thesis.pdf (open access) | 10.11 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/298011
URN:NBN:IT:UNITO-298011