Development of computational strategies for bulk, single-cell and spatial transcriptomics data to outline tumors and their microenvironment

Pirrotta, Stefania

Understanding biological mechanisms, defining cancer subtypes, predicting prognosis, and assessing therapy efficacy are crucial aspects of cancer research. One of the biggest challenges is mining the heterogeneity that arises from the different cells within a tumor. Defining the different cell activities inside a heterogeneous mass is a problem shared by all analyses of gene expression, from bulk RNA sequencing to high-resolution transcriptomic technologies such as single-cell and spatial transcriptomics. Dissecting all the cell activities in complex high-resolution transcriptomic datasets is a bioinformatic problem. In this framework, my PhD project was designed to develop computational tools to dissect gene expression in tumor samples. At the beginning of my PhD, I took part in a review of spatial transcriptomics experiments in ovarian cancer, in which we collected all the evidence that spatially describes ovarian cancer tumors (Masatti L et al. Translational Research 2024). Evaluating the collected papers with special attention to their methods, I learnt that the vast majority of studies used spatial transcriptomics data to validate specific hypotheses, looking for co-localization of the expression of one or a few genes of interest with a specific cell type or a specific tumor location. However, a strategy to evaluate the tumor as a heterogeneous system in which multiple processes coexist is generally missing. This heterogeneity derives not only from the cell types but also from the different cell activities and, furthermore, from cell states, that is, the collection of all the activities that a specific cell is performing. Early in my PhD, I began investigating cell activity in ovarian cancer using gene expression signatures. Expression signatures are biomarkers derived from the expression of multiple genes, summarized with different methods, and can be either continuous scores or categorical classifiers.
Realizing that many publicly available signatures lack computational implementations for efficient use, I developed signifinder, an R package that compiles and implements cancer-related gene expression signatures from the literature (Pirrotta S et al. NAR-GAB 2024). Signifinder became a pan-cancer resource and has been part of Bioconductor since release 3.16. In this thesis I present signifinder and its many applications to different expression datasets, including bulk, single-cell, and spatial transcriptomics. During my EMBO fellowship at CNAG, Barcelona, I applied signifinder to diverse cancer datasets, confirming the consistency of signature scores across different data types and technologies. Additionally, the spatial information in spatial transcriptomics data provided deeper insights into the biological processes that depend on tissue organization. In collaboration with Prof. Risso of the University of Padova, we applied the SpaRTaCo co-clustering model to the signature results generated by signifinder in prostate cancer tissue samples, revealing distinct spatial patterns of biological process expression (Pirrotta S et al. CIBB 2023). This approach enhanced our understanding of tumor heterogeneity. To further explore cancer cell states, I integrated cell state signatures into signifinder, allowing for a more detailed exploration of the tumor stroma using high-resolution transcriptomic data. At the end of my PhD, thanks to a collaboration with Prof. Bonora of the University of Ferrara, I also started the development of mitology, an R package that enables detailed analyses of mitochondrial processes by categorizing mitochondrial-related genes from databases such as MitoCarta, IMPI, and MSeqDR, alongside pathways from Gene Ontology and Reactome. By dissecting mitochondrial processes at varying levels of specificity, this tool could enrich our understanding of this organelle, including in the context of cancer and through the re-analysis of public data.
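
To make the idea of an expression signature concrete, the following is a minimal sketch, not the method implemented in signifinder or in any published signature. It assumes a simple summary (the mean of per-gene z-scores) for the continuous score and a median split for the categorical classifier. The function names, gene names, and expression values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, pstdev


def signature_scores(expression, signature_genes):
    """Continuous signature score per sample: mean z-score of the signature genes.

    expression: dict mapping gene name -> list of per-sample expression values.
    signature_genes: gene names that define the (hypothetical) signature.
    """
    # Use only the signature genes actually measured in the dataset.
    genes = [g for g in signature_genes if g in expression]
    if not genes:
        raise ValueError("none of the signature genes are in the dataset")

    z_by_gene = []
    for gene in genes:
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no information: z-score of 0.
        z_by_gene.append([0.0 if sd == 0 else (v - mu) / sd for v in values])

    # Average across genes, sample by sample.
    return [mean(sample_z) for sample_z in zip(*z_by_gene)]


def classify(scores):
    """Categorical classifier: 'high' if above the median score, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy dataset: three genes measured in four samples (made-up values).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}
signature = ["GENE_A", "GENE_B", "GENE_X"]  # GENE_X is absent and is skipped

scores = signature_scores(expression, signature)
labels = classify(scores)
print(scores)
print(labels)
```

Published signatures differ in how they summarize genes (weighted sums, rank-based scores, trained classifiers), so the averaging and the median cutoff here are stand-ins for whichever rule a given signature prescribes.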


2024


Degree programme: BIOSCIENZE
Publication date: 13 Dec 2024
Language: English
Supervisor, Advisor or Tutor: CALURA, ENRICA
Publisher: Università degli studi di Padova
Collection: Università degli Studi di Padova

Files in this item:

File	Access	Size	Format
tesi_definitiva_Stefania_Pirrotta.pdf	open access	38.02 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/218714

The NBN code of this thesis is URN:NBN:IT:UNIPD-218714