Interpretable statistics for complex modelling: quantile and topological learning

Padellini, Tullia

As the complexity of our data increased exponentially in the last decades, so has our need for interpretable features. This thesis revolves around two paradigms to approach this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as a “parametrization selection”. We introduce a quantile-centric parametrization and we show the advantages of our proposal in the context of regression, where it allows to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows to represent complex and possibly high dimensional through few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited both for exploratory and inferential purposes with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.

Interpretable statistics for complex modelling: quantile and topological learning

PADELLINI, TULLIA

2019

Abstract

As the complexity of our data increased exponentially in the last decades, so has our need for interpretable features. This thesis revolves around two paradigms to approach this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as a “parametrization selection”. We introduce a quantile-centric parametrization and we show the advantages of our proposal in the context of regression, where it allows to bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows to represent complex and possibly high dimensional through few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited both for exploratory and inferential purposes with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.

Scheda breve

Scheda completa

Scheda completa (DC)

	Corso di studio
	
				Scuola di scienze statistiche
			
	Data di pubblicazione
	
				22-feb-2019
			
	Lingua
	
				Inglese
			
	Parola chiave
	
				Quantile regression; topological data analysis; Kernel methods; multiscale statistics
			
	Relatore, Supervisor, Advisor o Tutor
	
				BRUTTI, Pierpaolo
			
	Correlatore, Controrelatore, Co-Supervisor,  Co-Tutor o Coordinatori
	
				CONTI, Pier Luigi
			
	Nome Editore
	
				Università degli Studi di Roma "La Sapienza"
			
	Collezione di appartenenza
	
				Università degli Studi di Roma La Sapienza

File in questo prodotto:

File	Dimensione	Formato
Tesi_dottorato_Padellini.pdf accesso aperto Licenza: Tutti i diritti riservati Dimensione 18.19 MB Formato Adobe PDF Visualizza/Apri	18.19 MB	Adobe PDF	Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/180541

Il codice NBN di questa tesi è URN:NBN:IT:UNIROMA1-180541