ASPECTS OF DATA STRUCTURE IN MACHINE LEARNING

Erba, Vittorio

It is widely believed that understanding data structure is a crucial ingredient for advancing our comprehension of how (and why) modern machine learning works. Still, most of the available theoretical results are obtained under strongly simplifying assumptions on the structure of the training data. In this thesis, I review some novel results on the problem of characterizing the geometric structure of datasets and the consequences that this structure has on learning algorithms. I also provide pedagogical introductions to manifold learning, random geometric graph theory and supervised binary classification. I focus on three different aspects of the problem. First, I review techniques to characterize the intrinsic dimensionality of datasets: this is the first "experimental" step towards proper theoretical modelling of data. Then, I focus on the problem of finding null models of data in high dimension: does Euclidean structure survive when the dimensionality of data becomes larger and larger? Finally, I study how geometric data structure alters the expressive potential of simple classifiers.

2021

Faculty/Department: Dipartimento di Fisica Aldo Pontremoli
Degree programme: FISICA, ASTROFISICA E FISICA APPLICATA
Publication date: 21 October 2021
Language: English
Supervisor, Advisor or Tutor: CARACCIOLO, SERGIO
Co-advisor, Second Reader, Co-Supervisor, Co-Tutor or Coordinators: PARIS, MATTEO
Publisher: Università degli Studi di Milano
Collection: Università degli Studi di Milano

Files in this item:

File	Access	License	Size	Format
phd_unimi_R12359.pdf	Open access	All rights reserved	6.16 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/170159

The NBN code of this thesis is URN:NBN:IT:UNIMI-170159