ASPECTS OF DATA STRUCTURE IN MACHINE LEARNING
ERBA, VITTORIO
2021
Abstract
It is widely believed that understanding the structure of data is crucial to advancing our comprehension of how (and why) modern machine learning works. Still, most of the available theoretical results are obtained under highly simplifying assumptions on the structure of the training data. In this thesis, I review some novel results on the problem of characterizing the geometric structure of datasets and the consequences that this structure has on learning algorithms. I also provide pedagogical introductions to manifold learning, random geometric graph theory and supervised binary classification. I focus on three different aspects of the problem. First, I review techniques to characterize the intrinsic dimensionality of datasets: this is the first "experimental" step towards proper theoretical modelling of data. Then, I focus on the problem of finding null models of data in high dimension: does Euclidean structure survive as the dimensionality of the data grows larger and larger? Finally, I study how geometric data structure alters the expressive potential of simple classifiers.

File | Size | Format
---|---|---
phd_unimi_R12359.pdf (open access) | 6.16 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/170159
URN:NBN:IT:UNIMI-170159