Topological Data Analysis (TDA) is proving to be an excellent tool for shape analysis of digital data. The recently found synergy with artificial intelligence gave rise to Topological Machine Learning (TML), which aims to combine the expressive power of computational topology with the accuracy of machine learning to provide a comprehensive and automatic framework for data classification. The aim of this thesis is twofold: to develop current applications of TML in practical scenarios, with emphasis on the most overlooked aspects of its pipeline, and to connect the theory of TDA with a broader class of maps, the Group Equivariant Non-Expansive Operators (GENEOs). In the first part of this dissertation, we develop a pipeline to study digital data by means of TML in order to validate the practical aspects of our theory. We apply this pipeline to benchmark and experimental datasets, achieving state-of-the-art accuracies in biomedical scenarios. Moreover, we perform an empirical but extensive study of the stability of features arising from the various homological dimensions with respect to noise and points distribution in the persistence diagram. Such a comparison is novel in the TML literature and our findings show that results coming from the concatenation of each homological dimension available are the best approach in the vectorisation step. We later expand on the main concept of TDA, proving that the functor that computes persistence diagrams can be seen as a particular instance of GENEOs. The GENEO framework allows us to inject arbitrary equivariances in a machine learning setting and represents a new possible approach to neural network architecture. Next, we fully present the theory of GENEOs and their properties, such as convexity and concavity, under suitable assumptions. This thesis expand the GENEO theory with two new tools to define such operators, namely using symmetric functions and a characterization theorem of linear GENEOs between arbitrary functional spaces. Finally, we develop a new neural network architecture with GENEOs instead of neurons and show its potential in a couple of applications.

A bridge between persistent homology and group equivariant non-expansive operators: theory and applications

CONTI, FRANCESCO
2024

Abstract

Topological Data Analysis (TDA) is proving to be an excellent tool for shape analysis of digital data. The recently found synergy with artificial intelligence gave rise to Topological Machine Learning (TML), which aims to combine the expressive power of computational topology with the accuracy of machine learning to provide a comprehensive and automatic framework for data classification. The aim of this thesis is twofold: to develop current applications of TML in practical scenarios, with emphasis on the most overlooked aspects of its pipeline, and to connect the theory of TDA with a broader class of maps, the Group Equivariant Non-Expansive Operators (GENEOs). In the first part of this dissertation, we develop a pipeline to study digital data by means of TML in order to validate the practical aspects of our theory. We apply this pipeline to benchmark and experimental datasets, achieving state-of-the-art accuracies in biomedical scenarios. Moreover, we perform an empirical but extensive study of the stability of features arising from the various homological dimensions with respect to noise and points distribution in the persistence diagram. Such a comparison is novel in the TML literature and our findings show that results coming from the concatenation of each homological dimension available are the best approach in the vectorisation step. We later expand on the main concept of TDA, proving that the functor that computes persistence diagrams can be seen as a particular instance of GENEOs. The GENEO framework allows us to inject arbitrary equivariances in a machine learning setting and represents a new possible approach to neural network architecture. Next, we fully present the theory of GENEOs and their properties, such as convexity and concavity, under suitable assumptions. This thesis expand the GENEO theory with two new tools to define such operators, namely using symmetric functions and a characterization theorem of linear GENEOs between arbitrary functional spaces. Finally, we develop a new neural network architecture with GENEOs instead of neurons and show its potential in a couple of applications.
2-lug-2024
Italiano
geneo
persistent homology
topological data analysis
Moroni, Davide
Frosini, Patrizio
Pascali, Maria Antonietta
File in questo prodotto:
File Dimensione Formato  
Relazione_PhD.pdf

non disponibili

Dimensione 95.47 kB
Formato Adobe PDF
95.47 kB Adobe PDF
Sintesi_tesi.pdf

accesso aperto

Dimensione 23.19 kB
Formato Adobe PDF
23.19 kB Adobe PDF Visualizza/Apri
Tesi_dottorato.pdf

accesso aperto

Dimensione 13.21 MB
Formato Adobe PDF
13.21 MB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/216391
Il codice NBN di questa tesi è URN:NBN:IT:UNIPI-216391