
Statistical Physics Methods for Non-Convex Neural Network Models

ANNESI, BRANDON LIVIO
2025

Abstract

In this doctoral thesis I employ the lens of Statistical Physics to study a number of non-convex models of Neural Networks. I start by using the replica method to investigate the loss landscape of a prototypical neural network model, the Negative Perceptron, and use the tool of Linear Mode Connectivity to describe the connections between different types of solutions. I show that the geometry of such solutions can be described as star-shaped, and numerically verify that these connectivity properties hold for solutions found by algorithms. In the same model, and for the Tree-Committee Machine, I study the critical capacity under the full-RSB ansatz, and settle a long-standing open problem about the numerical value of this threshold. Comparing it to simulations with Gradient Descent, I observe an algorithmic gap: for some values of the constraint density, solutions exist but are not found by the algorithm. I also introduce a transition line that separates a phase where typical states exhibit an Overlap Gap from a phase where no such gap exists, and discuss potential algorithmic implications. Returning to the connectivity properties of the Negative Perceptron, I use the fRSB framework to characterize the disconnection transition. Finally, I go beyond the storage setting and study a Spiked Random Feature Model, where a low-rank correction to the random feature matrix can be learned in the teacher-student scenario. I observe a detection phenomenon in which a minimum amount of data is needed for the student to align its spike with that of the teacher, and compare it to numerical simulations on real datasets.
23 June 2025
English
ZECCHINA, RICCARDO
LUCIBELLO, CARLO
Università Bocconi
Files in this record:

File: ANNESI Thesis.pdf
Access: open access
Size: 4.84 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/213812
The NBN code of this thesis is URN:NBN:IT:UNIBOCCONI-213812