Statistical Physics Methods for Non-Convex Neural Network Models
ANNESI, BRANDON LIVIO
2025
Abstract
In this doctoral thesis I employ the lens of Statistical Physics to study a number of non-convex models of Neural Networks. I start by using the replica method to investigate the loss landscape of a prototypical neural network model, the Negative Perceptron, and use the tool of Linear Mode Connectivity to describe how different types of solutions are connected. I show that the geometry of such solutions can be described as star-shaped, and numerically verify that these connectivity properties hold for solutions found by algorithms. In the same model, and for the Tree-Committee Machine, I study the critical capacity under the full-RSB ansatz, settling a long-standing open problem about the numerical value of this threshold. Comparing it to simulations with Gradient Descent, I observe an algorithmic gap: for some values of the constraint density, solutions exist but are not found by the algorithm. I also introduce a transition line that separates a phase where typical states exhibit an Overlap Gap from a phase where no such gap exists, and discuss potential algorithmic implications. Returning to the connectivity properties of the Negative Perceptron, I use the fRSB framework to characterize the disconnection transition. Finally, I go beyond the storage setting and study a Spiked Random Feature Model, where a low-rank correction to the random feature matrix can be learned in the teacher-student scenario. I observe a detection phenomenon in which a minimum amount of data is needed for the student to align its spike with that of the teacher, and compare it to numerical simulations with real datasets.
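The Linear Mode Connectivity probe mentioned in the abstract can be illustrated with a minimal numerical sketch. This is not the thesis code: the sizes `N`, `P`, the margin `kappa`, and the gradient-descent hyperparameters are illustrative assumptions. Two solutions of a Negative Perceptron (weights on the sphere satisfying all margin constraints with a negative margin) are found independently, and the fraction of violated constraints is scanned along the straight segment between them.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, kappa = 200, 100, -0.5        # illustrative sizes and negative margin
X = rng.standard_normal((P, N))     # random patterns to be stored

def violations(w):
    """Fraction of patterns violating the constraint x.w/sqrt(N) >= kappa."""
    w = w / np.linalg.norm(w) * np.sqrt(N)   # solutions live on the sphere |w|^2 = N
    return np.mean(X @ w / np.sqrt(N) < kappa)

def find_solution(seed, lr=0.05, steps=10000):
    """Gradient descent on the violated constraints until all are satisfied."""
    w = np.random.default_rng(seed).standard_normal(N)
    for _ in range(steps):
        margins = X @ w / np.sqrt(N)
        unsat = margins < kappa
        if not unsat.any():
            return w
        w += lr * (X.T @ unsat) / np.sqrt(N)   # push violated margins up
        w = w / np.linalg.norm(w) * np.sqrt(N)
    return w

w1, w2 = find_solution(1), find_solution(2)

# Scan the linear path w(t) = (1-t) w1 + t w2 and record the worst error:
# a zero barrier indicates the two solutions are linearly mode connected.
ts = np.linspace(0.0, 1.0, 21)
barrier = max(violations((1 - t) * w1 + t * w2) for t in ts)
print(f"max fraction of violated constraints along the path: {barrier:.3f}")
```

At this low constraint density and strongly negative margin the barrier typically vanishes; raising `P/N` toward the critical capacity is where the connectivity structure studied in the thesis becomes nontrivial.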
File | Size | Format
---|---|---
ANNESI Thesis.pdf (open access) | 4.84 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/213812
URN:NBN:IT:UNIBOCCONI-213812