An Optimization Perspective on Deep Neural Networks
BALBONI, Dario
2025
Abstract
Despite the impressive performance achieved by Deep Neural Networks in recent years and their widespread adoption by major companies worldwide, many aspects of their behavior remain poorly understood. A significant body of theory exists for very wide and infinite-width models, particularly related to Neural Tangent Kernels, and numerous empirical studies aim to guide practitioners working with practically relevant models. However, there is a concerning lack of actionable theoretical results for the types of models commonly deployed in practice. This thesis positions itself in the gap between pure theory and practice, aiming to derive practical guidelines based on a principled approach to neural networks grounded in optimization theory. The approach leverages the recently rediscovered Polyak–Łojasiewicz (PL) condition, a generalization of strong convexity well suited to describing overparameterized models, and recognizes that certain classical results from convex optimization theory remain applicable in this new setting. The end result of the thesis is an adaptive learning rate algorithm that requires minimal hyperparameter tuning and performs on par with grid-searched SGD while significantly reducing computational cost.
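For readers unfamiliar with it, the Polyak–Łojasiewicz (PL) condition referenced in the abstract is commonly stated as below. The notation (loss f, infimum f*, constant μ) is the standard one and is not taken from the thesis itself.

```latex
% Polyak--Lojasiewicz (PL) condition: a loss f with infimum f^* satisfies
% PL with constant \mu > 0 if, for every parameter vector w,
\frac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\geq\; \mu \left( f(w) - f^{*} \right)
% Unlike strong convexity, PL allows many (possibly non-isolated) global
% minimizers, which is why it fits overparameterized models.
```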
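The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of what a PL-motivated, nearly tuning-free adaptive step size can look like: a stochastic Polyak-style rule, where the step equals the batch loss gap divided by the squared gradient norm, capped at lr_max. All names here (sgd_polyak_step, lr_max, f_star) are placeholders introduced for illustration and should not be read as the thesis's actual method.

```python
import torch

def sgd_polyak_step(model, loss_fn, batch, lr_max=1.0, f_star=0.0, eps=1e-8):
    """One SGD update with a Polyak-style adaptive step size (illustrative sketch).

    Step size: gamma = min(lr_max, (loss - f_star) / ||grad||^2).
    For interpolating (overparameterized) models, taking f_star ~ 0 per batch is
    the usual justification for needing essentially no tuning.
    """
    inputs, targets = batch
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Squared norm of the full stochastic gradient across all parameters.
    grad_sq = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    gamma = min(lr_max, (loss.item() - f_star) / (grad_sq.item() + eps))

    # Plain SGD update using the adaptive step size.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= gamma * p.grad

    return loss.item(), gamma
```

The cap lr_max guards against huge steps when the gradient norm is small; eps avoids division by zero.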
| File | Access | License | Size | Format |
|---|---|---|---|---|
| Tesi.pdf | Open access | All rights reserved | 1.8 MB | Adobe PDF |
https://hdl.handle.net/20.500.14242/304280
URN:NBN:IT:SNS-304280