An Optimization Perspective on Deep Neural Networks

BALBONI, Dario
2025

Abstract

Despite the impressive performance achieved by Deep Neural Networks in recent years and their widespread adoption by major companies worldwide, many aspects of their behavior remain poorly understood. A significant body of theory exists for very wide and infinite-width models, particularly in connection with Neural Tangent Kernels, and numerous empirical studies aim to guide practitioners working with practically relevant models. However, there is a concerning lack of actionable theoretical results for the types of models commonly deployed in practice. This thesis positions itself in the gap between pure theory and practice, aiming to derive actionable guidelines for practitioners based on a principled approach to neural networks grounded in optimization theory. This approach leverages the recently rediscovered Polyak-Łojasiewicz (PL) condition, a generalization of strong convexity suited to describing overparameterized models, and recognizes that certain classical results from convex optimization remain applicable in this new setting. The end result of this thesis is an adaptive learning-rate algorithm that requires minimal hyperparameter tuning, performing on par with grid-searched SGD while significantly reducing computational cost.
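
For context, the Polyak-Łojasiewicz (PL) condition mentioned in the abstract is commonly stated as below; the notation (f, mu, L) is generic and not drawn from the thesis itself, and this is a standard reference statement rather than the thesis's own formulation.

% PL inequality: for a differentiable loss f with minimum value f*, there
% exists mu > 0 such that the squared gradient norm dominates the suboptimality.
\[
  \tfrac{1}{2}\,\|\nabla f(x)\|^{2} \;\ge\; \mu \bigl(f(x) - f^{*}\bigr)
  \qquad \text{for all } x .
\]
% Together with L-smoothness, this yields linear convergence of gradient
% descent with step size 1/L, mirroring the strongly convex rate without
% requiring convexity or a unique minimizer:
\[
  f(x_{k}) - f^{*} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k} \bigl(f(x_{0}) - f^{*}\bigr).
\]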
30 January 2025
English
BACCIU, Davide
Scuola Normale Superiore
Anonymous experts
Files in this item:

Tesi.pdf
Open access
License: All rights reserved
Size: 1.8 MB
Format: Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/304280
The NBN code of this thesis is URN:NBN:IT:SNS-304280