An Optimization Perspective on Deep Neural Networks
BALBONI, Dario
2025
Abstract
Despite the impressive performance achieved by Deep Neural Networks in recent years and their widespread adoption by major companies worldwide, many aspects of their behavior remain poorly understood. A significant body of theory exists for very wide and infinite-width models, particularly related to Neural Tangent Kernels, and numerous empirical studies aim to guide practitioners working with practically relevant models. However, there is a concerning lack of actionable theoretical results for the types of models commonly deployed in practice. This thesis positions itself in the gap between pure theory and practice, aiming to derive practical guidelines based on a principled approach to neural networks grounded in optimization theory. The approach leverages the recently rediscovered Polyak–Łojasiewicz (PL) condition, a generalization of strong convexity well suited to describing overparameterized models, and recognizes that certain classical results from convex optimization theory remain applicable in this new setting. The end result of the thesis is an adaptive learning rate algorithm that requires minimal hyperparameter tuning and performs on par with grid-searched SGD while significantly reducing computational cost.
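For readers unfamiliar with it, the Polyak–Łojasiewicz (PL) condition referenced in the abstract is commonly stated as below. The notation (loss f, infimum f*, constant μ) is the standard one and is not taken from the thesis itself.

```latex
% Polyak--Lojasiewicz (PL) condition: a loss f with infimum f^* satisfies
% PL with constant \mu > 0 if, for every parameter vector w,
\frac{1}{2}\,\lVert \nabla f(w) \rVert^{2} \;\geq\; \mu \left( f(w) - f^{*} \right)
% Unlike strong convexity, PL allows many (possibly non-isolated) global
% minimizers, which is why it fits overparameterized models.
```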
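The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch of what a PL-motivated, nearly tuning-free adaptive step size can look like: a stochastic Polyak-style rule, where the step equals the batch loss gap divided by the squared gradient norm, capped at lr_max. All names here (sgd_polyak_step, lr_max, f_star) are placeholders introduced for illustration and should not be read as the thesis's actual method.

```python
import torch

def sgd_polyak_step(model, loss_fn, batch, lr_max=1.0, f_star=0.0, eps=1e-8):
    """One SGD update with a Polyak-style adaptive step size (illustrative sketch).

    Step size: gamma = min(lr_max, (loss - f_star) / ||grad||^2).
    For interpolating (overparameterized) models, taking f_star ~ 0 per batch is
    the usual justification for needing essentially no tuning.
    """
    inputs, targets = batch
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Squared norm of the full stochastic gradient across all parameters.
    grad_sq = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    gamma = min(lr_max, (loss.item() - f_star) / (grad_sq.item() + eps))

    # Plain SGD update using the adaptive step size.
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p -= gamma * p.grad

    return loss.item(), gamma
```

The cap lr_max guards against huge steps when the gradient norm is small; eps avoids division by zero.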
| File | Access | License | Size | Format |
|---|---|---|---|---|
| Tesi.pdf | Open access | All rights reserved | 1.8 MB | Adobe PDF |
https://hdl.handle.net/20.500.14242/304280
URN:NBN:IT:SNS-304280