
How to escape nonconvex regions efficiently in large scale optimization problems

TRONCI, EDOARDO MARIA
2022

Abstract

Solving large-scale optimization problems, such as neural network training, presents many challenges. Among others, the proliferation of useless stationary points, that is, points where the objective function value is far from that of a global minimum, can seriously hinder the optimization algorithm, which is attracted to them and therefore becomes inefficient. In this dissertation, we propose two algorithmic schemes along the following lines. First, extending the result proposed in [1] for shallow networks (networks with only one hidden layer), we propose a mathematical characterization of a class of such stationary points arising in the training of deep multilayer neural networks (networks with more than one hidden layer). Leveraging this characterization, we define an incremental training approach that avoids getting stuck in the region of attraction of these undesirable stationary points. Then, exploiting the main properties of the nonmonotone truncated Newton method proposed in [2], we attempt to grasp the benefits of using second-order information by providing preliminary numerical evidence of the potential of following directions of negative curvature during neural network training, so as to guarantee the optimization algorithm's ability to escape regions where the objective function is nonconvex.

References:
[1] Fukumizu, K., & Amari, S. I. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks, 13(3), 317-327.
[2] Fasano, G., & Lucidi, S. (2009). A nonmonotone truncated Newton–Krylov method exploiting negative curvature directions, for large scale unconstrained optimization. Optimization Letters, 3(4), 521-535.
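As an illustration of the key ingredient of the second scheme, the toy sketch below shows what following a direction of negative curvature means: when the Hessian has a negative eigenvalue, stepping along the corresponding eigenvector lets the iterate escape a saddle point where the gradient alone nearly vanishes. This is not the dissertation's truncated Newton-Krylov method of [2], which avoids forming the Hessian explicitly; a dense eigendecomposition is used here purely for clarity, and the function name and step size are illustrative.

```python
import numpy as np

def negative_curvature_step(grad, hess, x, alpha=0.1):
    """Take one step: follow a negative curvature direction if one
    exists at x, otherwise fall back to a plain gradient step."""
    g = grad(x)
    H = hess(x)
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    if eigvals[0] < 0:
        # Smallest eigenvalue is negative: the objective is nonconvex here.
        d = eigvecs[:, 0]
        # Orient d so that it is also a (non-ascent) direction: g.T d <= 0.
        if g @ d > 0:
            d = -d
        return x + alpha * d
    return x - alpha * g

# Toy nonconvex objective f(x, y) = x^2 - y^2, with a saddle at the origin.
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])

# Start near the saddle, where the gradient is nearly zero: a pure
# gradient method stalls, while the negative curvature direction
# (the y-axis) drives the objective value down.
x = np.array([0.0, 1e-3])
for _ in range(50):
    x = negative_curvature_step(grad, hess, x)
```

For large-scale problems, schemes such as [2] extract such directions from the Krylov subspace built by a (truncated) conjugate gradient iteration, rather than from a full eigendecomposition.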
20 May 2022
English
Deep neural networks; training algorithm; stationary points; plateaus; negative curvature directions
LUCIDI, Stefano
Università degli Studi di Roma "La Sapienza"
Files in this record:
Tesi_dottorato_Tronci.pdf (open access, 33.09 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/96661
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-96661