
How to escape nonconvex regions efficiently in large scale optimization problems

TRONCI, EDOARDO MARIA
2022

Abstract

Solving large-scale optimization problems, such as neural network training, presents many challenges. Among others, the proliferation of useless stationary points, that is, points where the objective function value is far from that of a global minimum, can seriously hinder the optimization algorithm, which is attracted to them and therefore becomes inefficient. In this dissertation, we propose two algorithmic schemes along the following lines. First, extending the result proposed in [1] for shallow networks (networks with only one hidden layer), we propose a mathematical characterization of a class of such stationary points arising in the training of deep multilayer neural networks (networks with more than one hidden layer). Leveraging this characterization, we define an incremental training approach that avoids getting stuck in the region of attraction of these undesirable stationary points. Then, exploiting the main properties of the nonmonotone truncated Newton method proposed in [2], we attempt to grasp the benefits of using second-order information by providing preliminary numerical evidence of the potential of following directions of negative curvature during neural network training, so as to guarantee the optimization algorithm's ability to escape regions where the objective function is nonconvex.

References:
[1] Fukumizu, K., & Amari, S. I. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks, 13(3), 317-327.
[2] Fasano, G., & Lucidi, S. (2009). A nonmonotone truncated Newton–Krylov method exploiting negative curvature directions, for large scale unconstrained optimization. Optimization Letters, 3(4), 521-535.
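As an illustration of the key ingredient of the second scheme, the toy sketch below shows what following a direction of negative curvature means: when the Hessian has a negative eigenvalue, stepping along the corresponding eigenvector lets the iterate escape a saddle point where the gradient alone nearly vanishes. This is not the dissertation's truncated Newton-Krylov method of [2], which avoids forming the Hessian explicitly; a dense eigendecomposition is used here purely for clarity, and the function name and step size are illustrative.

```python
import numpy as np

def negative_curvature_step(grad, hess, x, alpha=0.1):
    """Take one step: follow a negative curvature direction if one
    exists at x, otherwise fall back to a plain gradient step."""
    g = grad(x)
    H = hess(x)
    eigvals, eigvecs = np.linalg.eigh(H)  # eigenvalues in ascending order
    if eigvals[0] < 0:
        # Smallest eigenvalue is negative: the objective is nonconvex here.
        d = eigvecs[:, 0]
        # Orient d so that it is also a (non-ascent) direction: g.T d <= 0.
        if g @ d > 0:
            d = -d
        return x + alpha * d
    return x - alpha * g

# Toy nonconvex objective f(x, y) = x^2 - y^2, with a saddle at the origin.
grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, -2.0]])

# Start near the saddle, where the gradient is nearly zero: a pure
# gradient method stalls, while the negative curvature direction
# (the y-axis) drives the objective value down.
x = np.array([0.0, 1e-3])
for _ in range(50):
    x = negative_curvature_step(grad, hess, x)
```

For large-scale problems, schemes such as [2] extract such directions from the Krylov subspace built by a (truncated) conjugate gradient iteration, rather than from a full eigendecomposition.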
20 May 2022
English
Deep neural networks; training algorithm; stationary points; plateaus; negative curvature directions
LUCIDI, Stefano
Università degli Studi di Roma "La Sapienza"
Files in this record:
Tesi_dottorato_Tronci.pdf (open access, 33.09 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/96661
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-96661