Implicit Regularization: Insights from Iterative Regularization and Overparameterization
VEGA CERENO, CRISTIAN JESUS
2024
Abstract
The goal of many data-driven problems is to achieve a good prediction by estimating a quantity of interest from a finite set of (possibly noisy) measurements, exploiting training data and properties of the model that may or may not be known a priori. The most common methods for reaching this objective are explicit and implicit regularization. The first technique minimizes the sum of a loss function and a regularizer; the regularizer is explicitly added to the objective function and encodes a priori knowledge or a desired property of the solutions we want to select. The second technique, implicit regularization, the main focus of this thesis, minimizes a regularizer subject to the constraint that the solution is a minimizer of the loss function. In this thesis, we propose two different approaches to this implicit problem. In the first part, we follow a more traditional approach: we propose and study two new iterative regularization methods for inverse problems based on a primal-dual algorithm, in which the bias and the loss are fixed. In the noise-free case, our analysis provides convergence rates for the Lagrangian and the feasibility gap. In the noisy case, it provides stability bounds and early stopping rules with theoretical guarantees. The main novelty of our work is the exploitation of a priori knowledge about the solution set: we show that the linear equations determined by the data can be used more than once along the iterations. We discuss several ways of reusing the linear equations that are both consistent with our assumptions and flexible in their implementation. Finally, we illustrate our theoretical findings with numerical simulations for robust sparse recovery and image reconstruction, and confirm the efficiency of the proposed regularization approaches by comparison with state-of-the-art methods.
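Schematically, the two techniques contrasted above can be written as follows (the notation — a loss L, a regularizer R, and a trade-off parameter λ > 0 — is introduced here for illustration and is not taken verbatim from the thesis):

```latex
% Explicit regularization: the regularizer is added to the objective.
\min_{x} \; L(x) + \lambda R(x)

% Implicit regularization: the regularizer selects among the loss minimizers.
\min_{x} \; R(x) \quad \text{subject to} \quad x \in \operatorname*{argmin} L
```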
In the second part of the thesis, inspired by the recent success of re- and over-parameterization trained with gradient descent in machine learning, we flip our perspective: we fix the loss and the algorithm (gradient flow) and reparameterize the linear model, aiming to identify the implicit bias introduced by the chosen optimization method and reparameterization. It remains an open question how to determine systematically the inductive bias hidden behind a given model and optimization scheme. The goal of this thesis is to take a step in this direction by studying a unified framework, called time-warped mirror flow, that encompasses various reparametrizations proposed in the state of the art. A theoretical analysis of the existence and convergence of the trajectory has, however, been missing. Here, we fill this gap with a comprehensive study. First, we prove the existence and uniqueness of the solution. Next, we establish the convergence of both the trajectory and the corresponding values of the loss function. Then, for any convex loss function, we prove that the trajectory converges to a minimizer of the loss function, and we characterize the implicit bias. For a specific class of loss functions, including least squares, the implicit bias can be made explicit. Furthermore, we explore the flexibility of our formulation by applying these results to several examples related to weight normalization. Finally, we give a criterion to determine, for a given function that depends only on the norm, a suitable weight normalization parameterization.
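A minimal sketch of the implicit-bias phenomenon discussed above — a classical example, not the time-warped mirror flow framework of the thesis, and with all names (A, b, lr) chosen here for illustration: plain gradient descent on an underdetermined least-squares loss, initialized at zero, converges to the minimum Euclidean-norm interpolant. The regularizer (the squared norm) never appears in the objective, yet it is the criterion that selects the limit point.

```python
import numpy as np

# Toy illustration of implicit bias (a classical example, not the
# thesis's framework): gradient descent on an underdetermined
# least-squares problem, started at zero, converges to the
# minimum-norm solution of A x = b.
rng = np.random.default_rng(0)
n, d = 10, 20                        # fewer equations than unknowns
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

x = np.zeros(d)                      # zero initialization is essential here
lr = 1e-2                            # step size below 2 / lambda_max(A^T A)
for _ in range(50_000):
    x -= lr * A.T @ (A @ x - b)      # gradient of 0.5 * ||A x - b||^2

x_min_norm = np.linalg.pinv(A) @ b   # minimum Euclidean-norm interpolant
```

Because each gradient step stays in the row space of A, the iterate can only converge to the interpolant of smallest Euclidean norm, which is exactly the pseudoinverse solution.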
File | Size | Format
---|---|---
phdunige_4963312.pdf (open access) | 1.27 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/157194
URN:NBN:IT:UNIGE-157194