The Stochastic sub-Gradient Method under Heavy-Tailed Noise

PARLETTA, DANIELA ANGELA
2025

Abstract

This thesis is dedicated to the study of the Stochastic sub-Gradient Method (SsGM) applied to data affected by heavy-tailed noise. The emergence of heavy-tailed noise in modern datasets has been investigated intensively in recent years. This type of noise negatively affects the performance of most optimization and machine learning methods, which are typically designed with well-behaved, light-tailed noise in mind. As SsGM is arguably the most widely adopted method in modern machine learning applications, it is important to characterize its robustness to noise. To increase this robustness, clipping - i.e., projecting the sub-gradients onto a ball of a given radius before updating the iterates - has emerged as one of the simplest yet most effective techniques for first-order methods. We consider the general problem of minimizing a nonsmooth Lipschitz convex function over a convex set while having access only to (very) noisy estimates of its sub-gradients. We assume only that the noise possesses the first p moments, allowing for heavy-tailed regimes. In this setting, we provide the following contributions. First, we give the first optimal rates of convergence in expectation for the last iterate of clipped SsGM. These rates generalize to the bounded pth-moment noise model all existing results for the standard finite-variance model. As a byproduct of our analysis of the last iterate, we provide improved rates for the average iterate. Second, assuming finite variance of the noise, we give the first optimal rates of convergence with high probability for a large class of averaging schemes of the iterates, including the standard average and a scheme that resembles the last iterate. Third, we show how to apply, via a kernel trick, the clipping operator to infinite-dimensional inputs such as those arising in kernel methods, giving the first application of this approach to statistical learning. Finally, we provide empirical evidence supporting our theoretical results.
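The clipping operator described in the abstract - projecting a noisy sub-gradient onto a Euclidean ball of a given radius before the update - can be sketched as follows. This is a minimal illustration under the common convention clip(g) = min(1, lam/||g||) * g, not the thesis's own code; all names and parameters are illustrative.

```python
import numpy as np

def clip(g, lam):
    # Project g onto the Euclidean ball of radius lam:
    # returns g unchanged if ||g|| <= lam, else rescales it to norm lam.
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_ssgm_step(x, g_hat, step, lam):
    # One clipped stochastic sub-gradient update:
    # the noisy estimate g_hat is clipped before the descent step.
    return x - step * clip(g_hat, lam)
```

A heavy-tailed noise sample can make g_hat arbitrarily large; clipping bounds the magnitude of each update by step * lam, which is the mechanism behind the robustness results discussed above.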
3 February 2025
Language: English
SALZO, SAVERIO
VIGNI, STEFANO
Università degli studi di Genova
Files in this record:
phdunige_4965392.pdf (open access, 1.19 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/190001
The NBN code of this thesis is URN:NBN:IT:UNIGE-190001