Optimal and Efficient Learning In Classification

DELLA VECCHIA, Andrea

We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs, that use different technical tools to establish fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance. The analysis is also specialized to smooth loss functions. In the final part of the paper we convert our surrogates risk bounds into classification error bounds and compare the choice of hinge loss with respect to square loss.

Optimal and Efficient Learning In Classification

DELLA VECCHIA, ANDREA

2023

Abstract

We study a natural extension of classical empirical risk minimization, where the hypothesis space is a random subspace of a given space. In particular, we consider possibly data dependent subspaces spanned by a random subset of the data, recovering as a special case Nyström approaches for kernel methods. Considering random subspaces naturally leads to computational savings, but the question is whether the corresponding learning accuracy is degraded. These statistical-computational tradeoffs have been recently explored for the least squares loss and self-concordant loss functions, such as the logistic loss. Here, we work to extend these results to convex Lipschitz loss functions, that might not be smooth, such as the hinge loss used in support vector machines. This unified analysis requires developing new proofs, that use different technical tools to establish fast rates. Our main results show the existence of different settings, depending on how hard the learning problem is, for which computational efficiency can be improved with no loss in performance. The analysis is also specialized to smooth loss functions. In the final part of the paper we convert our surrogates risk bounds into classification error bounds and compare the choice of hinge loss with respect to square loss.

Scheda breve

Scheda completa

Scheda completa (DC)

	Facoltà/Dipartimento
	
				100023 - Dipartimento di Informatica, bioingegneria, robotica e ingegneria dei sistemi
100021 - Dipartimento di Matematica
			
	Corso di studio
	
				XXXV CICLO - MATEMATICA E APPLICAZIONI - Matematica e applicazioni
			
	Data di pubblicazione
	
				29-mag-2023
			
	Lingua
	
				Inglese
			
	Relatore, Supervisor, Advisor o Tutor
	
				DE VITO, ERNESTO
ROSASCO, LORENZO
			
	Correlatore, Controrelatore, Co-Supervisor,  Co-Tutor o Coordinatori
	
				VIGNI, STEFANO
			
	Nome Editore
	
				Università degli studi di Genova
			
	Collezione di appartenenza
	
				Università degli Studi di Genova

File in questo prodotto:

File	Dimensione	Formato
phdunige_4794275.pdf accesso aperto Dimensione 1.12 MB Formato Adobe PDF Visualizza/Apri	1.12 MB	Adobe PDF	Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/63674

Il codice NBN di questa tesi è URN:NBN:IT:UNIGE-63674