Novel mixed-integer optimization models and algorithms for interpretable SVMs
D'ONOFRIO, FEDERICO
2024
Abstract
In the age of Big Data and complex black-box Machine Learning (ML) models, the pursuit of inherently interpretable models has transitioned from a desirable attribute to a critical necessity. Indeed, the lack of explanations for black-box ML decisions is a major obstacle to their adoption in real-world decision domains. Interpretable ML aims to develop models that can give explanatory insights into their decision-making process [72]. However, developing such interpretable models often involves coping with complex optimization problems that demand significant computational resources, particularly on large datasets. Support Vector Machines (SVMs) are a powerful ML tool for supervised classification: they construct a hyperplane that trades off the misclassification error against the distance between points of the two classes (the margin), offering robust and accurate decision boundaries even on complex datasets. In linear SVMs the hyperplane is constructed directly in the input space of the data, while in nonlinear SVMs the data are mapped into a higher-dimensional space where such a hyperplane can be constructed, resulting in nonlinear decision boundaries in the input space. SVM classifiers are usually not interpretable, even in the linear case, because they are dense with respect to the number of features used. This thesis focuses on Mixed Integer Programming formulations and algorithms that mitigate the challenges of training interpretable SVMs for supervised classification. In particular, we study the case in which a cardinality constraint on the maximum number of selected data features is embedded in the standard optimization models. For linear SVMs, we propose two Mixed Integer Quadratic Programming models and develop both heuristic and exact algorithms that leverage efficient relaxations to solve them faster than off-the-shelf commercial solvers. For the nonconvex Mixed Integer Nonlinear Programming problem arising when feature selection is embedded in nonlinear SVM models, we develop both local search metaheuristics, which find good solutions in reasonable computational times for general kernels, and decomposition approaches based on linearization techniques or submodular function optimization when polynomial kernels are adopted. The value of these algorithms extends beyond SVM classification: they can be extended to other ML models embedding the SVM framework, or to other nonlinear kernel-based ML models for regression or clustering tasks. As an example, we also propose an extension of one of the proposed heuristic methods for linear SVM feature selection to the case of Optimal Classification Trees that employ SVM splits.
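For context, here is a minimal sketch of the standard soft-margin linear SVM and of a big-M cardinality-constrained variant of the kind the abstract describes. The binary variables z_j, the big-M bounds M_j, and the feature budget B are illustrative names, and the big-M linearization is one common modeling choice; the thesis's own two MIQP models may differ.

```latex
% Standard soft-margin linear SVM primal
% (n points x_i in R^p with labels y_i in {-1, +1}):
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n.

% Cardinality-constrained (feature-selecting) MIQP variant:
% z_j = 1 iff feature j may be used; M_j bounds |w_j|; B is the budget.
\min_{w,\,b,\,\xi,\,z}\ \tfrac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n,
\qquad -M_j z_j \le w_j \le M_j z_j,\ \ j=1,\dots,p,
\qquad \sum_{j=1}^{p} z_j \le B,\ \ z \in \{0,1\}^p.

% In the nonlinear case, feature selection can instead gate the kernel,
% e.g. a degree-d polynomial kernel restricted to selected features:
k_z(x_i, x_k) = \Bigl(1 + \sum_{j=1}^{p} z_j\, x_{ij}\, x_{kj}\Bigr)^{d},
% which turns the resulting training problem into a nonconvex MINLP.
```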
| File | Size | Format |
|---|---|---|
| Tesi_dottorato_DOnofrio.pdf (open access) | 6.12 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/203099
URN:NBN:IT:UNIROMA1-203099