Novel mixed-integer optimization models and algorithms for interpretable SVMs
D'ONOFRIO, FEDERICO
2024
Abstract
In the age of Big Data and complex black-box Machine Learning (ML) models, the pursuit of inherently interpretable models has transitioned from a desirable attribute to a critical necessity. Indeed, the lack of explanations for black-box ML decisions is a major obstacle to their adoption in real-world decision domains. Interpretable ML aims to develop models that can give explanatory insights into their decision-making process [72]. However, developing such interpretable models often involves coping with complex optimization problems that demand significant computational resources, particularly on large datasets. Support Vector Machines (SVMs) are a powerful ML tool for supervised classification: they construct a hyperplane that trades off the misclassification error against the distance between points of the two classes (the margin), offering robust and accurate decision boundaries even on complex datasets. In linear SVMs the hyperplane is constructed directly in the input space of the data, while in nonlinear SVMs the data are mapped into a higher-dimensional space where such a hyperplane can be constructed, resulting in nonlinear decision boundaries in the input space. SVM classifiers are usually not interpretable, even in the linear case, because they are dense with respect to the number of features used. This thesis focuses on Mixed Integer Programming formulations and algorithms that mitigate the challenges of training interpretable SVMs for supervised classification. In particular, we study the case in which a cardinality constraint on the maximum number of selected data features is embedded in the standard optimization models. For linear SVMs, we propose two Mixed Integer Quadratic Programming models and develop both heuristic and exact algorithms that leverage efficient relaxations to solve them faster than off-the-shelf commercial solvers. For the nonconvex Mixed Integer Nonlinear Programming problem arising when feature selection is embedded in nonlinear SVM models, we develop both local search metaheuristics, which find good solutions in reasonable computational times for general kernels, and decomposition approaches based on linearization techniques or submodular function optimization when polynomial kernels are adopted. The value of these algorithms extends beyond SVM classification: they can be extended to other ML models embedding the SVM framework, or to other nonlinear kernel-based ML models for regression or clustering tasks. As an example, we also propose an extension of one of the proposed heuristic methods for linear SVM feature selection to the case of Optimal Classification Trees that employ SVM splits.
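For context, here is a minimal sketch of the standard soft-margin linear SVM and of a big-M cardinality-constrained variant of the kind the abstract describes. The binary variables z_j, the big-M bounds M_j, and the feature budget B are illustrative names, and the big-M linearization is one common modeling choice; the thesis's own two MIQP models may differ.

```latex
% Standard soft-margin linear SVM primal
% (n points x_i in R^p with labels y_i in {-1, +1}):
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n.

% Cardinality-constrained (feature-selecting) MIQP variant:
% z_j = 1 iff feature j may be used; M_j bounds |w_j|; B is the budget.
\min_{w,\,b,\,\xi,\,z}\ \tfrac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,n,
\qquad -M_j z_j \le w_j \le M_j z_j,\ \ j=1,\dots,p,
\qquad \sum_{j=1}^{p} z_j \le B,\ \ z \in \{0,1\}^p.

% In the nonlinear case, feature selection can instead gate the kernel,
% e.g. a degree-d polynomial kernel restricted to selected features:
k_z(x_i, x_k) = \Bigl(1 + \sum_{j=1}^{p} z_j\, x_{ij}\, x_{kj}\Bigr)^{d},
% which turns the resulting training problem into a nonconvex MINLP.
```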
| File | Size | Format |
|---|---|---|
| Tesi_dottorato_DOnofrio.pdf (open access) | 6.12 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/203099
URN:NBN:IT:UNIROMA1-203099