Deploying robust machine-learning models: practical issues and mitigations
ANGIONI, DANIELE
2025
Abstract
Nowadays, Machine Learning (ML) models are the first choice for a wide range of applications, thanks to their discriminative power and generalization capabilities. However, deployed models continuously face highly dynamic environments that can compromise their normal functioning, such as unpredictable data encountered at test time or the ever-growing performance requirements and technological advancements that make frequent updates inevitable. Within this thesis, we contribute to the state of the art by proposing analysis and mitigation strategies that help developers tackle the practical issues encountered during ML model deployment, focusing on three of them: (i) concept drift, (ii) practical adversarial perturbations, and (iii) regression in model updates.

Concept drift, i.e., the temporal evolution of the data distribution, is a well-known phenomenon in the malware domain: recent works have shown that it deteriorates model performance over time, making updates necessary to adapt to newly evolving data. Current solutions trigger such updates by detecting concept drift, but the underlying reasons why drift affects performance remain under-investigated, even though understanding them would help keep models stable for a more extended period. To this end, we propose a drift-analysis framework to identify which data characteristics cause the drift in the context of Android malware, and we devise a time-aware classifier that leverages this information to better stabilize performance over time. We demonstrate the efficacy of our proposal by comparing its degradation over time with that of a state-of-the-art classifier, showing that it better withstands the distribution changes that naturally characterize the malware domain and thus allows for less frequent model updates.

Moreover, after each update, models should be re-evaluated to assess whether performance has concretely improved. Such evaluations should consider not only robustness to concept drift but also robustness against adversaries capable of crafting practical adversarial perturbations. In the image domain, such perturbations can be realized as adversarial patches, i.e., optimized contiguous blocks of pixels that, when applied to an input image, cause misclassification, compromising model performance and violating system integrity. The only way to properly evaluate robustness against patch attacks is to generate patches and test them against the target model; however, their optimization is computationally demanding and requires careful hyperparameter tuning. We address this issue by proposing ImageNet-Patch, a dataset of pre-optimized patches that enables fast robustness evaluations against adversarial patch attacks in the context of image classification, avoiding the computationally expensive procedures otherwise required to generate them. We use ImageNet-Patch to conduct a large-scale evaluation, assessing the robustness of 127 models against patch attacks and also validating the effectiveness of the provided patches in the physical domain (i.e., by printing and applying them to real-world objects).
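To make the idea of a patch-based robustness evaluation concrete, the following sketch shows how a pre-optimized patch could be pasted onto a batch of images to compare clean and patched accuracy. It is a minimal illustration, not the ImageNet-Patch code: the model, labels, and the patch itself (random noise here, whereas ImageNet-Patch provides optimized patches) are placeholders.

```python
# Minimal sketch of a patch-based robustness check (illustration only).
import torch
import torchvision.models as models

def apply_patch(images, patch, x=10, y=10):
    """Paste a (3, h, w) patch onto a batch of images at position (x, y)."""
    patched = images.clone()
    _, h, w = patch.shape
    patched[:, :, y:y + h, x:x + w] = patch
    return patched

model = models.resnet18(weights=None).eval()   # stand-in target model
images = torch.rand(8, 3, 224, 224)            # stand-in input batch
labels = torch.randint(0, 1000, (8,))          # stand-in ground-truth labels
patch = torch.rand(3, 50, 50)                  # placeholder for a pre-optimized patch

with torch.no_grad():
    clean_acc = (model(images).argmax(1) == labels).float().mean().item()
    patched_acc = (model(apply_patch(images, patch)).argmax(1) == labels).float().mean().item()

print(f"clean accuracy: {clean_acc:.2f}, accuracy under patch: {patched_acc:.2f}")
```

A more faithful evaluation would typically also randomize the patch location and apply rotations and translations, so that the estimate accounts for the variability of patch placement in real images.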
Finally, even when an update improves overall performance, the improvement can come with new errors on samples that the previous model classified correctly. Such errors are referred to as negative flips and are perceived by individual users as a regression in performance. Recent works have addressed this issue by considering prediction accuracy as the primary metric, without examining how updates may affect other metrics, such as adversarial robustness. As our last contribution, we first investigate regression under the lens of adversarial robustness. In particular, when a model is updated to improve its adversarial robustness, some previously ineffective adversarial examples may become misclassified, causing a regression in the perceived security of the system. To address this issue, we propose a novel technique named robustness-congruent adversarial training. It amounts to fine-tuning a model with state-of-the-art robust learning methods while constraining it to retain higher robustness on the adversarial examples that were correctly classified before the update. We also show that our algorithm and, more generally, learning with non-regression constraints provide a theoretically grounded framework for training consistent estimators. Our experiments on robust models for image classification confirm that (i) both accuracy and robustness, even if improved after a model update, can be affected by negative flips, and (ii) our robustness-congruent adversarial training can mitigate the problem, outperforming competing baseline methods.
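The sketch below illustrates the general idea of fine-tuning with a non-regression constraint: the loss on adversarial examples is augmented with an extra term on the samples that the old model already classified correctly, so that turning them into negative flips is penalized. It is only an illustration of the principle under simplifying assumptions (a soft penalty instead of a hard constraint, and adversarial examples assumed to be crafted elsewhere, e.g., with PGD), not the exact robustness-congruent adversarial training objective.

```python
# Illustrative non-regression-style fine-tuning loss (not the thesis code).
import torch
import torch.nn.functional as F

def non_regression_loss(new_model, old_model, x_adv, y, beta=1.0):
    logits_new = new_model(x_adv)
    with torch.no_grad():
        # samples the pre-update model classified correctly
        old_correct = old_model(x_adv).argmax(1) == y
    loss = F.cross_entropy(logits_new, y)  # standard robust-training term
    if old_correct.any():
        # extra weight where a new mistake would be a negative flip
        loss = loss + beta * F.cross_entropy(logits_new[old_correct], y[old_correct])
    return loss

# Usage sketch with stand-in linear models on random data.
old_model = torch.nn.Linear(10, 3).eval()
new_model = torch.nn.Linear(10, 3)
x_adv = torch.randn(16, 10)            # placeholder for adversarial examples
y = torch.randint(0, 3, (16,))
loss = non_regression_loss(new_model, old_model, x_adv, y, beta=0.5)
loss.backward()
```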
File | Size | Format
---|---|---
Tesi_dottorato_Angioni.pdf (open access) | 5.1 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/195922
URN:NBN:IT:UNIROMA1-195922