Adversarial pruning: improving evaluations and methods
PIRAS, GIORGIO
2025
Abstract
Amid its rapid development and groundbreaking results, Artificial Intelligence (AI) techniques, and particularly Machine Learning (ML) models, have been found to be vulnerable to adversarial examples, i.e., input samples purposely designed to mislead classification. Since realizing how elusive the security of ML techniques is, the research community has grown rapidly, continuously proposing new varieties of adversarial attacks and thereby nurturing the field of Adversarial Machine Learning. Similarly, multiple defense techniques have been developed to design models that are robust against adversarial attacks, leading to an arms race between attacks and defenses in ML security. However, while designing a robust model is crucial for its deployment and has recently gained remarkable attention, it is not the sole priority; to comply with resource-constrained scenarios, or simply to remove superfluous parameters, ML models often need to be compressed. In this respect, compression methods such as neural network pruning have garnered great interest: by removing redundant parameters from a network, pruning yields a lightweight yet well-performing architecture. Recently, to address the dual need for robustness against adversarial attacks and model compression, the research community has focused on Adversarial Pruning (AP) methods, i.e., pruning strategies that preserve robustness while reducing the model's size.

In this thesis, we focus our analysis on AP methods, first tackling the challenges hindering the development of such techniques and then proposing new directions and analyses to improve their performance. More specifically, we begin our study by surveying current AP methods and addressing two challenges: creating a taxonomy and improving the evaluation of AP methods. In the literature, AP methods are often diverse and complex in design, which makes it difficult to analyze their differences and establish a fair comparison between methods. In addition, the adversarial robustness evaluations of AP methods often lag behind recent progress, thus undermining their reliability. To overcome these issues, we first (i) propose a taxonomy of AP methods based on the pruning pipeline and the pruning specifics (defining when and how to prune, respectively); then, we (ii) highlight the main limitations of current adversarial evaluations and propose a novel unified benchmark supplemented by a novel attack approach. In addition to State-of-the-Art (SoA) adversarial attacks, our benchmark includes HO-FMN, a novel hyperparameter optimization strategy for Fast Minimum-Norm (FMN) attacks that we develop and present in this work. HO-FMN improves current FMN attacks by addressing common adversarial evaluation issues, thus also making the assessment of AP methods more reliable.

Having analyzed the SoA and overcome the challenges that emerged from our survey, we then aim to improve current AP methods by following an intuition linking pruning, the flatness of the loss landscape, and adversarial robustness. Inspired by previous work showing that pruning models lying on flat minima improves generalization, we question whether pruning on flat minima can likewise improve robustness against adversarial attacks.
To this end, we (iii) propose a novel approach, referred to as FLat Adversarial Pruning (FLAP), through which we inject flatness into the pipeline of AP methods and improve their adversarial robustness, ultimately suggesting novel strategies to enhance AP methods.
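To make the ingredients combined by AP methods more concrete, the sketch below pairs a simple global magnitude-pruning step (one possible answer to "how to prune") with a one-step FGSM adversarial example used to probe the pruned model. It is a minimal, self-contained PyTorch illustration with placeholder choices (a toy MLP, a random input, 90% sparsity, eps of 0.1); it is not the FLAP or HO-FMN implementations described in this thesis.

```python
# Minimal illustration of the two ingredients that Adversarial Pruning combines:
# (i) magnitude-based weight pruning and (ii) an adversarial example (one FGSM step).
# This is a generic sketch, NOT the FLAP or HO-FMN methods from the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# A small classifier standing in for a (possibly adversarially trained) model.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def magnitude_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights globally ("how to prune")."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.1) -> torch.Tensor:
    """One-step L-inf adversarial example, the simplest robustness probe."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Toy data: one random "image" and an arbitrary label.
x, y = torch.rand(1, 784), torch.tensor([3])

magnitude_prune(model, sparsity=0.9)   # remove 90% of the weights
x_adv = fgsm_example(model, x, y)      # probe the pruned model
print("clean prediction:", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Actual AP methods typically intertwine the pruning step with adversarial training and are evaluated with far stronger attacks than a single FGSM step, which is precisely what the benchmark and the HO-FMN strategy discussed above address.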
| File | Size | Format | Access |
|---|---|---|---|
| Tesi_dottorato_Piras.pdf | 5.17 MB | Adobe PDF | Open access |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/193911
URN:NBN:IT:UNIROMA1-193911