Adversarial pruning: improving evaluations and methods
PIRAS, GIORGIO
2025
Abstract
Amid its rapid development and groundbreaking results, Artificial Intelligence (AI) techniques, and particularly Machine Learning (ML) models, have been found to be vulnerable to adversarial examples, i.e., input samples purposely designed to mislead classification. Since realizing how elusive the security of ML techniques is, the research community has grown rapidly, continuously proposing new varieties of adversarial attacks and thereby nurturing the field of Adversarial Machine Learning. Similarly, multiple defense techniques have been developed to design models that are robust against adversarial attacks, leading to an arms race between attacks and defenses in ML security. However, while designing a robust model is crucial for its deployment and has recently gained remarkable attention, it is not the sole priority; to comply with resource-constrained scenarios, or simply to remove superfluous parameters, ML models often need to be compressed. In this respect, compression methods such as neural network pruning have garnered great interest: by removing redundant parameters from a network, pruning yields a lightweight yet well-performing architecture. Recently, to address the dual need for robustness against adversarial attacks and model compression, the research community has focused on Adversarial Pruning (AP) methods, i.e., pruning strategies that preserve robustness while reducing the model's size.

In this thesis, we focus our analysis on AP methods, first tackling the challenges hindering the development of such techniques and then proposing new directions and analyses to improve their performance. More specifically, we begin our study by surveying current AP methods and addressing two challenges: creating a taxonomy and improving the evaluation of AP methods. In the literature, AP methods are often diverse and complex in design, which makes it difficult to analyze their differences and establish a fair comparison between methods. In addition, the adversarial robustness evaluations of AP methods often lag behind recent progress, thus undermining their reliability. To overcome these issues, we first (i) propose a taxonomy of AP methods based on the pruning pipeline and the pruning specifics (defining when and how to prune, respectively); then, we (ii) highlight the main limitations of current adversarial evaluations and propose a novel unified benchmark supplemented by a novel attack approach. In addition to State-of-the-Art (SoA) adversarial attacks, our benchmark includes HO-FMN, a novel hyperparameter optimization strategy for Fast Minimum-Norm (FMN) attacks that we develop and present in this work. HO-FMN improves current FMN attacks by addressing common adversarial evaluation issues, thus also making the assessment of AP methods more reliable.

Having analyzed the SoA and overcome the challenges that emerged from our survey, we then aim to improve current AP methods by following an intuition linking pruning, the flatness of the loss landscape, and adversarial robustness. Inspired by previous work showing that pruning models lying on flat minima improves generalization, we question whether pruning on flat minima can likewise improve robustness against adversarial attacks.
To this end, we (iii) propose a novel approach, referred to as FLat Adversarial Pruning (FLAP), through which we inject flatness into the pipeline of AP methods and improve their adversarial robustness, ultimately suggesting novel strategies to enhance AP methods.
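To make the ingredients combined by AP methods more concrete, the sketch below pairs a simple global magnitude-pruning step (one possible answer to "how to prune") with a one-step FGSM adversarial example used to probe the pruned model. It is a minimal, self-contained PyTorch illustration with placeholder choices (a toy MLP, a random input, 90% sparsity, eps of 0.1); it is not the FLAP or HO-FMN implementations described in this thesis.

```python
# Minimal illustration of the two ingredients that Adversarial Pruning combines:
# (i) magnitude-based weight pruning and (ii) an adversarial example (one FGSM step).
# This is a generic sketch, NOT the FLAP or HO-FMN methods from the thesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# A small classifier standing in for a (possibly adversarially trained) model.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

def magnitude_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights globally ("how to prune")."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(weights, sparsity)
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.1) -> torch.Tensor:
    """One-step L-inf adversarial example, the simplest robustness probe."""
    x_adv = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

# Toy data: one random "image" and an arbitrary label.
x, y = torch.rand(1, 784), torch.tensor([3])

magnitude_prune(model, sparsity=0.9)   # remove 90% of the weights
x_adv = fgsm_example(model, x, y)      # probe the pruned model
print("clean prediction:", model(x).argmax(dim=1).item())
print("adversarial prediction:", model(x_adv).argmax(dim=1).item())
```

Actual AP methods typically intertwine the pruning step with adversarial training and are evaluated with far stronger attacks than a single FGSM step, which is precisely what the benchmark and the HO-FMN strategy discussed above address.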
| File | Size | Format | Access |
|---|---|---|---|
| Tesi_dottorato_Piras.pdf | 5.17 MB | Adobe PDF | Open access |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/193911
URN:NBN:IT:UNIROMA1-193911