

Explainable AI with applications to cybersecurity

ZIENI, Rasha
2025

Abstract

Phishing is a pervasive cybersecurity threat that targets individuals and organizations by exploiting human vulnerabilities to steal sensitive information, such as account credentials and credit card details. Timely detection of phishing websites is essential to mitigate the financial and reputational damage caused by such attacks. In this context, machine learning models have proven effective at identifying phishing websites by analyzing features extracted from URLs and web page content. However, ensuring the transparency and trustworthiness of these models through explainability remains a critical challenge. This work addresses the detection of phishing websites by proposing an explainable machine learning framework that not only provides accurate predictions but also identifies the features most strongly associated with phishing. The proposed methodology includes a novel feature selection approach based on the Lorenz Zonoid, a multidimensional extension of the Gini coefficient, to analyze both structured and unstructured data, including bag-of-words representations. By substantially reducing the number of features, the resulting model is parsimonious while retaining high accuracy and interpretability. Furthermore, this work addresses a significant gap in the explainable AI domain by devising a methodological approach for systematically evaluating and comparing alternative explanations, obtained from different methods, on the basis of their complexity and robustness. A series of experiments demonstrates the effectiveness of the proposed approach in identifying explanations that are both less complex and more reliable. Additionally, a novel framework is introduced to measure and optimize the robustness of explanations by fine-tuning model parameters. This framework is exemplified using ensemble tree models on artificially generated data, as well as on a publicly available phishing dataset, illustrating its versatility and applicability.
The application of the proposed methodologies to phishing website detection highlights their relevance in tackling real-world cybersecurity challenges. This work not only advances the detection of phishing websites but also lays a foundation for broader applications in other high-stakes domains, such as finance and healthcare.
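The Lorenz-Zonoid-based feature selection mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the helper names (`lorenz_zonoid`, `marginal_lz`), the OLS fit, and the assumption of non-negative predictions are all illustrative choices. In the univariate case the Lorenz Zonoid value coincides with the Gini coefficient, and a feature's contribution can be scored as the gain in the Lorenz Zonoid of the fitted values when that feature is added to the model.

```python
import numpy as np

def lorenz_zonoid(y):
    """Lorenz Zonoid of a non-negative variable; in the univariate
    case this equals the Gini coefficient of y."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    ranks = np.arange(1, n + 1)
    return 2.0 * np.sum(ranks * y) / (n * y.sum()) - (n + 1) / n

def marginal_lz(X, y, selected, candidate):
    """Gain in the Lorenz Zonoid of the fitted values when `candidate`
    is added to the already-`selected` feature set (plain OLS fit).
    Predictions are assumed non-negative, as in the Lorenz Zonoid setting."""
    def lz_of_fit(cols):
        if not cols:
            return 0.0
        A = np.column_stack([np.ones(len(y)), X[:, cols]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return lorenz_zonoid(A @ coef)
    return lz_of_fit(selected + [candidate]) - lz_of_fit(selected)
```

In a forward-stepwise loop, one would repeatedly add the candidate feature with the largest marginal Lorenz Zonoid gain and stop when the gain becomes negligible, which is what makes the resulting model parsimonious.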
29 May 2025
English
CRISTIANI, ILARIA
Università degli studi di Pavia
Files in this record:
Zieni_Thesis_Explainable AI with applications to cybersecurity.pdf — 5.72 MB, Adobe PDF (under embargo until 08/12/2026)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/210576
The NBN code of this thesis is URN:NBN:IT:UNIPV-210576