Engineering Fair and Efficient Learning-Based Software Systems

D'ALOISIO, GIORDANO
2025

Abstract

Learning-based systems – i.e., systems that include machine learning (ML) models – are now employed in all aspects of our lives. The wide adoption of these systems has raised several concerns about their quality, as highlighted by the United Nations Sustainable Development Goals and the European Union AI Act. Unlike traditional software systems, learning-based systems must also satisfy an additional set of quality properties (such as fairness, explainability, and privacy). In this thesis, we focus on two of the most relevant quality properties of these systems, namely fairness and efficiency, and propose a set of contributions that span different phases of a general development workflow for learning-based systems. Concerning the fairness quality property, we first address a significant lack of fairness-enhancing methods by proposing a novel pre-processing algorithm that improves fairness in both binary and multi-class classification settings. Next, we formally model the workflows for assessing fairness and for selecting the best combination of ML model and fairness-enhancing method, and we propose two low-code approaches that leverage these formal models to support data scientists in developing fair learning-based systems. Additionally, motivated by the need to further support data scientists in the early identification of variables leading to high bias in a system, we perform an extensive empirical evaluation of the ability of dataset structural features, termed bias symptoms, to detect algorithmic bias early, before training a model. Finally, we begin to investigate bias issues of learning-based systems that employ Large Language Models (LLMs) and how the fairness of these systems is currently assessed in GitHub projects. Concerning the efficiency of learning-based systems, we first investigate the ability of existing approaches to estimate the training time of ML models early; this investigation is motivated by the need to assist data scientists in the early selection of ML models that meet a given training time constraint. Next, we examine the efficiency of LLMs in terms of inference time and memory footprint. We first conduct a thorough empirical investigation of the impact of LLM compression strategies on the efficiency and effectiveness of models fine-tuned for software engineering tasks, and from this investigation we derive a set of recommendations that guide practitioners and researchers in selecting the best compression strategy for a given task. Finally, we propose a novel search-based approach that identifies the optimal hyperparameter setting and prompt structure to reduce the inference time of text-to-image generation models while maintaining high quality in the generated images. With this thesis, we aim to support data scientists and practitioners in developing fair and efficient learning-based systems and to help standardize some phases of the development workflow.
7 April 2025
English
DI MARCO, ANTINISCA
DI RUSCIO, DAVIDE
Università degli Studi dell'Aquila
Files in this item:
main.pdf – open access – 5.35 MB, Adobe PDF
main_1.pdf – open access – 5.35 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/210810
The NBN code of this thesis is URN:NBN:IT:UNIVAQ-210810