Engineering Fair and Efficient Learning-Based Software Systems

D'ALOISIO, GIORDANO
2025

Abstract

Learning-based systems – i.e., systems that include machine learning (ML) models – are now employed in all aspects of our lives. The wide adoption of these systems has raised several concerns about their quality, as highlighted by the United Nations Sustainable Development Goals and the European Union AI Act. Unlike traditional software systems, learning-based systems must also satisfy an additional set of quality properties (such as fairness, explainability, and privacy). In this thesis, we focus on two of the most relevant quality properties of these systems, namely fairness and efficiency, and propose a set of contributions that span different phases of a general development workflow for learning-based systems. Concerning the fairness quality property, we first address a significant lack of fairness-enhancing methods by proposing a novel pre-processing algorithm that improves fairness in both binary and multi-class classification settings. Next, we formally model the workflows for assessing fairness and for selecting the best combination of ML model and fairness-enhancing method, and we propose two low-code approaches that leverage these formal models to support data scientists in developing fair learning-based systems. Additionally, motivated by the need to further support data scientists in the early identification of variables leading to high bias in a system, we perform an extensive empirical evaluation of the ability of dataset structural features, termed bias symptoms, to detect algorithmic bias early, before training a model. Finally, we begin to investigate bias issues of learning-based systems that employ Large Language Models (LLMs) and how the fairness of these systems is currently assessed in GitHub projects. Concerning the efficiency of learning-based systems, we first investigate the ability of existing approaches to estimate the training time of ML models early; this investigation is motivated by the need to assist data scientists in the early selection of ML models that meet a given training time constraint. Next, we examine the efficiency of LLMs in terms of inference time and memory footprint. We first conduct a thorough empirical investigation of the impact of LLM compression strategies on the efficiency and effectiveness of models fine-tuned for software engineering tasks, and from this investigation we derive a set of recommendations that guide practitioners and researchers in selecting the best compression strategy for a given task. Finally, we propose a novel search-based approach that identifies the optimal hyperparameter setting and prompt structure to reduce the inference time of text-to-image generation models while maintaining high quality in the generated images. With this thesis, we aim to support data scientists and practitioners in developing fair and efficient learning-based systems and to help standardize some phases of the development workflow.
7 April 2025
English
DI MARCO, ANTINISCA
DI RUSCIO, DAVIDE
Università degli Studi dell'Aquila
Files in this item:
main.pdf – open access – 5.35 MB, Adobe PDF
main_1.pdf – open access – 5.35 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/210810
The NBN code of this thesis is URN:NBN:IT:UNIVAQ-210810