Engineering Fair and Efficient Learning-Based Software Systems
D'ALOISIO, GIORDANO
2025
Abstract
Learning-based systems, i.e., systems that include machine learning (ML) models, are now employed in all aspects of our lives. The wide adoption of these systems has raised several concerns about their quality, as highlighted by the United Nations Sustainable Development Goals and the European Union AI Act. Unlike traditional software systems, learning-based systems introduce an additional set of relevant quality properties (such as fairness, explainability, and privacy) that must be addressed. In this thesis, we focus on two of the most relevant quality properties of these systems, namely fairness and efficiency, and propose a set of contributions that span different phases of a general learning-based system development workflow. Concerning the fairness quality property, we first address a significant lack of fairness-enhancing methods by proposing a novel pre-processing algorithm to improve fairness in both binary and multi-class classification settings. Next, we formally model the workflows for assessing fairness and for selecting the best combination of ML model and fairness-enhancing method, and we propose two low-code approaches leveraging these formal models to support data scientists in developing fair learning-based systems. Additionally, motivated by the desire to further support data scientists in the early identification of variables leading to high bias in a system, we perform an extensive empirical evaluation of the ability of dataset structural features (termed bias symptoms) to detect algorithmic bias early, before training a model. Finally, we begin to investigate bias issues of learning-based systems employing Large Language Models (LLMs) and how the fairness of these systems is currently assessed in GitHub projects. Concerning the efficiency of learning-based systems, we first investigate the ability of existing approaches to estimate the training time of ML models early, motivated by the need to assist data scientists in the early selection of ML models that meet a given training time constraint. Next, we examine the efficiency of LLMs in terms of inference time and memory size. We first conduct a thorough empirical investigation of the impact of LLM compression strategies on the efficiency and effectiveness of models fine-tuned for software engineering tasks, and from this investigation we derive a set of recommendations to guide practitioners and researchers in selecting the best compression strategy for a given task. Finally, we propose a novel search-based approach that identifies the optimal hyperparameter setting and prompt structure to reduce the inference time of text-to-image generation models while maintaining high quality in the generated images. With this thesis, we aim to support data scientists and practitioners in developing fair and efficient learning-based systems and to help standardize some phases of the development workflow.
File | Access | Size | Format
---|---|---|---
main.pdf | open access | 5.35 MB | Adobe PDF
main_1.pdf | open access | 5.35 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/210810
URN:NBN:IT:UNIVAQ-210810