Data Collaboration with the use of Federated Learning: Issues of Marginal Contribution Quantification and Personalisation

ZUZIAK, MACIEJ KRZYSZTOF
2025

Abstract

The last decade has witnessed rapid development in Decentralised Machine Learning (DML) techniques, from simple decentralisation of the workload in the form of model and data parallelism to advanced privacy-aware algorithms that train one joint model from a federation of neural networks while preserving strict privacy guarantees. The development of those techniques has coincided with shifts in public perception regarding data, privacy, and the capabilities of modern deep learning approaches. This thesis compiles research on Federated Learning (FL), grounding it in modern discussions of data governance. Although data governance features prominently in the thesis, its discussion serves as a narrative framework for the technical research, outlining the research's motivation and possible practical scenarios for applying its outputs. In accordance with this approach, the first substantive part of this work (Chapter 4) introduces an array of issues connected with European Data Governance (EDG), followed by an introduction to Data Collaboratives, a concept that builds upon common management problems and generalises numerous approaches to collaborative learning discussed in recent years. This part of the work serves as the aforementioned narrative framework for the technical parts of the thesis. While certainly not exhaustive, it situates the rest of the work in an appropriate context and suggests possible applications of the presented research outputs.
The subsequent chapters (Chapters 5, 6 and 7) present the results of experiments on selected problems that may arise in collaborative learning scenarios: the quantification of clients' marginal contributions (Chapter 5), the relationship between that marginal contribution and susceptibility to privacy-related attacks (Chapter 6), and the personalisation of algorithms using client clustering techniques (Chapter 7). These three technical chapters on quantification, re-identification, and personalisation are directly linked to the concept of Data Collaboratives presented at the beginning of the thesis. Two more chapters, Chapter 2 and Chapter 3, are placed at the beginning of this work. Since the thesis contains narrative that may be of interest to interdisciplinary researchers, Chapter 2 introduces the most common terms and concepts used in the Machine Learning (ML) community and provides a general overview of machine learning theory, with pointers to the relevant literature. Chapter 3 serves as a more conventional State of the Art (SOTA) overview, introducing the basics of DML and setting the notation for Chapters 5, 6 and 7.
3 Dec 2025
English
federated learning
decentralised machine learning
personalised federated learning
data governance
alternative data governance
data commons
data collaboratives
marginal value quantification
Shapley value
marginal utility
membership inference attack
federated learning privacy threats
Rinzivillo, Salvatore
Comandè, Giovanni
Files in this item:
File: Maciej_Krzysztof_Zuziak_PhD_Thesis_PDFA.pdf
Access: open access
License: Creative Commons
Size: 86.1 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/356238
The NBN code of this thesis is URN:NBN:IT:UNIPI-356238