Data Collaboration with the use of Federated Learning: Issues of Marginal Contribution Quantification and Personalisation
ZUZIAK, MACIEJ KRZYSZTOF
2025
Abstract
The last decade has witnessed rapid development in Decentralised Machine Learning (DML) techniques, ranging from simple decentralisation of the workload through model and data parallelism to advanced privacy-aware algorithms that train a single joint model across a federation of neural networks while preserving strict privacy guarantees. The development of these techniques has coincided with shifts in public perception regarding data, privacy, and the capabilities of modern deep learning approaches. This thesis compiles research on Federated Learning (FL) and grounds it in contemporary discussions of data governance. Although the thesis focuses primarily on FL, the discussion of data governance serves as a narrative framework for the technical research conducted herein, outlining the research's motivation and possible practical scenarios for applying its outputs. Accordingly, the first substantive part of this work (Chapter 4) introduces a range of issues connected with European Data Governance (EDG), followed by an introduction to Data Collaboratives: a concept built upon common management problems that generalises numerous approaches to collaborative learning discussed in recent years. This part of the work serves as the aforementioned narrative framework for the technical parts of the thesis. While certainly not exhaustive, it situates the rest of the work in an appropriate context and suggests possible applications of the presented research outputs.
The subsequent chapters (Chapters 5, 6 and 7) present the results of experiments on selected problems that may arise in collaborative learning scenarios, namely the quantification of clients' marginal contributions (Chapter 5), the relationship between marginal contribution quantification and susceptibility to privacy-related attacks (Chapter 6), and the personalisation of algorithms using client clustering techniques (Chapter 7). These three technical chapters, on quantification, re-identification, and personalisation, are directly linked to the concept of Data Collaboratives presented at the beginning of the thesis. Two further chapters, Chapter 2 and Chapter 3, precede them. Since parts of the thesis may be of interest to interdisciplinary researchers, Chapter 2 introduces the most common terms and concepts used in the Machine Learning (ML) community and provides a general overview of machine learning theory, with pointers to the relevant literature. Chapter 3 offers a more conventional State of the Art (SOTA) overview, introducing the basics of DML and setting the notation for Chapters 5, 6 and 7.
Full text: Maciej_Krzysztof_Zuziak_PhD_Thesis_PDFA.pdf (Adobe PDF, 86.1 MB), open access, licence: Creative Commons.
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/356238
URN:NBN:IT:UNIPI-356238