
Sense and Sensitivity: Data Utility and User Privacy in Differentially Private Machine Learning

GALLI, Filippo
2024

Abstract

This thesis explores existing and novel methods for extracting knowledge from data while preserving users' private information through differentially private machine learning. The central challenge addressed here is the sensitivity-utility tradeoff that arises when privatizing queries involving vector averages, which are ubiquitous in gradient-based optimization and data science in general. New approaches are proposed to give researchers and practitioners additional tools for choosing among privatization strategies, depending on the specific learning context, the privacy expectations, and the accuracy required of the resulting model. First, metric privacy concepts are applied to collaborative model training, providing distance-dependent privacy guarantees without pre-defining sensitivity. An online optimization method is then introduced for tuning the clipping threshold concurrently with model training, reducing privacy exposure and computational requirements while improving utility. Efficient strategies for empirically verifying privacy results in the training of large language models are also developed, encouraging practical privacy auditing. Finally, a new perspective is offered on the definition of differential privacy, suggesting that defining sensitivity with respect to record replacement, rather than addition/removal, could yield increased utility in federated learning settings. Through theoretical analyses, algorithms, and experimental evaluations, this work presents ideas and practical techniques for optimizing the privacy-utility tradeoff inherent in differentially private machine learning.
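To make the sensitivity-utility tradeoff concrete: the standard way to privatize a vector average is to clip each contribution to a fixed L2 norm (bounding the query's sensitivity) and then add Gaussian noise calibrated to that bound. The sketch below is illustrative only and not taken from the thesis; the function name `private_mean` and its parameters are hypothetical.

```python
import numpy as np

def private_mean(vectors, clip_norm, noise_multiplier, rng=None):
    """Differentially private vector average via the Gaussian mechanism.

    Each vector is clipped to L2 norm `clip_norm`, so replacing one
    record changes the mean by at most clip_norm / n in L2 norm;
    Gaussian noise scaled by that sensitivity is added per coordinate.
    A small clip_norm means less noise but more clipping bias, and
    vice versa -- the tradeoff the clipping threshold controls.
    """
    rng = np.random.default_rng() if rng is None else rng
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = vectors * scale                      # per-record clipping
    sensitivity = clip_norm / n                    # L2 sensitivity of the mean
    noise = rng.normal(0.0, noise_multiplier * sensitivity,
                       size=vectors.shape[1])
    return clipped.mean(axis=0) + noise
```

This is the primitive underlying DP-SGD-style training, where `vectors` are per-example gradients and `clip_norm` is the clipping threshold that the thesis proposes to tune online.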
16 October 2024
English
Scuola Normale Superiore
Anonymous experts
Files in this item:
Tesi.pdf — open access — License: All rights reserved — 5.02 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/305919
The NBN code of this thesis is URN:NBN:IT:SNS-305919