Differentially Private Methods for Data Release, Similarity Search and Quantile Estimation
BONINSEGNA, FABRIZIO
2026
Abstract
This thesis studies differential privacy in three areas: data release, similarity search, and quantile estimation. Differential privacy provides a rigorous mathematical framework for quantifying the privacy loss incurred by mechanisms designed to release sensitive query outputs. Two main frameworks are examined: central and local differential privacy. In the central model, a trusted curator collects sensitive data and subsequently releases query outputs that satisfy differential privacy. In the local model, by contrast, the curator is untrusted, and each user applies a differentially private mechanism locally before transmitting their data.

The main contributions of this thesis are:

(i) We introduce a novel central differentially private algorithm for the release of aggregated origin/destination trips, drawing inspiration from the Disclosure Avoidance System employed by the United States Census Bureau for the 2020 Census release. The main challenge in publishing such data lies in preserving utility at higher levels of aggregation, for example when trips between cities are aggregated to trips between regions. We give theoretical error guarantees for our algorithm and experimentally demonstrate its advantages over the state of the art on both real and synthetic datasets, achieving faster running times and fewer false positive detections.

(ii) We introduce a new central differentially private data structure for the approximate nearest neighbor counting problem that achieves state-of-the-art guarantees while relying on a simpler analysis. The design of the data structure can be tuned to accommodate different differentially private mechanisms, yielding improved worst-case preprocessing time and space requirements under pure differential privacy. Moreover, we establish a trade-off that enables higher utility at the expense of increased query time, preprocessing time, and space requirements.
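As background for the central model discussed above, the canonical primitive is the Laplace mechanism, which adds noise calibrated to a query's sensitivity; this is standard differential-privacy material, not a method from the thesis, and the function name and parameters below are illustrative only.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float,
                      epsilon: float, rng: np.random.Generator) -> float:
    """Release a query answer under epsilon-DP by adding Laplace noise
    with scale sensitivity / epsilon. In the central model, the trusted
    curator sees the raw data and perturbs only the released output."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query changes by at most 1 when a single user is added or
# removed, so its sensitivity is 1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(true_answer=120.0, sensitivity=1.0,
                                epsilon=1.0, rng=rng)
```

Smaller epsilon means stronger privacy but larger noise: the expected absolute error of a single release is sensitivity / epsilon.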
(iii) We provide a novel analysis of a mechanism that applies local differential privacy to locality-sensitive hashing, extending existing work that has so far been limited to binary hashes. We demonstrate that any locality-sensitive hashing family can be used to strengthen, for close points, the privacy guarantee given by the generalized randomized response mechanism, a particular locally differentially private mechanism. Furthermore, we provide guidance on the allocation of the privacy budget when such randomized hashes are applied in similarity search tasks.

(iv) We derive upper and lower bounds on the sample complexity of estimating a quantile in the local differential privacy framework. Our results highlight a clear separation between adaptive and non-adaptive protocols, showing the superiority of the adaptive approach. In particular, we present a protocol based on noisy binary search that is optimal up to a constant factor. Moreover, we derive an upper bound on the sample complexity in the adaptive shuffle differential privacy framework, illustrating the combined benefits of adaptivity and privacy amplification through shuffling.
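To make the adaptivity in (iv) concrete, the following is an illustrative sketch, not the thesis protocol, of noisy binary search under local differential privacy: in each round a fresh batch of users answers a threshold comparison through binary randomized response (the two-category case of generalized randomized response), and the debiased majority vote steers the search. All names, parameters, and the uniform test data are invented for illustration.

```python
import math
import random

def rr_bit(bit: int, epsilon: float) -> int:
    """Binary randomized response: report the true bit with probability
    e^eps / (e^eps + 1), the flipped bit otherwise. Each report
    satisfies epsilon-LDP on its own."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p else 1 - bit

def noisy_binary_search_median(samples, epsilon, lo, hi,
                               rounds=20, users_per_round=500):
    """Adaptive sketch: each round asks fresh users whether x <= mid via
    randomized response, debiases the empirical frequency into an
    estimate of Pr[x <= mid], and halves the interval accordingly."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    it = iter(samples)
    for _ in range(rounds):
        mid = (lo + hi) / 2.0
        reports = [rr_bit(int(next(it) <= mid), epsilon)
                   for _ in range(users_per_round)]
        freq = sum(reports) / users_per_round
        est = (freq - (1.0 - p)) / (2.0 * p - 1.0)  # debiased Pr[x <= mid]
        if est < 0.5:
            lo = mid  # median is likely above mid
        else:
            hi = mid  # median is likely below mid
    return (lo + hi) / 2.0

random.seed(0)
data = [random.random() for _ in range(20 * 500)]  # uniform: median near 0.5
estimate = noisy_binary_search_median(data, epsilon=1.0, lo=0.0, hi=1.0)
```

The protocol is adaptive because the query posed in round t depends on the noisy answers from earlier rounds; a non-adaptive protocol would have to fix all thresholds in advance.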
| File | Size | Format | License |
|---|---|---|---|
| thesis_Fabrizio_Boninsegna.pdf (open access) | 6.87 MB | Adobe PDF | All rights reserved |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359538
URN:NBN:IT:UNIPD-359538