Differentially Private Methods for Data Release, Similarity Search and Quantile Estimation
BONINSEGNA, FABRIZIO
2026
Abstract
This thesis studies differential privacy in three areas: data release, similarity search, and quantile estimation. Differential privacy provides a rigorous mathematical framework for quantifying the privacy loss incurred by mechanisms designed to release sensitive query outputs. Two main frameworks are examined: central and local differential privacy. In the central model, a trusted curator collects sensitive data and subsequently releases query outputs that satisfy differential privacy. In the local model, by contrast, the curator is untrusted, and each user applies a differentially private mechanism locally before transmitting their data.

The main contributions of this thesis are:

(i) We introduce a novel central differentially private algorithm for the release of aggregated origin/destination trips, drawing inspiration from the Disclosure Avoidance System employed by the United States Census Bureau for the 2020 Census release. The main challenge in publishing such data lies in preserving utility at higher levels of aggregation, for example when trips between cities are aggregated to trips between regions. We give theoretical error guarantees for our algorithm and experimentally demonstrate its advantages over the state of the art on both real and synthetic datasets, achieving faster running times and fewer false positive detections.

(ii) We introduce a new central differentially private data structure for the approximate nearest neighbor counting problem that achieves state-of-the-art guarantees while relying on a simpler analysis. The design of the data structure can be tuned to accommodate different differentially private mechanisms, yielding improved worst-case preprocessing time and space requirements under pure differential privacy. Moreover, we establish a trade-off that enables higher utility at the expense of increased query time, preprocessing time, and space requirements.
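As background for the central model discussed above, the canonical primitive is the Laplace mechanism, which adds noise calibrated to a query's sensitivity; this is standard differential-privacy material, not a method from the thesis, and the function name and parameters below are illustrative only.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float,
                      epsilon: float, rng: np.random.Generator) -> float:
    """Release a query answer under epsilon-DP by adding Laplace noise
    with scale sensitivity / epsilon. In the central model, the trusted
    curator sees the raw data and perturbs only the released output."""
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query changes by at most 1 when a single user is added or
# removed, so its sensitivity is 1.
rng = np.random.default_rng(0)
noisy_count = laplace_mechanism(true_answer=120.0, sensitivity=1.0,
                                epsilon=1.0, rng=rng)
```

Smaller epsilon means stronger privacy but larger noise: the expected absolute error of a single release is sensitivity / epsilon.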
(iii) We provide a novel analysis of a mechanism that applies local differential privacy to locality-sensitive hashing, extending existing work that has so far been limited to binary hashes. We demonstrate that any locality-sensitive hashing family can be used to strengthen, for close points, the privacy guarantee given by the generalized randomized response mechanism, a particular locally differentially private mechanism. Furthermore, we provide guidance on the allocation of the privacy budget when such randomized hashes are applied in similarity search tasks.

(iv) We derive upper and lower bounds on the sample complexity of estimating a quantile in the local differential privacy framework. Our results highlight a clear separation between adaptive and non-adaptive protocols, showing the superiority of the adaptive approach. In particular, we present a protocol based on noisy binary search that is optimal up to a constant factor. Moreover, we derive an upper bound on the sample complexity in the adaptive shuffle differential privacy framework, illustrating the combined benefits of adaptivity and privacy amplification through shuffling.
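To make the adaptivity in (iv) concrete, the following is an illustrative sketch, not the thesis protocol, of noisy binary search under local differential privacy: in each round a fresh batch of users answers a threshold comparison through binary randomized response (the two-category case of generalized randomized response), and the debiased majority vote steers the search. All names, parameters, and the uniform test data are invented for illustration.

```python
import math
import random

def rr_bit(bit: int, epsilon: float) -> int:
    """Binary randomized response: report the true bit with probability
    e^eps / (e^eps + 1), the flipped bit otherwise. Each report
    satisfies epsilon-LDP on its own."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p else 1 - bit

def noisy_binary_search_median(samples, epsilon, lo, hi,
                               rounds=20, users_per_round=500):
    """Adaptive sketch: each round asks fresh users whether x <= mid via
    randomized response, debiases the empirical frequency into an
    estimate of Pr[x <= mid], and halves the interval accordingly."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    it = iter(samples)
    for _ in range(rounds):
        mid = (lo + hi) / 2.0
        reports = [rr_bit(int(next(it) <= mid), epsilon)
                   for _ in range(users_per_round)]
        freq = sum(reports) / users_per_round
        est = (freq - (1.0 - p)) / (2.0 * p - 1.0)  # debiased Pr[x <= mid]
        if est < 0.5:
            lo = mid  # median is likely above mid
        else:
            hi = mid  # median is likely below mid
    return (lo + hi) / 2.0

random.seed(0)
data = [random.random() for _ in range(20 * 500)]  # uniform: median near 0.5
estimate = noisy_binary_search_median(data, epsilon=1.0, lo=0.0, hi=1.0)
```

The protocol is adaptive because the query posed in round t depends on the noisy answers from earlier rounds; a non-adaptive protocol would have to fix all thresholds in advance.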
| File | Size | Format | License |
|---|---|---|---|
| thesis_Fabrizio_Boninsegna.pdf (open access) | 6.87 MB | Adobe PDF | All rights reserved |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359538
URN:NBN:IT:UNIPD-359538