
Adaptive and scalable algorithms for little, uncertain and strategic data

RUSSO, MATTEO
2026

Abstract

In real‑world settings, scalable, adaptive learning faces three core challenges inherent to massive datasets and continually evolving, strategic environments. First, even the seemingly simple task of computing a high‑dimensional mean by scanning every point incurs a prohibitive cost. Moreover, tasks like $k$-median clustering require aggressive compression: we need small, weighted ``coresets'' that preserve solution quality while shrinking datasets so that downstream algorithms run in milliseconds instead of hours. Second, online learning and online algorithms often unfold in environments that are neither fully adversarial nor purely stochastic; we often have only scarce glimpses of the future or encounter inputs in random order. Leveraging these modest hints or this mild input regularity allows us to significantly outperform worst‑case guarantees. Third, when information is provided strategically or stochastically, we need to design mechanisms that trade off payments and costly inspections to obtain robust, implementable incentives.\\ This thesis tackles all three challenges by designing sublinear‑time algorithms for $(1+\varepsilon)$‑approximate mean estimation that inspect only a tiny fraction of the data, and by developing coreset construction methods that distill large point sets into provably accurate sketches. It also rethinks online learning and online algorithms to leverage a sublinear budget of targeted probes and to accommodate random‑order streams, showing that even sparse hints or shuffled inputs can sharply improve upon classical regret and competitive-ratio bounds. Finally, it extends these ideas to the stochastic and strategic domains, where information is no longer merely scarce but shaped by randomness and incentives, and devises algorithms that are robust to both. \paragraph{Part I: Learning Clustering Primitives from Limited Access.} We first address the computational and statistical limits of extracting core primitives from large point sets. 
In \Cref{chapter:sublinear-mean}, we develop \textit{simple and optimal sublinear} algorithms for mean estimation in $\mathbb{R}^d$. By carefully combining techniques to extract coordinate‑wise and geometric medians, we obtain $(1+\varepsilon)$-approximations to the true mean using only $O(\varepsilon^{-1}\log\delta^{-1})$ samples, with matching runtime guarantees. A novel gradient‑descent scheme for the geometric median further accelerates convergence in high dimensions. \Cref{chapter:coresets} turns to \textit{coreset construction} for $k$-clustering. Leveraging a tight VC‑dimension analysis, we design nearly linear‑size summaries that preserve clustering costs up to a $(1\pm\varepsilon)$ factor. Our bounds improve prior results both in planar shortest‑path metrics and for curve clustering under the Fréchet distance, achieving sizes of $\tilde O(k\varepsilon^{-2})$ and $\tilde O(kd\ell\varepsilon^{-2}\log m)$, respectively. \paragraph{Part II: Online Learning and Algorithms with Probes and in Random Order.} In the second part, we study online learning augmented by probes, as well as online learning and online algorithms in the random-order input model. In the probe-augmented setting (\Cref{chapter:probe-online-learning}), we show that a sublinear budget $k$ of $N$-sized or pairwise probes provably reduces adversarial regret: in full-feedback settings one attains $R_T=O\big(\min\{\sqrt{T\log N},\nicefrac{T\log N}{k}\}\big)$, with tight lower bounds and robustness to bounded probe noise. For random-order inputs in online learning, we introduce in \Cref{chapter:ro-online-learning} a general \textsc{Simulate} reduction that converts many stochastic learners into near-equivalent random-order learners with only polylogarithmic overhead. 
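The chapter's estimators combine coordinate-wise and geometric medians in $\mathbb{R}^d$; as a loosely related, self-contained illustration of why median-based aggregation helps when estimating a mean from few samples, here is the classical one-dimensional median-of-means estimator (a toy sketch of ours, not the thesis's algorithm):

```python
import random
import statistics

def median_of_means(sample, num_buckets):
    """Classical robust mean estimator: split the sample into buckets,
    average each bucket, and return the median of the bucket means."""
    sample = list(sample)
    random.shuffle(sample)
    size = len(sample) // num_buckets
    bucket_means = [
        sum(sample[i * size:(i + 1) * size]) / size
        for i in range(num_buckets)
    ]
    return statistics.median(bucket_means)

random.seed(0)
# 1000 Gaussian samples with true mean 5.0, plus a few gross outliers
# that drag the naive sample mean far away from 5.0.
data = [random.gauss(5.0, 1.0) for _ in range(1000)] + [1e4] * 5
naive = sum(data) / len(data)                    # skewed by the outliers
robust = median_of_means(data, num_buckets=15)   # stays near 5.0
```

Since the five outliers can contaminate at most five of the fifteen buckets, the median bucket mean is computed over clean buckets and remains close to the true mean; guarantees of this flavor, with optimal sample complexity and in high dimension, are what the chapter establishes.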
This yields stochastic-quality guarantees for online learning problems with delays, constraints, and switching costs, and reveals that learnability in random order hinges on the VC dimension rather than the Littlestone dimension, underscoring a nuanced separation from adversarial environments. In \Cref{chapter:random-order-set-cover}, the random-order assumption also allows us to improve and simplify competitive-ratio guarantees for several covering and facility-location problems. In particular, it lets us transform additive/multiplicative OCO-type regret bounds into multiplicative competitive-ratio guarantees; as an application, we obtain a unified $O(\log mn)$-competitive template for a range of random-order covering and facility-location problems by feeding unbiased stochastic gradients into a low-regret OCO black box. \paragraph{Part III: Decision Making and Incentive Design under Strategic Information.} In the last part of this thesis, we investigate settings where the information fed to the algorithm is stochastic or strategic. For online selection (prophet) problems, in \Cref{chapter:eor-prophets}, we propose and analyze the \emph{Expectation of Ratio} ($\EoR = \E{}{\tfrac{\ALG}{\OPT}}$) as an ex-post-aware benchmark. We prove that for every downward-closed family the worst-case $\EoR$ and the classical \emph{Ratio of Expectations} ($\RoE = \tfrac{\E{}{\ALG}}{\E{}{\OPT}}$) differ by at most constant factors, give a constructive instance-wise reduction from $\EoR$ to $\RoE$ (up to constants), and exhibit matching impossibility results for the converse. The proofs combine a tail–core split with a self-bounding/concentration analysis of the offline optimum. In contract design (\Cref{chapter:contracts-with-inspections}), we relax the hidden-action assumption by allowing costly inspections and study the computational frontier of optimal inspection schemes. 
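To make the distinction between the two benchmarks concrete, consider a toy two-box instance of our own (not taken from the thesis): the first box is deterministically worth $1$, the second is worth $M$ with probability $p$ and $0$ otherwise, and the algorithm always takes the first box, so $\ALG = 1$ while $\OPT = \max\{1, \text{second box}\}$. Then
\[
\EoR \;=\; \E{}{\tfrac{\ALG}{\OPT}} \;=\; (1-p)\cdot 1 + p\cdot\tfrac{1}{M},
\qquad
\RoE \;=\; \tfrac{\E{}{\ALG}}{\E{}{\OPT}} \;=\; \tfrac{1}{(1-p)\cdot 1 + p\cdot M}.
\]
Setting $p = 1/M$ and letting $M \to \infty$ sends $\EoR \to 1$ while $\RoE \to \nicefrac{1}{2}$: the ex-post-aware benchmark judges this algorithm near-optimal on almost every realization, whereas the classical benchmark is pulled down by the rare large realization of the optimum. This per-instance gap is precisely what the constructive reduction must control.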
In the fully-observable model, we show (i) polynomial-time algorithms for optimal deterministic schemes under any monotone inspection cost, (ii) an efficient algorithm for the optimal randomized scheme when inspection costs are submodular (via marginal-to-distribution compression, intervalization, and low-dimensional optimization), and (iii) strong information-theoretic hardness for XOS and subadditive costs (exponential query complexity both for exact solutions and even for slight approximations). In the partially-observable model, we identify a convexity property that yields efficient additive approximations under submodularity and exhibit instances where randomized schemes outperform deterministic ones by a factor of $\Theta(n)$.\\ These results lay a rigorous foundation for designing algorithms that are scalable or highly adaptive when data is scarce, uncertain, strategic, or accessible only through a bounded number of external probes.
28 Jan 2026
English
LEONARDI, Stefano
SILVESTRI, FABRIZIO
Università degli Studi di Roma "La Sapienza"
Files in this product:
Tesi_dottorato_Russo.pdf — open access, Creative Commons license, 2.67 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/358412
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-358412