Sequential skew-symmetric posterior approximations

DOLMETA, PATRIC
2025

Abstract

Deterministic approximations of analytically intractable posterior distributions are a common tool in Bayesian analysis. However, accurate extensions of these methods to situations in which data stream in rapidly and sequentially are still under-explored. In this thesis, we fill this gap by deriving a general and provably accurate skew-symmetric approximation of a target posterior, whose parameters can be evaluated via novel window-type estimators that make computations effectively online. This is accomplished via a specific treatment of third-order Taylor expansions around an online estimate of the maximum a posteriori (MAP). This perspective enhances scalability while ensuring accuracy improvements, both in theory and in practice, over the Laplace approximation. Following a comprehensive theoretical discussion of the conditions under which improved convergence rates to the target posterior can be achieved, we apply this new methodology to bandit problems. Our focus is on generalized linear bandits, a variant of contextual reinforcement learning in which a transformation of the expected reward is linearly predicted by a feature vector. Bayesian solutions to the reward-maximization problem in structured bandits often face severe computational challenges when updating, and eventually sampling from, the posterior distributions in the online context. We suggest a hybrid approach to Thompson sampling, leveraging a recent closed-form posterior result combined with the precise skew-symmetric approximation as an alternative to existing approaches.
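To make the setting concrete, the following is a minimal sketch of the Laplace-based Thompson sampling baseline that the thesis improves upon, instantiated for a logistic (Bernoulli-reward) generalized linear bandit. It is an illustration under stated assumptions, not the thesis's skew-symmetric method: the function names, the logistic link, and the Gaussian prior precision are all illustrative choices.

```python
import numpy as np

def laplace_posterior(X, y, prior_prec=1.0, iters=25):
    """Laplace approximation to a Bayesian logistic-regression posterior:
    Newton iterations locate the MAP, then the posterior is approximated
    by a Gaussian whose covariance is the inverse negative Hessian at the mode."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))              # predicted success probabilities
        grad = X.T @ (y - p) - prior_prec * theta         # log-posterior gradient
        H = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec * np.eye(d)
        theta = theta + np.linalg.solve(H, grad)          # Newton step toward the MAP
    return theta, np.linalg.inv(H)

def thompson_step(arms, X_hist, y_hist, rng):
    """One round of Thompson sampling: draw a parameter from the Gaussian
    approximation and pull the arm maximizing the sampled linear predictor."""
    mode, cov = laplace_posterior(X_hist, y_hist)
    theta = rng.multivariate_normal(mode, cov)
    return int(np.argmax(arms @ theta))
```

The skew-symmetric approximation studied in the thesis would replace the symmetric Gaussian draw with a skewed one built from third-order information at the online MAP estimate, at comparable per-round cost.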
Date: 31 January 2025
Language: English
Supervisors: DURANTE, DANIELE; PAPASPILIOPOULOS, OMIROS
Institution: Università Bocconi
Files in this item:
Thesis_Dolmeta_Patric.pdf — open access — 2.82 MB — Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/190589
The NBN code of this thesis is URN:NBN:IT:UNIBOCCONI-190589