Sequential skew-symmetric posterior approximations
DOLMETA, PATRIC
2025
Abstract
Deterministic approximations of analytically intractable posterior distributions are a common tool in Bayesian analysis. However, accurate extensions of these methods to settings in which data stream in rapidly and sequentially remain under-explored. In this thesis, we fill this gap by deriving a general and provably accurate skew-symmetric approximation of a target posterior, whose parameters can be evaluated via novel window-type estimators that make computations effectively online. This is accomplished through a careful treatment of third-order Taylor expansions around an online estimate of the maximum a posteriori. This perspective enhances scalability while ensuring accuracy improvements, both in theory and in practice, relative to the Laplace approximation. Following a comprehensive theoretical discussion of the conditions under which improved convergence rates to the target posterior can be achieved, we apply the new methodology to bandit problems. Our focus is on generalized linear bandits, a variant of contextual reinforcement learning in which a transformation of the expected reward is a linear function of a feature vector. Bayesian solutions to the reward-maximization problem in structured bandits often face severe computational challenges when updating, and eventually sampling from, posterior distributions in the online setting. We propose a hybrid approach to Thompson sampling that combines a recent closed-form posterior result with the skew-symmetric approximation as an alternative to existing approaches.
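The construction summarized above can be illustrated with two hedged sketches. First, the skew-symmetric family referred to in the abstract has the standard general form below; the second display is only a plausible skew-modal-type instance with a Gaussian base centred at a MAP estimate, not the thesis's exact parameterization, which is given in the full text:

```latex
% Generic skew-symmetric density: symmetric base density f_0,
% symmetric univariate CDF G, odd function w with w(-x) = -w(x).
q(\theta) \;=\; 2\, f_0(\theta - \xi)\, G\big(w(\theta - \xi)\big).

% Skew-modal-type instance (sketch): Gaussian base centred at an online
% MAP estimate \hat{\theta} with scale \hat{\Omega}, and a skewing term
% \eta (an odd cubic) built from third-order derivatives of the
% log-posterior at \hat{\theta}.
q(\theta) \;=\; 2\, \phi_d(\theta; \hat{\theta}, \hat{\Omega})\,
\Phi\big(\eta(\theta - \hat{\theta})\big).
```

Second, a minimal Python sketch of how such an approximation could plug into Thompson sampling for a generalized linear bandit. It uses the classical sign-flip representation of skew-symmetric laws, which yields exact, rejection-free draws; the skewing function `w`, the stand-in posterior quantities, and `thompson_step` are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def sample_skew_symmetric(mean, cov, w, rng):
    """One exact draw from q(x) = 2 * N(x; mean, cov) * Phi(w(x - mean)).

    Sign-flip representation: if Z ~ f0 (symmetric about 0) and T ~ G
    independently, then X = Z when T <= w(Z), else X = -Z, has density
    2 * f0(x) * G(w(x)). Here f0 is a centred Gaussian and G = Phi.
    """
    z = rng.multivariate_normal(np.zeros(len(mean)), cov)
    t = rng.standard_normal()            # T ~ Phi
    x = z if t <= w(z) else -z
    return mean + x

def thompson_step(contexts, mean, cov, w, rng):
    """One Thompson-sampling round for a generalized linear bandit:
    sample a parameter from the skew-symmetric approximate posterior,
    then pull the arm maximizing the sampled linear predictor (a
    monotone link leaves the argmax unchanged)."""
    theta = sample_skew_symmetric(mean, cov, w, rng)
    return int(np.argmax(contexts @ theta))

# Illustrative usage; every quantity below is a hypothetical stand-in.
rng = np.random.default_rng(0)
d, K = 3, 5
mean = np.zeros(d)                       # stand-in for the online MAP estimate
cov = np.eye(d)                          # stand-in for the inverse Hessian at the MAP
w = lambda x: 0.1 * float(np.sum(x**3))  # toy odd (cubic) skewing function
contexts = rng.standard_normal((K, d))   # one feature vector per arm
arm = thompson_step(contexts, mean, cov, w, rng)
```

Because the sign-flip draw costs a single Gaussian sample plus one scalar comparison, sampling from a skew-symmetric approximation of this kind is essentially as cheap per round as sampling from a Laplace approximation.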
File: Thesis_Dolmeta_Patric.pdf (open access, 2.82 MB, Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/190589
URN:NBN:IT:UNIBOCCONI-190589