Statistical modeling within the generalized Bayesian paradigm

AGNOLETTO, DAVIDE
2025

Abstract

Despite several appealing properties, Bayesian methods have limitations that can undermine their attractiveness, such as the assumption that the model is well-specified or the need for Markov chain Monte Carlo algorithms for posterior sampling. These drawbacks are exacerbated by the growing popularity of machine learning methods, which are natural competitors of Bayesian methods in many applications. For these reasons, generalizations of the Bayesian paradigm that overcome these limitations have been a fertile area of study in recent years. In this thesis, we present two novel methods that extend the current generalizations of the hypothetical and predictive approaches to Bayesian inference. We start by focusing on generalized linear models (GLMs), which are routinely used for modeling relationships between a response variable and covariates. The simple form of a GLM makes it easy to interpret, but also raises concerns that model misspecification may affect inferential conclusions. A popular semi-parametric solution adopted in the frequentist literature is quasi-likelihood, which improves robustness by requiring correct specification of only the first two moments. We develop a robust approach to hypothetical Bayesian inference in GLMs through quasi-posterior distributions. We show that quasi-posteriors provide a coherent generalized Bayes inference method, while also approximating so-called coarsened posteriors. In so doing, we obtain new insights into the choice of the coarsening parameter. Asymptotically, the quasi-posterior converges in total variation to a normal distribution and has important connections with the loss-likelihood bootstrap posterior. We demonstrate that it is also well-calibrated in terms of frequentist coverage. Moreover, the loss-scale parameter has a clear interpretation as a dispersion, and this leads to a well-established method-of-moments estimator.
We then consider the challenges of conducting Bayesian inference on unknown discrete data distributions, with a particular focus on count data. Motivated by the disadvantages of traditional mixture models in terms of model flexibility and/or efficiency of posterior inference, we develop a novel Bayesian predictive approach. In particular, our Metropolis-Adjusted Dirichlet (MAD) sequence model characterizes the predictive measure as a mixture of a base measure and Metropolis-Hastings kernels centered on previous data points. The resulting MAD sequence is asymptotically exchangeable, and the posterior on the data generator takes the form of a martingale posterior. This structure leads to straightforward algorithms for inference on count distributions, with easy extensions to multivariate, regression and binary data cases. Moreover, we obtain a useful asymptotic Gaussian approximation for the implied posterior distribution.
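As a rough illustration of the quasi-posterior idea described in the abstract, the sketch below combines a quasi-log-likelihood with a prior and fixes the dispersion at its method-of-moments estimate. It is not taken from the thesis: the log link, the variance function V(mu) = mu, the Gaussian prior, the random-walk Metropolis sampler, and all function names are assumptions chosen only to keep the example short.

```python
import numpy as np

def quasi_log_lik(beta, X, y, psi):
    """Quasi-log-likelihood for a log link and V(mu) = mu (both assumed here).

    The integral of (y - t) / (psi * t) over t from y to mu equals
    (y * log(mu) - mu) / psi up to a constant that does not involve beta.
    """
    eta = X @ beta
    return np.sum(y * eta - np.exp(eta)) / psi

def fit_mean(X, y, n_iter=50):
    """Solve the quasi-score equations by Newton steps (they do not depend on psi)."""
    # Start from a least-squares fit on the log scale (illustrative choice).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def sample_quasi_posterior(X, y, n_draws=5000, prior_sd=10.0, seed=0):
    """Random-walk Metropolis draws from prior x exp(quasi-log-likelihood)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat = fit_mean(X, y)
    mu_hat = np.exp(X @ beta_hat)
    # Method-of-moments (Pearson) estimate of the dispersion.
    psi_hat = np.sum((y - mu_hat) ** 2 / mu_hat) / (n - p)

    def log_post(b):
        # Independent N(0, prior_sd^2) prior on each coefficient (assumed).
        return quasi_log_lik(b, X, y, psi_hat) - 0.5 * np.sum(b ** 2) / prior_sd ** 2

    # Proposal covariance scaled from the dispersion-adjusted inverse information.
    cov = psi_hat * np.linalg.inv(X.T @ (X * mu_hat[:, None]))
    chol = np.linalg.cholesky((2.38 ** 2 / p) * cov)
    beta, lp = beta_hat.copy(), log_post(beta_hat)
    draws = np.empty((n_draws, p))
    for i in range(n_draws):
        prop = beta + chol @ rng.standard_normal(p)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        draws[i] = beta
    return draws, psi_hat

# Simulated overdispersed counts with mean mu and variance 3 * mu.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
y = 3.0 * rng.poisson(np.exp(X @ beta_true) / 3.0)
draws, psi_hat = sample_quasi_posterior(X, y)
print("dispersion estimate:", psi_hat)
print("quasi-posterior means:", draws.mean(axis=0))
```

In this sketch the dispersion acts as the loss scale: a larger estimate flattens the quasi-log-likelihood and widens the quasi-posterior, which is the mechanism by which overdispersed data lead to wider intervals than a plain Poisson likelihood would give.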
21 January 2025
English
SCARPA, BRUNO
Università degli studi di Padova
Files in this item:
Tesi_Agnoletto.pdf (open access, 1.32 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/218141
The NBN code of this thesis is URN:NBN:IT:UNIPD-218141