Bayesian nonparametric mixtures are flexible models for density estimation and clustering, nowadays a standard tool in the toolbox of applied statisticians. The first proposal of such models was the Dirichlet process (DP) (Ferguson, 1973) mixture of Gaussian kernels by Lo (1984), a contribution that paved the way to the definition of a wide variety of nonparametric mixture models. In recent years, increasing interest has been dedicated to the definition of mixture models based on nonparametric mixing measures that go beyond the DP. Among these measures, the Pitman-Yor process (PY) (Perman et al., 1992; Pitman, 1995) and, more generally, the class of Gibbs-type priors (see, e.g., De Blasi et al., 2015) stand out for conveniently combining mathematical tractability, interpretability and modelling flexibility. In this thesis we investigate three aspects of nonparametric mixture models, concerning, in turn, their modelling, computational and distributional properties. The thesis is organized as follows. The first chapter proposes a concise review of the area of Bayesian nonparametric statistics, with a focus on the tools and models considered in the following chapters. We first introduce the notions of exchangeability, exchangeable partitions and discrete random probability measures. We then focus on the DP and the PY, the main ingredients of the second and third chapters, respectively. Finally, we briefly discuss the rationale behind the definition of more general classes of discrete nonparametric priors. In the second chapter we propose a thorough study of the effect of invertible affine transformations of the data on the posterior distribution of DP mixture models, with particular attention to DP mixtures of Gaussian kernels (DPM-G). First, we provide an explicit result relating model parameters and transformations of the data.
Second, we formalize the notion of asymptotic robustness of a model under affine transformations of the data and prove an asymptotic result which, relying on the asymptotic consistency of DPM-G models, shows that, under mild assumptions on the data-generating distribution, DPM-G models are asymptotically robust. The third chapter presents the importance conditional sampler (ICS), a novel conditional sampling scheme for PY mixture models, based on a useful representation of the posterior distribution of a PY (Pitman, 1996) and on an importance sampling idea, similar in spirit to the augmentation step of the celebrated Algorithm 8 of Neal (2000). The proposed method conveniently combines the best features of state-of-the-art conditional and marginal methods for PY mixture models. Importantly, and unlike its most popular conditional competitors, the numerical efficiency of the ICS is robust to the specification of the parameters of the PY. The steps for implementing the ICS are described in detail, and its performance is compared with that of popular competing algorithms. Finally, the ICS is used as a building block for devising a new efficient algorithm for the class of GM-dependent DP mixture models (Lijoi et al., 2014a; Lijoi et al., 2014b) for partially exchangeable data. In the fourth chapter we study some distributional properties of Gibbs-type priors. The main result focuses on an exchangeable sample from a Gibbs-type prior and provides a conveniently simple description of the distribution of the size of the cluster the (m+1)th observation is assigned to, given an unobserved sample of size m. The study of this distribution provides the tools for a simple, yet useful, strategy for the prior elicitation of the parameters of a Gibbs-type prior in the context of Gibbs-type mixture models. The results in the last three chapters are supported by exhaustive simulation studies and illustrated by analysing astronomical datasets.
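As an informal illustration of the class of models treated above, the following sketch draws a sample from a truncated stick-breaking representation of a PY mixture of Gaussian kernels; the truncation level, PY parameters (sigma, theta) and base measure chosen below are arbitrary illustrative values, not those used in the thesis. Setting sigma = 0 recovers the DP case.

```python
import random

random.seed(1)

def py_weights(sigma, theta, trunc=100):
    # Truncated stick-breaking for PY(sigma, theta):
    # V_j ~ Beta(1 - sigma, theta + j*sigma), w_j = V_j * prod_{l<j} (1 - V_l)
    w, stick = [], 1.0
    for j in range(1, trunc + 1):
        v = random.betavariate(1.0 - sigma, theta + j * sigma)
        w.append(stick * v)
        stick *= 1.0 - v
    total = sum(w)
    return [wj / total for wj in w]  # renormalize after truncation

def sample_py_gaussian_mixture(n, sigma=0.25, theta=1.0, trunc=100):
    # Atom locations drawn from an illustrative base measure N(0, 2^2);
    # the Gaussian kernel standard deviation is fixed at 0.5 for simplicity.
    w = py_weights(sigma, theta, trunc)
    means = [random.gauss(0.0, 2.0) for _ in range(trunc)]
    draws = []
    for _ in range(n):
        k = random.choices(range(trunc), weights=w)[0]  # pick a component
        draws.append(random.gauss(means[k], 0.5))
    return draws

x = sample_py_gaussian_mixture(500)
```

In practice, inference for such models relies on marginal or conditional MCMC schemes rather than on a fixed truncation; the truncation here only serves to make the stick-breaking construction concrete.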
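The predictive structure behind the fourth-chapter cluster-size result can be illustrated, restricting for simplicity to the PY special case of a Gibbs-type prior, via the generalized Chinese restaurant process: given m observations partitioned into k clusters of sizes n_1, ..., n_k, the (m+1)th observation joins cluster j with probability proportional to (n_j - sigma) and opens a new cluster with probability proportional to (theta + k*sigma). A minimal simulation sketch with illustrative parameter values:

```python
import random

random.seed(7)

def py_crp_partition(m, sigma=0.25, theta=1.0):
    # Simulate the cluster sizes induced by m observations under the
    # Pitman-Yor Chinese-restaurant-process predictive rule.
    sizes = []  # sizes of the clusters observed so far
    for _ in range(m):
        k = len(sizes)
        # weight (n_j - sigma) for each existing cluster,
        # weight (theta + k*sigma) for a brand-new cluster
        weights = [nj - sigma for nj in sizes] + [theta + k * sigma]
        j = random.choices(range(k + 1), weights=weights)[0]
        if j == k:
            sizes.append(1)   # open a new cluster
        else:
            sizes[j] += 1     # join an existing cluster
    return sizes

sizes = py_crp_partition(1000)
```

Repeating the last step once more, conditionally on `sizes`, gives a Monte Carlo approximation of the distribution of the cluster size to which the (m+1)th observation is assigned, the object characterized analytically in the fourth chapter.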
Contributions to modelling via Bayesian nonparametric mixtures
CORRADIN, RICCARDO
2019
File: phd_unimib_727524.pdf (open access, 1.94 MB, Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/77268
URN:NBN:IT:UNIMIB-77268