
Bayesian nonparametric multiple random partition models

TOTO, GIOVANNI
2026

Abstract

Clustering is a fundamental problem in statistics, aiming to partition units into homogeneous groups without relying on labeled data. Although a single partition is sufficient for most applications, many cases require more complex structures that involve multiple partitions. This thesis introduces two novel Bayesian model-based clustering methods that, respectively, take into account time and covariate information to build multiple partitions and combine them to obtain a richer partition of the statistical units. The first contribution is a novel dependent Random Partition Model, which induces sequences of random partitions exhibiting semi-Markovian dependence. The model relaxes the strong Markovian assumptions of existing dependent Random Partition Models by introducing a concept of persistency, which enables more flexible dependence structures across partitions. The second contribution focuses on the problem of multivariate density estimation in the presence of categorical covariates, with particular interest in clustering the conditional marginal densities. This is achieved through tensor factorizations based on multiple partitions. Two formulations are proposed, considering one and two layers of partitions. The first layer consists of partitions over the covariate levels, which provide a straightforward way to aggregate levels with a similar effect on the response, and whose product induces a partition over the joint covariate space. The second layer refines the partition induced by the first layer, lifting the limitations imposed by its construction as a product of the partitions over the marginal covariate spaces. Algorithms for posterior inference are discussed, and the performance of both approaches is illustrated through applications to simulated and real data.
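
As a rough illustration of the statement that partitions over the covariate levels induce, through their product, a partition over the joint covariate space, the following sketch may help. It is not taken from the thesis: the covariates, their levels, the fixed block labels, and the helper `product_partition` are all hypothetical, and the partitions are hand-picked rather than random.

```python
from itertools import product

# Hypothetical categorical covariates with hand-picked partitions of their levels.
# Each dict maps a level to the label of the block it belongs to.
partition_a = {"low": 0, "mid": 0, "high": 1}  # covariate A: {low, mid}, {high}
partition_b = {"x": 0, "y": 1, "z": 1}         # covariate B: {x}, {y, z}


def product_partition(*partitions):
    """Group joint level combinations by their tuple of marginal block labels."""
    blocks = {}
    for combo in product(*(p.keys() for p in partitions)):
        # Two joint cells share a block only if they agree on every marginal label.
        label = tuple(p[level] for p, level in zip(partitions, combo))
        blocks.setdefault(label, []).append(combo)
    return blocks


joint = product_partition(partition_a, partition_b)
for label, cells in sorted(joint.items()):
    print(label, cells)
```

In this toy case the nine joint cells fall into four blocks, and every block is a Cartesian product of one block of A and one block of B. That rectangular shape is one way to read the limitation that the abstract attributes to the product construction.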
21-Jan-2026
English
CANALE, ANTONIO
Università degli studi di Padova
Files in this item:
relazione_finale_Giovanni_Toto.pdf

open access

License: All rights reserved
Size: 2.08 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/357157
The NBN code of this thesis is URN:NBN:IT:UNIPD-357157