Bayesian methods for complex dependence structures with application to ecology

STOLF, FEDERICA
2025

Abstract

Recent advances in species sampling methods and bioinformatic tools have led to the collection of increasingly complex species occurrence datasets, motivating the development of statistical tools to extract insights from them and improve our understanding of biodiversity. There is a rich literature in ecology on so-called joint species distribution models, which usually take the form of multivariate probit latent factor regression models. However, these models have fundamental problems in handling high-dimensional species co-occurrence data with many rare species: (i) they cannot accommodate the many new species that are regularly discovered while sampling is still being conducted; (ii) they do not provide specific models for array data but simply flatten the data into a matrix, losing structural information. Motivated by ecological applications, this thesis introduces novel Bayesian methods for modeling multivariate binary data with a growing number of outcomes and for modeling multiway data. The thesis is organized into two main threads. The first develops a new class of dependent infinite latent feature models, proposing a general framework that bridges multivariate probit models and the Indian buffet process, the most popular method in the infinite latent feature model literature. The second addresses array data modeling by introducing a Bayesian tensor decomposition model that adaptively selects the unknown rank of the decomposition through a suitable shrinkage prior. In both threads, the theoretical properties of the proposed methods are extensively studied, and efficient algorithms for posterior computation are discussed. The performance of the proposed approaches is assessed in simulation studies and complex ecological applications.
21-Jan-2025
English
CANALE, ANTONIO
Università degli studi di Padova
Files in this item:
File: PhDThesisStolf.pdf
Access: open access
Size: 4.44 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/218135
The NBN code of this thesis is URN:NBN:IT:UNIPD-218135