Bayesian Nonparametric Dependent Mixtures for Causal Inference with Applications to Air Pollution Epidemiology

ZORZETTO, DAFNE
2024

Abstract

Environmental epidemiology raises challenging causal inference questions about the effects of environmental exposures on human health. The complex relationships among air pollution exposure, well-being indices, the demographic and socio-economic characteristics of the population, and air pollution regulations call for clear definitions of the causal effects and for flexible models. In response, we propose novel causal models that leverage the desirable properties of Bayesian nonparametric priors, in particular the dependent Dirichlet process. The well-known flexibility and adaptability of Bayesian nonparametric mixture models are not the only reasons for this choice. Our proposed models also handle two central challenges: the missing data problem that arises in the potential outcomes framework for causal inference, and the clustering structure that links the applied research question to the methodologies developed in this thesis. The Bayesian paradigm allows straightforward imputation of the missing potential outcomes, while the mixture structure of the dependent Dirichlet process naturally induces a clustering of the observations through the latent variable that allocates each observation to a mixture component. Specifically, we address two challenging settings that frequently arise in observational studies: capturing and characterizing heterogeneity in the causal effects, and dealing with post-treatment variables, which are affected by the treatment and in turn affect the outcome. Both settings involve clustering: heterogeneous causal effects require identifying the groups, and post-treatment variables induce principal strata. In environmental epidemiology studies, these concepts correspond to population groups that are affected differently by air pollution exposure or air quality regulations.
Indeed, different levels of vulnerability and resilience characterize the population, highlighting socio-economic disparities in American society. Our proposed models, the confounder-dependent Bayesian mixture model and the confounders-aware shared-atoms mixture model, exploit rich forms of dependence on the confounders and on the relationship between the variables under different treatment levels. This enables us to (i) define a flexible probability distribution for the outcome or post-treatment variable, (ii) impute the missing data properly, (iii) estimate individual treatment effects competitively with benchmark models, (iv) identify the group or strata structure according to the causal estimands of interest, and (v) delineate the characteristics of each group or stratum.
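
The interplay between mixture allocation and imputation of missing potential outcomes described in the abstract can be illustrated with a toy simulation. The sketch below is not the thesis's models: it assumes a truncated stick-breaking construction with weights that do not depend on confounders, Gaussian kernels with unit variance, and hypothetical cluster-specific means `mu` that are treated as known. In a real analysis the allocations `z` and the parameters would be drawn from their posterior inside an MCMC sampler; here they are simulated once, only to show how one imputation step and the cluster-level summaries fit together.

```python
import numpy as np

def stick_breaking(v):
    """Map stick fractions v_1..v_{K-1} in (0, 1) to K mixture weights summing to 1."""
    v = np.asarray(v, dtype=float)
    # Stick length remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    # The last component takes whatever is left of the stick.
    return np.concatenate((v, [1.0])) * remaining

rng = np.random.default_rng(0)
K, n, alpha = 5, 200, 1.0  # truncation level, sample size, concentration (all assumed)

# Truncated Dirichlet-process weights: v_k ~ Beta(1, alpha).
weights = stick_breaking(rng.beta(1.0, alpha, size=K - 1))

# Hypothetical cluster-specific outcome means: column 0 = control, column 1 = treated.
mu = rng.normal(0.0, 3.0, size=(K, 2))

# Latent allocation of each unit to a mixture component, and its treatment level.
z = rng.choice(K, size=n, p=weights)
t = rng.integers(0, 2, size=n)

# Observed outcome: only the potential outcome under the received treatment is seen.
y_obs = rng.normal(mu[z, t], 1.0)

# One imputation of the missing potential outcome, given allocation and parameters.
y_mis = rng.normal(mu[z, 1 - t], 1.0)

# Completed potential outcomes and individual treatment effects for this draw.
y1 = np.where(t == 1, y_obs, y_mis)
y0 = np.where(t == 0, y_obs, y_mis)
ite = y1 - y0

# Cluster-level average effects: the allocation defines the groups.
group_effects = {int(k): float(ite[z == k].mean()) for k in np.unique(z)}
```

Averaging such completed draws over many sampler iterations would give posterior summaries of the individual and group-level effects; a single draw, as here, only illustrates the mechanics.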
29 January 2024
English
CANALE, ANTONIO
Università degli studi di Padova
Files in this item:
THESIS_dafne_zorzetto.pdf (open access, 8.78 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/96383
The NBN code of this thesis is URN:NBN:IT:UNIPD-96383