Causal Mediation Analysis with Hidden Mediator
MONTELISCIANI, LAURA
2025
Abstract
Mediation analysis examines the role of a mediator in the relationship between an exposure (or treatment) and an outcome. Rather than assessing only the total effect of the exposure on the outcome, it seeks to determine whether part of this effect operates through a mediator, that is, whether the exposure influences the mediator, which in turn affects the outcome. The classical approach to mediation analysis, introduced by Baron and Kenny (1986), relies on linear models for three continuous variables: the exposure A, the mediator M, and the outcome Y. This approach has two main limitations: it rules out exposure-mediator interaction, and it cannot accommodate non-linear effects. More recent literature (e.g., Robins and Greenland, 1992; Pearl, 2001) clarifies the assumptions required for mediation analysis within the counterfactual framework and defines direct and indirect effects in terms of counterfactual means, which allows for interaction and non-linearity. Identifying causal effects is challenging in observational studies, and a major obstacle in mediation analysis is the assumption that the mediator is observed and measured without error, an assumption that is particularly strong in observational settings. For a mismeasured mediator, Le Cessie et al. (2012) and Valeri et al. (2014) proposed corrections of the causal effects under different types of measurement error in the mediator, applicable when the outcome is modeled by linear or logistic regression. However, these approaches require specific, often unverifiable assumptions about the nature of the error. In this thesis, we extend the proximal causal inference framework (Miao et al., 2018; Tchetgen Tchetgen et al., 2020) to address two key issues, a mediator that is unobservable and a mediator that is subject to measurement error, and we present a novel method that requires no assumptions about the type of measurement error.
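The counterfactual definitions mentioned above can be made concrete for a binary exposure. Writing Y(a, m) for the potential outcome under exposure a and mediator value m, and M(a) for the potential mediator under exposure a, the natural direct and indirect effects compare exposure levels 1 and 0 as shown below. The closed forms on the right are a standard textbook illustration, not a result quoted from the thesis: they assume the hypothetical linear working models E[M | A = a] = β0 + β1 a and E[Y | A = a, M = m] = θ0 + θ1 a + θ2 m + θ3 a m, the usual no-unmeasured-confounding conditions, and a fully observed mediator.

```latex
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\{Y(1, M(0))\} - \mathbb{E}\{Y(0, M(0))\} = \theta_1 + \theta_3 \beta_0, \\
\mathrm{NIE} &= \mathbb{E}\{Y(1, M(1))\} - \mathbb{E}\{Y(1, M(0))\} = (\theta_2 + \theta_3)\, \beta_1, \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}.
\end{aligned}
```

Without interaction (θ3 = 0) these reduce to NDE = θ1 and NIE = θ2 β1, the product-of-coefficients quantities of the Baron and Kenny approach; the θ3 terms show how the counterfactual definitions accommodate exposure-mediator interaction.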
The method allows for the estimation of the natural direct and indirect effects of the exposure on the outcome, even when the mediator is hidden, by leveraging two proxy variables for the unobserved mediator. Causal effects are estimated using generalized linear models (GLMs), which makes the method straightforward and readily applicable. Its main advantage is robustness to various types of measurement error, since no specific assumption about the error is required: with access to two proxy variables, the approach yields unbiased estimates of the causal effects regardless of the type of measurement error. We developed the methodology for a dichotomous treatment, a continuous mediator, and an outcome that is either continuous (identity, i.e., linear, link) or a count (log link). We also explore scenarios with exposure-mediator interaction and quantify the bias that arises when only one of the two proxies is used in place of the unmeasured mediator. Several directions for future research remain. A natural extension is to adapt the proximal approach to binary outcomes with a logit link. Further work may also address multiple or time-dependent mediators. Finally, we aim to apply this approach to estimate the average treatment effect (ATE) of the exposure on the outcome in the presence of an unobserved confounder U between exposure and outcome, within the context of the front-door criterion.
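The bias from substituting a single proxy, and how a second proxy can remove it, can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions, not the estimator developed in the thesis: a randomized binary exposure A, a hidden continuous mediator M, a linear outcome model without exposure-mediator interaction (so the log-link and interaction cases are not covered), and two proxies W and Z that each depend on M with errors independent of each other and of the outcome. All coefficients and function names (`simulate`, `ols`, `iv`, `estimate_effects`) are hypothetical. The naive analysis regresses Y on A and W; the proximal-style analysis fits a linear outcome bridge h(W, A) = t0 + tA A + tW W from the moment condition E[Y - h(W, A) | A, Z] = 0, which in this linear case is just-identified two-stage least squares with Z as the instrument for W.

```python
import numpy as np

# Hypothetical data-generating values (illustration only, not from the thesis).
BETA_1 = 2.0    # effect of A on the hidden mediator M
THETA_A = 1.0   # direct effect of A on Y (true NDE)
THETA_M = 1.5   # effect of M on Y
TRUE_NDE = THETA_A
TRUE_NIE = THETA_M * BETA_1  # = 3.0, since there is no A-M interaction


def simulate(n, rng):
    """Draw (A, Y, W, Z) from a linear model in which M is never returned."""
    a = rng.binomial(1, 0.5, size=n).astype(float)  # randomized binary exposure
    m = 1.0 + BETA_1 * a + rng.normal(size=n)       # hidden continuous mediator
    y = 0.5 + THETA_A * a + THETA_M * m + rng.normal(size=n)
    w = 0.3 + 0.8 * m + rng.normal(size=n)          # proxy W: shifted, rescaled, noisy
    z = m + rng.normal(size=n)                      # proxy Z: independent error
    return a, y, w, z


def ols(X, y):
    """Ordinary least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def iv(X, instruments, y):
    """Just-identified IV estimate: solves instruments.T @ (y - X @ b) = 0."""
    return np.linalg.solve(instruments.T @ X, instruments.T @ y)


def estimate_effects(n=200_000, seed=1):
    rng = np.random.default_rng(seed)
    a, y, w, z = simulate(n, rng)
    ones = np.ones(n)
    X = np.column_stack([ones, a, w])                 # W stands in for the hidden M
    naive = ols(X, y)                                 # single-proxy substitution
    bridge = iv(X, np.column_stack([ones, a, z]), y)  # Z instruments W
    # Effect of A on the proxy W; each NIE is the W-coefficient times this shift.
    dw = w[a == 1].mean() - w[a == 0].mean()
    return {
        "naive_nde": naive[1],
        "naive_nie": naive[2] * dw,
        "proximal_nde": bridge[1],
        "proximal_nie": bridge[2] * dw,
    }


if __name__ == "__main__":
    for name, value in estimate_effects().items():
        print(f"{name}: {value:.3f}")
    print(f"true NDE: {TRUE_NDE:.3f}, true NIE: {TRUE_NIE:.3f}")
```

In large samples the naive estimates approach roughly 2.83 (NDE) and 1.17 (NIE), while the two-proxy estimates approach the true 1.0 and 3.0. Because both fits include an intercept and A among their moment conditions, both decompositions add up to the same estimated total effect; the single proxy only misallocates it between the direct and indirect parts. Note that W here is a shifted and rescaled version of M plus noise, not merely M plus noise.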
File: phd_unimib_891538.pdf (open access, Adobe PDF, 861.17 kB)
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/195927
URN:NBN:IT:UNIMIB-195927