Efficient and Trustworthy Causal Discovery
MONTAGNA, FRANCESCO
2025
Abstract
Inferring cause–effect relationships from observational data is a central goal of causality research, as it enables formal reasoning about interventions when experiments are expensive, unethical, or impossible. Causal discovery tackles this task, yet existing algorithms become inefficient as the number of variables grows and lean on restrictive modelling assumptions that seldom hold in practice, limiting their usefulness for real-world problems. This thesis makes three main contributions aimed at providing effective and provably correct algorithms for causal discovery in realistic scenarios. (i) We uncover a tight link between score-matching estimation (the score function being the gradient of the log-likelihood of the observed data) and the inference of causal relations between random variables. Exploiting this connection, we design scalable and provably consistent algorithms that recover additive-noise models with arbitrary noise, even in the presence of latent common causes, a longstanding challenge in causal inference. (ii) We bridge the gap between idealised theory and practice by introducing a benchmarking paradigm for causal discovery methods under diverse conditions: rather than assumption-compliant data, our benchmark deliberately violates common modelling assumptions required by most algorithms. We evaluate a broad set of prominent methods under these harsher yet more realistic scenarios, revealing when their conclusions can be trusted. (iii) Finally, we present the first analysis of the guarantees of transformer-based, amortised causal discovery, bringing principled reliability to a promising but previously opaque line of work. Together, these steps push causal discovery towards real-world readiness by delivering both computational efficiency and well-grounded trust.

File | Size | Format | Access
---|---|---|---
phdunige_5383874.pdf | 10.36 MB | Adobe PDF | Open access
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/218002
URN:NBN:IT:UNIGE-218002