Efficient and Trustworthy Causal Discovery

MONTAGNA, FRANCESCO
2025

Abstract

Inferring cause–effect relationships from observational data is a central goal of causality research, as it enables formal reasoning about interventions when experiments are expensive, unethical, or impossible. Causal discovery tackles this task, yet existing algorithms become inefficient as the number of variables grows and lean on restrictive modelling assumptions that seldom hold in practice, limiting their usefulness for real-world problems. This thesis consists of three main contributions, aimed at providing effective and provably correct algorithms for causal discovery in realistic scenarios. (i) We uncover a tight link between score-matching estimation—the score function being the gradient of the log-likelihood of the observed data—and the inference of causal relations between random variables. Exploiting this connection, we design scalable and provably consistent algorithms that recover additive-noise models with arbitrary noise, even in the presence of latent common causes—a longstanding challenge in causal inference. (ii) We bridge the gap between idealised theory and practice by introducing a benchmarking paradigm for causal discovery methods under diverse conditions: rather than assumption-compliant data, our benchmark deliberately violates common modelling assumptions required by most algorithms. We evaluate a broad set of prominent methods under these harsher yet more realistic scenarios, revealing when their conclusions can be trusted. (iii) Finally, we present the first analysis of the guarantees of transformer-based, amortised causal discovery, bringing principled reliability to a promising but previously opaque line of work. Together, these steps push causal discovery towards real-world readiness by delivering both computational efficiency and well-grounded trust.
15-Jul-2025
English
Causality; causal discovery; score matching; machine learning
ROSASCO, LORENZO
NOCETI, NICOLETTA
DELZANNO, GIORGIO
Università degli studi di Genova
Files in this item:
File: phdunige_5383874.pdf
Access: open access
Size: 10.36 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/218002
The NBN code of this thesis is URN:NBN:IT:UNIGE-218002