Efficient and Trustworthy Causal Discovery

Montagna, Francesco

Inferring cause–effect relationships from observational data is a central goal of causality research, as it enables formal reasoning about interventions when experiments are expensive, unethical, or impossible. Causal discovery tackles this task, yet existing algorithms become inefficient as the number of variables grows and lean on restrictive modelling assumptions that seldom hold in practice, limiting their usefulness for real-world problems. This thesis consists of three main contributions, aimed at providing effective and provably correct algorithms for causal discovery in realistic scenarios. (i) We uncover a tight link between score-matching estimation—the score function being the gradient of the log-likelihood of the observed data—and the inference of causal relations between random variables. Exploiting this connection, we design scalable and provably consistent algorithms that recover additive-noise models with arbitrary noise, even in the presence of latent common causes—a longstanding challenge in causal inference. (ii) We bridge the gap between idealised theory and practice by introducing a benchmarking paradigm for causal discovery methods under diverse conditions: rather than assumption-compliant data, our benchmark deliberately violates common modelling assumptions required by most algorithms. We evaluate a broad set of prominent methods under these harsher yet more realistic scenarios, revealing when their conclusions can be trusted. (iii) Finally, we present the first analysis of the guarantees of transformer-based, amortised causal discovery, bringing principled reliability to a promising but previously opaque line of work. Together, these steps push causal discovery towards real-world readiness by delivering both computational efficiency and well-grounded trust.

2025

Faculty/Department: 100023 - Dipartimento di Informatica, bioingegneria, robotica e ingegneria dei sistemi
Degree programme: XXXVII cycle - Computer Science and Systems Engineering (Informatica e Ingegneria dei Sistemi) - computer science
Publication date: 15 July 2025
Language: English
Keywords: Causality; causal discovery; score matching; machine learning
Supervisor, Advisor or Tutor: ROSASCO, LORENZO; NOCETI, NICOLETTA
Co-Supervisor, Discussant, Co-Tutor or Coordinators: DELZANNO, GIORGIO
Publisher: Università degli studi di Genova
Collection: Università degli Studi di Genova

Files in this item:

File	Size	Format
phdunige_5383874.pdf (open access)	10.36 MB	Adobe PDF

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/218002

The NBN code of this thesis is URN:NBN:IT:UNIGE-218002