A distance‐based framework for causal discovery from high‐dimensional time series

Del Tatto, Vittorio

Unveiling causal relationships between time-dependent variables, observed as time series, is a challenge with countless applications across diverse scientific fields. This task lies at the heart of a research area known as causal discovery. In this thesis, we introduce a framework for causal discovery based on quantifying the information content of different distance measures, built with suitable subsets of the observed variables. Focusing on distances, rather than directly on the variables, offers significant advantages when applied to high-dimensional systems. The most important advantage is a strongly enhanced statistical power in detecting when a causal link is absent, which leads to a reduced rate of false positive detections. After benchmarking our approach on chaotic dynamical systems and real-world electroencephalographic data, we apply it to the study of causality in physical systems described by molecular dynamics simulations, in a setting in which the system explores a stationary equilibrium distribution. We show that even in these conditions genuine causal links can emerge. In this context, we interpret the emergence of unidirectional causal links between specific collective variables in terms of the structure of the free energy landscapes. We find that a prerequisite for the existence of causal links in molecular systems is a significant separation of time scales. Furthermore, we propose to identify causal relationships in molecular systems using computational experiments that mimic ideal manipulations of the collective variables of interest. Finally, we build upon our distance-based framework to tackle the problem of causal graph reconstruction, proposing an algorithm that outputs a "mesoscopic" version of standard causal graphs, where groups of variables are aggregated into single nodes. We show that this framework carries both computational and conceptual advantages: it simplifies the inference process and yields a more compact and interpretable causal graph.

2025


Degree programme: Physics and Chemistry of Biological Systems
Publication date: 2 October 2025
Language: English
Supervisor, Advisor or Tutor: Laio, Alessandro
Publisher: SISSA
Publisher city: Trieste
Collection: Scuola Internazionale Superiore di Studi Avanzati di Trieste

Files in this item:

File	Access	License	Size	Format
PhD_thesis (1).pdf	Open access	All rights reserved	28.3 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/301849

The NBN code of this thesis is URN:NBN:IT:SISSA-301849