
Learning symbolic planning models from images

BARBIN, AYMERIC
2024

Abstract

The construction of symbolic planning models, particularly in the PDDL language, remains a longstanding challenge in Artificial Intelligence due to its reliance on manual domain engineering. This thesis explores how such models can be learned automatically, without supervision, from raw visual input. We focus on learning PDDL-style action models for Classical Planning, with an emphasis on planning effectiveness and semantic interpretability. Our work builds on Latplan, a system that learns symbolic representations from images via deep latent-space encoding. In the first part of the thesis, we improve Latplan without architectural changes by enforcing logical invariants through fuzzy-logic regularization and by tuning hyperparameters. While these methods yield improvements in simpler domains, they do not overcome Latplan's two key limitations: the lack of action interpretability and poor planning performance caused by hallucinated plans. To address these, we propose R-latplan, an architectural extension that introduces deterministic action labeling based on visual differences. R-latplan improves planning performance and aligns learned actions with high-level behaviors, even under noise. However, scalability remains a challenge: the proliferation of generated actions increases planning time. To compress the action space, we investigate two techniques. First, we use Decision Trees to identify patterns among the preconditions and effects of R-latplan actions and to translate these patterns into higher-level actions, but this approach does not yield efficient action models. We then propose RC-latplan, which clusters low-level actions into semantically coherent groups, each translated into a single PDDL action. Experiments show that RC-latplan greatly reduces domain size while preserving interpretability. Although it slightly underperforms R-latplan, a variant that clusters actions with identical effects achieves logical equivalence with R-latplan.
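As an illustrative sketch (not code from the thesis), the effect-identical clustering variant mentioned above can be expressed as grouping low-level actions by their add and delete effect sets; the action names and effect encodings here are hypothetical:

```python
from collections import defaultdict

def cluster_by_effects(actions):
    # actions: dict mapping an action id to a pair (add_effects, del_effects),
    # each a frozenset of propositional atom indices in the latent state.
    # Actions with identical effect sets land in the same cluster; the
    # preconditions of a cluster can then be combined disjunctively, which
    # keeps the compressed model logically equivalent to the low-level one.
    clusters = defaultdict(list)
    for name, (adds, dels) in actions.items():
        clusters[(adds, dels)].append(name)
    return dict(clusters)

# Hypothetical low-level actions: a0 and a1 share the same effects.
actions = {
    "a0": (frozenset({1}), frozenset({2})),
    "a1": (frozenset({1}), frozenset({2})),
    "a2": (frozenset({3}), frozenset()),
}
clusters = cluster_by_effects(actions)
```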
Finally, the thesis presents a complementary line of research to which I contributed: Visual Reward Machines (VRMs), a framework for learning symbolic automata from visual observations to improve learning in sparse-reward environments.
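For illustration only, a reward machine can be sketched as a finite-state automaton whose transitions are triggered by abstract symbols and emit rewards; the states, symbols, and reward values below are invented for the example and are not taken from the thesis:

```python
class RewardMachine:
    """Minimal reward-machine sketch: an automaton with reward-emitting
    transitions, used to densify reward in sparse-reward tasks."""

    def __init__(self, transitions, initial):
        # transitions: dict mapping (state, symbol) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial

    def step(self, symbol):
        # Stay in the current state with zero reward on undefined transitions.
        self.state, reward = self.transitions.get(
            (self.state, symbol), (self.state, 0.0))
        return reward

# Hypothetical two-step task: pick up a key, then open a door.
rm = RewardMachine({
    ("u0", "got_key"): ("u1", 0.0),
    ("u1", "opened_door"): ("u2", 1.0),  # reward only on task completion
}, initial="u0")
```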
30 September 2024
English
GEREVINI, ALFONSO EMILIO
Università degli Studi di Roma "La Sapienza"
100
Files in this item:
Tesi_dottorato_Barbin.pdf (open access, 11.86 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/213924
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-213924