
Learning symbolic planning models from images

BARBIN, AYMERIC
2024

Abstract

The construction of symbolic planning models, particularly in the PDDL language, remains a longstanding challenge in Artificial Intelligence due to its reliance on manual domain engineering. This thesis explores how such models can be learned automatically, without supervision, from raw visual input. We focus on learning PDDL-style action models for Classical Planning, with an emphasis on planning effectiveness and semantic interpretability. Our work builds on Latplan, a system that learns symbolic representations from images via deep latent-space encoding. In the first part of the thesis, we improve Latplan without architectural changes by enforcing logical invariants through fuzzy-logic regularization and by tuning hyperparameters. While these methods yield improvements in simpler domains, they do not overcome Latplan's two key limitations: the lack of action interpretability and poor planning performance caused by hallucinated plans. To address these, we propose R-latplan, an architectural extension that introduces deterministic action labeling based on visual differences. R-latplan improves planning performance and aligns learned actions with high-level behaviors, even under noise. However, scalability remains a challenge: the proliferation of generated actions increases planning time. To compress the action space, we investigate two techniques. First, we use Decision Trees to identify patterns among the preconditions and effects of R-latplan actions and to translate these patterns into higher-level actions, but this approach does not yield efficient action models. We then propose RC-latplan, which clusters low-level actions into semantically coherent groups, each translated into a single PDDL action. Experiments show that RC-latplan greatly reduces domain size while preserving interpretability. Although it slightly underperforms R-latplan, a variant that clusters actions with identical effects achieves logical equivalence with R-latplan.
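As an illustrative sketch (not code from the thesis), the effect-identical clustering variant mentioned above can be expressed as grouping low-level actions by their add and delete effect sets; the action names and effect encodings here are hypothetical:

```python
from collections import defaultdict

def cluster_by_effects(actions):
    # actions: dict mapping an action id to a pair (add_effects, del_effects),
    # each a frozenset of propositional atom indices in the latent state.
    # Actions with identical effect sets land in the same cluster; the
    # preconditions of a cluster can then be combined disjunctively, which
    # keeps the compressed model logically equivalent to the low-level one.
    clusters = defaultdict(list)
    for name, (adds, dels) in actions.items():
        clusters[(adds, dels)].append(name)
    return dict(clusters)

# Hypothetical low-level actions: a0 and a1 share the same effects.
actions = {
    "a0": (frozenset({1}), frozenset({2})),
    "a1": (frozenset({1}), frozenset({2})),
    "a2": (frozenset({3}), frozenset()),
}
clusters = cluster_by_effects(actions)
```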
Finally, the thesis presents a complementary line of research to which I contributed: Visual Reward Machines (VRMs), a framework for learning symbolic automata from visual observations to improve learning in sparse-reward environments.
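For illustration only, a reward machine can be sketched as a finite-state automaton whose transitions are triggered by abstract symbols and emit rewards; the states, symbols, and reward values below are invented for the example and are not taken from the thesis:

```python
class RewardMachine:
    """Minimal reward-machine sketch: an automaton with reward-emitting
    transitions, used to densify reward in sparse-reward tasks."""

    def __init__(self, transitions, initial):
        # transitions: dict mapping (state, symbol) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial

    def step(self, symbol):
        # Stay in the current state with zero reward on undefined transitions.
        self.state, reward = self.transitions.get(
            (self.state, symbol), (self.state, 0.0))
        return reward

# Hypothetical two-step task: pick up a key, then open a door.
rm = RewardMachine({
    ("u0", "got_key"): ("u1", 0.0),
    ("u1", "opened_door"): ("u2", 1.0),  # reward only on task completion
}, initial="u0")
```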
30 September 2024
English
GEREVINI, ALFONSO EMILIO
Università degli Studi di Roma "La Sapienza"
100
Files in this item:
Tesi_dottorato_Barbin.pdf (open access, 11.86 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/213924
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-213924