From histories to states: practical automata-based methods for learning and solving regular decision processes

Author: Paludo Licks, Gabriel
Year: 2025

Abstract

Reinforcement Learning (RL) commonly relies on the Markov property, which asserts that the current state contains all relevant information for future decision-making. However, many real-world scenarios violate this property: agents often face observations that do not fully reveal the environment's true state or that exhibit dependencies spanning long histories. Partially Observable Markov Decision Processes (POMDPs) provide a comprehensive theoretical framework for such problems, yet exact solutions are computationally intractable, and existing learning-based methods largely depend on heuristic or approximate techniques. In this thesis, we focus on a more specialised class of partially observable systems known as Regular Decision Processes (RDPs). RDPs capture a wide range of non-Markovian dependencies while remaining more tractable than general POMDPs. Specifically, they assume that any temporal dependencies in the environment can be encoded by a finite-state automaton, thereby transforming the problem into a Markovian one once the automaton's internal states are included in the agent's state representation. This structural assumption, though less general than a full POMDP, is satisfied in many practical cases where the relevant history can be summarised by a finite memory mechanism. A principal contribution of this work is demonstrating how an RL agent can learn such an automaton-based representation autonomously, without explicit domain knowledge. We achieve this by leveraging automata learning, a field dedicated to inferring state machines from observed sequences, in combination with established RL methods. By examining how different histories lead to indistinguishable outcomes, the agent merges states that exhibit statistically equivalent future reward and observation distributions. The result is a compact, state-based model of the environment that effectively restores the Markov property. We then integrate this learned representation with classical RL algorithms. Empirical evaluations on benchmark tasks show that our approaches outperform baseline methods, particularly in domains featuring partial observability and extended temporal dependencies. Beyond improvements in sample efficiency, the explicit automaton structure also provides a degree of interpretability, potentially allowing for transfer to related tasks or examination by domain experts. Finally, we discuss directions for further development, including extensions to larger or noisier observation spaces, integration with function approximation, and potential applications in fields where partial observability and regular temporal dependencies are dominant.
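To make the core idea concrete, the following is a minimal Python sketch (not taken from the thesis) of how a finite-state automaton that tracks the relevant history restores the Markov property: the agent's state becomes the pair (automaton state, current observation), and ordinary tabular Q-learning is run on that product. The toy environment, the hand-written two-state automaton standing in for a learned one, and all hyperparameters are illustrative assumptions.

# Minimal sketch, under assumptions stated above (toy domain, hand-written
# automaton instead of a learned one). Illustrates Q-learning on the product
# of automaton state and current observation.

import random
from collections import defaultdict

# Toy non-Markovian environment: opening only pays off if "key" was observed
# at some earlier step, which the current observation alone does not reveal.
class ToyRDPEnv:
    ACTIONS = ["search", "open"]

    def reset(self):
        self.has_key = False
        self.t = 0
        return "start"

    def step(self, action):
        self.t += 1
        if action == "search" and not self.has_key and random.random() < 0.5:
            self.has_key = True
            obs = "key"          # the key is observed only at the moment it is found
        else:
            obs = "empty"
        reward = 1.0 if (action == "open" and self.has_key) else 0.0
        done = self.t >= 10 or reward > 0
        return obs, reward, done

# Hand-written automaton standing in for a learned one: it remembers whether
# "key" has ever been observed (q0 = "no key yet", q1 = "key seen").
AUTOMATON = {("q0", "start"): "q0", ("q0", "empty"): "q0", ("q0", "key"): "q1",
             ("q1", "start"): "q1", ("q1", "empty"): "q1", ("q1", "key"): "q1"}

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    env = ToyRDPEnv()
    Q = defaultdict(float)                     # Q[((automaton_state, obs), action)]
    for _ in range(episodes):
        obs, q = env.reset(), "q0"
        done = False
        while not done:
            state = (q, obs)
            if random.random() < eps:
                action = random.choice(env.ACTIONS)
            else:
                action = max(env.ACTIONS, key=lambda a: Q[state, a])
            obs2, reward, done = env.step(action)
            q2 = AUTOMATON[(q, obs2)]          # advance the automaton on the new observation
            best_next = max(Q[(q2, obs2), a] for a in env.ACTIONS)
            Q[state, action] += alpha * (reward + gamma * (0.0 if done else best_next)
                                         - Q[state, action])
            obs, q = obs2, q2
    return Q

if __name__ == "__main__":
    Q = q_learning()
    # With the automaton state in the loop, "open" is preferred only after the key was seen.
    print("Q(q1, empty, open)   =", round(Q[("q1", "empty"), "open"], 2))
    print("Q(q1, empty, search) =", round(Q[("q1", "empty"), "search"], 2))

In the thesis setting the automaton is learned from interaction data by merging histories with statistically equivalent futures, rather than written by hand; the sketch only illustrates why the product state makes standard RL applicable.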
Date: 19 May 2025
Language: English
Supervisor: De Giacomo, Giuseppe
Institution: Università degli Studi di Roma "La Sapienza"
File: Tesi_dottorato_PaludoLicks.pdf (Adobe PDF, 1.37 MB, open access)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/212882
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-212882