Multi-agent reinforcement learning: coordination through abstractions, trust and world models

FRATTOLILLO, FRANCESCO
2025

Abstract

This thesis explores advances in cooperative multi-agent reinforcement learning (RL), focusing on coordination, sample efficiency, and trust in cooperative systems. A significant contribution of this work is the formalization of trust within multi-agent RL, which is pivotal to establishing cooperation. Trust, a key determinant of cooperative behavior, is systematically examined through a set of influencing factors, analyzed to understand how trust shapes agent interactions and supports robust collaboration in dynamic environments. The thesis also introduces a novel approach that combines the strengths of traditional tabular methods and deep RL by constructing discrete abstractions of continuous environments; the abstraction guides the learning process in the lower layers of the hierarchy, which is particularly useful in environments with very sparse rewards. The proposed solutions are evaluated on cooperative multi-UAV systems, one of the most prominent applications of RL. Central to this work is the integration of model-based RL techniques that use learned world models to enable agents to reason about future outcomes. By leveraging these learned representations, agents can anticipate the intentions of others, facilitating consensus-building and collective decision-making. The effectiveness of these approaches is demonstrated empirically across a range of scenarios.
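To make the abstraction idea concrete, below is a minimal illustrative sketch (Python with NumPy) of how a discrete abstraction of a continuous state space can guide a lower learning layer under sparse rewards: a tabular Q-learner operates on a coarse grid over the continuous space, and its value estimates supply a dense, potential-based shaping signal for the layer below. All names here (GRID, abstract, high_level_step, shaped_reward) are hypothetical; this is a generic instance of the technique under stated assumptions, not the algorithm developed in the thesis.

import numpy as np

# Hypothetical sketch: a continuous 2-D state in [LOW, HIGH)^2 is mapped onto
# a coarse grid (the "discrete abstraction"); a tabular Q-learner acts on the
# grid, and its value estimates shape the reward of a lower layer that would
# otherwise see only a sparse terminal reward.

GRID = 8                      # abstract resolution per dimension
LOW, HIGH = 0.0, 1.0          # bounds of the continuous state space

def abstract(state):
    """Map a continuous state to its discrete grid cell."""
    idx = np.clip(((state - LOW) / (HIGH - LOW) * GRID).astype(int), 0, GRID - 1)
    return tuple(idx)

# Tabular Q-values over abstract cells and 4 coarse moves (up/down/left/right).
Q = np.zeros((GRID, GRID, 4))
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
MOVES = np.array([[0, 1], [0, -1], [-1, 0], [1, 0]], dtype=float) / GRID

def high_level_step(state, goal):
    """One epsilon-greedy tabular Q-learning update at the abstract layer."""
    cell = abstract(state)
    a = np.random.randint(4) if np.random.rand() < EPS else int(np.argmax(Q[cell]))
    next_state = np.clip(state + MOVES[a], LOW, HIGH - 1e-9)
    next_cell = abstract(next_state)
    # Sparse task reward: 1 only when the goal cell is reached.
    r = 1.0 if next_cell == abstract(goal) else 0.0
    td_target = r + GAMMA * Q[next_cell].max()
    Q[cell + (a,)] += ALPHA * (td_target - Q[cell + (a,)])
    return next_state, r

def shaped_reward(state, next_state):
    """Dense signal for the lower layer: progress in abstract state value
    (potential-based shaping using the tabular layer's value estimates)."""
    v = lambda s: Q[abstract(s)].max()
    return GAMMA * v(next_state) - v(state)

if __name__ == "__main__":
    goal = np.array([0.9, 0.9])
    for episode in range(200):
        s = np.random.uniform(LOW, HIGH, size=2)
        for _ in range(100):
            s2, r = high_level_step(s, goal)
            _ = shaped_reward(s, s2)   # would be fed to the low-level learner
            s = s2
            if r > 0:
                break

Shaping of the form gamma * V(s') - V(s) is potential-based, so it densifies the sparse reward without changing the optimal policy of the lower layer.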
Date: 18 September 2025
Language: English
Supervisor: IOCCHI, Luca
Coordinator: NAVIGLI, Roberto
Institution: Università degli Studi di Roma "La Sapienza"
Pages: 130
Files in this record:
Tesi_dottorato_Frattolillo.pdf (open access; license: all rights reserved; 6.35 MB, Adobe PDF)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/305806
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-305806