Latent alignment techniques enable modular policies in the context of reinforcement learning

RICCIARDI, ANTONIO PIO
2025

Abstract

Visual Reinforcement Learning is a popular and powerful framework that fully leverages recent breakthroughs in Deep Learning. However, variations in input domains (e.g., changes in background colors due to seasonal shifts) or task domains (e.g., modifying a car’s target speed) can degrade agent performance, often requiring retraining for each variation. Recent advances in representation learning have demonstrated the potential to combine components from different neural networks to construct new models in a zero-shot fashion. In this dissertation, we build upon these advances and adapt them to the Visual Reinforcement Learning setting, enabling the composition of agent components to form new agents capable of handling novel visual-task combinations not seen during training. This is achieved by establishing communication between encoders and controllers from different models trained under distinct variations. Our findings highlight the promise of model reuse, significantly reducing the need for retraining and thereby cutting down on both time and computational cost.
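As an illustration only, the sketch below shows one plausible way the "communication between encoders and controllers" described above could be realized in PyTorch: a frozen encoder trained under one visual variation is stitched to a frozen controller trained under another by estimating a linear map between their latent spaces from a small set of semantically paired anchor observations. All names here (fit_alignment, StitchedAgent, the anchor pairing) are hypothetical and not taken from the thesis, which may rely on a different alignment technique (e.g., relative representations).

# Minimal sketch, assuming paired anchor observations are available for the
# two visual variations; not the author's exact method.
import torch
import torch.nn as nn


def fit_alignment(z_src: torch.Tensor, z_tgt: torch.Tensor) -> torch.Tensor:
    """Estimate a linear map T such that z_src @ T approximates z_tgt.

    z_src: latents of the new encoder on anchor observations, shape (N, d)
    z_tgt: latents the controller was trained on, for the same anchors, shape (N, d)
    """
    # torch.linalg.lstsq solves the least-squares problem min_T ||z_src T - z_tgt||_F
    return torch.linalg.lstsq(z_src, z_tgt).solution  # shape (d, d)


class StitchedAgent(nn.Module):
    """Compose a frozen encoder and a frozen controller through a latent alignment map."""

    def __init__(self, encoder: nn.Module, controller: nn.Module, T: torch.Tensor):
        super().__init__()
        self.encoder = encoder.eval()        # frozen: trained on the new visual variation
        self.controller = controller.eval()  # frozen: trained on a different variation
        self.register_buffer("T", T)         # small alignment map estimated from anchors

    @torch.no_grad()
    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)               # latent under the new visual variation
        z_aligned = z @ self.T              # translate into the controller's latent space
        return self.controller(z_aligned)   # action (or action-distribution parameters)

In such a setup, the anchors would be observations of the same underlying states rendered under the two visual variations; neither network is retrained, and only the small map T is estimated, which is what makes the zero-shot composition cheap compared to retraining an agent for every visual-task combination.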
Defense date: 22 May 2025
Language: English
Supervisor: RODOLA', EMANUELE
Institution: Università degli Studi di Roma "La Sapienza"
Pages: 87
Files in this record:
File: Tesi_dottorato_Ricciardi.pdf (open access, 10.7 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/212551
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-212551