Latent alignment techniques enable modular policies in the context of reinforcement learning

RICCIARDI, ANTONIO PIO
2025

Abstract

Visual Reinforcement Learning is a popular and powerful framework that fully leverages recent breakthroughs in Deep Learning. However, variations in input domains (e.g., changes in background colors due to seasonal shifts) or task domains (e.g., modifying a car’s target speed) can degrade agent performance, often requiring retraining for each variation. Recent advances in representation learning have demonstrated the potential to combine components from different neural networks to construct new models in a zero-shot fashion. In this dissertation, we build upon these advances and adapt them to the Visual Reinforcement Learning setting, enabling the composition of agent components to form new agents capable of handling novel visual-task combinations not seen during training. This is achieved by establishing communication between encoders and controllers from different models trained under distinct variations. Our findings highlight the promise of model reuse, significantly reducing the need for retraining and thereby cutting down on both time and computational cost.
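As an illustration only, the sketch below shows one plausible way the "communication between encoders and controllers" described above could be realized in PyTorch: a frozen encoder trained under one visual variation is stitched to a frozen controller trained under another by estimating a linear map between their latent spaces from a small set of semantically paired anchor observations. All names here (fit_alignment, StitchedAgent, the anchor pairing) are hypothetical and not taken from the thesis, which may rely on a different alignment technique (e.g., relative representations).

# Minimal sketch, assuming paired anchor observations are available for the
# two visual variations; not the author's exact method.
import torch
import torch.nn as nn


def fit_alignment(z_src: torch.Tensor, z_tgt: torch.Tensor) -> torch.Tensor:
    """Estimate a linear map T such that z_src @ T approximates z_tgt.

    z_src: latents of the new encoder on anchor observations, shape (N, d)
    z_tgt: latents the controller was trained on, for the same anchors, shape (N, d)
    """
    # torch.linalg.lstsq solves the least-squares problem min_T ||z_src T - z_tgt||_F
    return torch.linalg.lstsq(z_src, z_tgt).solution  # shape (d, d)


class StitchedAgent(nn.Module):
    """Compose a frozen encoder and a frozen controller through a latent alignment map."""

    def __init__(self, encoder: nn.Module, controller: nn.Module, T: torch.Tensor):
        super().__init__()
        self.encoder = encoder.eval()        # frozen: trained on the new visual variation
        self.controller = controller.eval()  # frozen: trained on a different variation
        self.register_buffer("T", T)         # small alignment map estimated from anchors

    @torch.no_grad()
    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)               # latent under the new visual variation
        z_aligned = z @ self.T              # translate into the controller's latent space
        return self.controller(z_aligned)   # action (or action-distribution parameters)

In such a setup, the anchors would be observations of the same underlying states rendered under the two visual variations; neither network is retrained, and only the small map T is estimated, which is what makes the zero-shot composition cheap compared to retraining an agent for every visual-task combination.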
Defense date: 22 May 2025
Language: English
Supervisor: RODOLA', EMANUELE
Institution: Università degli Studi di Roma "La Sapienza"
Pages: 87
Files in this record:
File: Tesi_dottorato_Ricciardi.pdf (open access, 10.7 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/212551
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-212551