THE CHALLENGE OF DOMAIN SHIFT IN REAL-WORLD ROBOTIC VISION: TOWARD SCALABLE, UNSUPERVISED, AND CLOUD-BASED ADAPTATION

ANTONAZZI, MICHELE
2025

Abstract

Mobile robots are an emerging technology, increasingly present in contexts such as homes, offices, and hospitals to assist humans in daily-life activities. Given the complexity of human-centric environments, robotic vision, namely computer vision embedded in mobile robots, has become an essential capability for acquiring a semantically rich understanding of the environment, both to improve core robotics tasks (such as navigation and localization) and to enable high-level activities such as manipulation or human-robot interaction. Given recent advances in deep learning, the naive way to implement robotic vision is to leverage publicly available deep neural networks, which can be deployed on robotic platforms with limited effort. Despite being widely adopted, this approach suffers from important limitations caused by the so-called domain shift problem: being trained on simulated or generic datasets (the source domain), deep neural networks fail dramatically to cope with the complexity of the real-world environments (the target domain) in which the robots operate. This dissertation investigates the challenges of domain shift in robotic vision and proposes novel adaptation strategies to enable robust and scalable real-world deployments of mobile robots. The contributions are structured into three complementary parts. First, we analyze the limitations of the mainstream pipeline for implementing robotic vision, which combines pre-trained neural networks with manual fine-tuning. We propose photorealistic simulation-based pre-training using data compliant with the robot's perception modality, and we demonstrate that fine-tuning with a limited amount of high-quality manual annotations substantially increases the model's robustness in the robot's specific operational environment. Second, we remove the need for human supervision, proposing two alternative approaches for unsupervised model adaptation. Our methods exploit the spatial constraints between the neural network's predictions and the 3D world to enhance the quality of pseudo-labels, thus enabling self-supervised adaptation. Third, we address adaptation in cloud-based robotic perception, where the computationally intensive inference required by neural networks is offloaded to remote servers to cope with the limited onboard hardware of mobile robots. In this context, we design solutions that ensure the scalability of domain adaptation and preserve privacy through lightweight neural networks that can run locally on the robot. We validate our approaches on real-world datasets and through extensive experiments with real robotic platforms, considering multiple perception tasks such as object detection, 3D pose estimation, and semantic segmentation.
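To make the second contribution concrete, the following is a minimal, hypothetical sketch of pseudo-label filtering via multi-view 3D consistency, the general idea the abstract describes; it is not the dissertation's actual algorithm, and all names and thresholds below are illustrative assumptions. It assumes each detection has already been lifted to a 3D centroid in a fixed map frame (e.g., via depth and robot pose), and keeps only detections re-observed consistently across frames.

    # Illustrative sketch only: geometry-based pseudo-label filtering for
    # self-supervised adaptation (hypothetical names and thresholds).
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Detection:
        frame_id: int         # camera frame the detection came from
        label: str            # predicted class
        score: float          # network confidence
        centroid: np.ndarray  # (3,) 3D position in the map frame

    def filter_pseudo_labels(dets, radius=0.3, min_views=3, min_score=0.5):
        """Keep detections whose 3D centroid is re-observed, with the same
        class, in at least `min_views` distinct frames within `radius` meters."""
        kept = []
        for d in dets:
            if d.score < min_score:
                continue
            support = {
                o.frame_id for o in dets
                if o.label == d.label
                and np.linalg.norm(o.centroid - d.centroid) < radius
            }
            if len(support) >= min_views:
                kept.append(d)  # consistent across views -> usable pseudo-label
        return kept

    # Example: a chair seen in three frames survives; a one-off ghost does not.
    dets = [
        Detection(0, "chair", 0.8, np.array([1.00, 2.00, 0.40])),
        Detection(1, "chair", 0.7, np.array([1.05, 2.02, 0.41])),
        Detection(2, "chair", 0.9, np.array([0.98, 1.97, 0.39])),
        Detection(3, "plant", 0.6, np.array([4.00, 0.50, 0.30])),  # spurious
    ]
    print([d.frame_id for d in filter_pseudo_labels(dets)])  # -> [0, 1, 2]

The surviving detections would then serve as supervision for fine-tuning the network on the target environment, which is the self-training step the abstract refers to.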
9-Dec-2025
English
BASILICO, NICOLA
SASSI, ROBERTO
Università degli Studi di Milano
152
Files in this item:
phd_unimi_R13700.pdf
Open access
License: All rights reserved
Size: 44.98 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/352947
The NBN code of this thesis is URN:NBN:IT:UNIMI-352947