THE CHALLENGE OF DOMAIN SHIFT IN REAL-WORLD ROBOTIC VISION: TOWARD SCALABLE, UNSUPERVISED, AND CLOUD-BASED ADAPTATION

ANTONAZZI, MICHELE
2025

Abstract

Mobile robots are an emerging technology, increasingly present in contexts such as homes, offices, and hospitals to assist humans in daily-life activities. Given the complexity of human-centric environments, robotic vision, namely computer vision embedded in mobile robots, has become an essential capability for acquiring a semantically rich understanding of the environment, both to improve core robotics tasks (such as navigation and localization) and to enable high-level activities such as manipulation or human-robot interaction. Given recent advances in deep learning, the naive way to implement robotic vision is to leverage publicly available deep neural networks, which can be deployed on robotic platforms with limited effort. Despite being widely adopted, this approach suffers from important limitations caused by the so-called domain shift problem: being trained on simulated or generic datasets (the source domain), deep neural networks fail dramatically to cope with the complexity of the real-world environments (the target domain) in which the robots operate. This dissertation investigates the challenges of domain shift in robotic vision and proposes novel adaptation strategies to enable robust and scalable real-world deployments of mobile robots. The contributions are structured into three complementary parts. First, we analyze the limitations of the mainstream pipeline for implementing robotic vision, which combines pre-trained neural networks with manual fine-tuning. We propose photorealistic simulation-based pre-training using data compliant with the robot's perception modality, and we demonstrate that fine-tuning with a limited amount of high-quality manual annotations substantially increases the model's robustness in the robot's specific operational environment. Second, we remove the need for human supervision, proposing two alternative approaches for unsupervised model adaptation. Our methods exploit the spatial constraints between the neural network's predictions and the 3D world to enhance the quality of pseudo-labels, thus enabling self-supervised adaptation. Third, we address adaptation in cloud-based robotic perception, where the computationally intensive inference required by neural networks is offloaded to remote servers to cope with the limited onboard hardware of mobile robots. In this context, we design solutions that ensure the scalability of domain adaptation and preserve privacy through lightweight neural networks that can run locally on the robot. We validate our approaches on real-world datasets and through extensive experiments with real robotic platforms, considering multiple perception tasks such as object detection, 3D pose estimation, and semantic segmentation.
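To make the second contribution concrete, the following is a minimal, hypothetical sketch of pseudo-label filtering via multi-view 3D consistency, the general idea the abstract describes; it is not the dissertation's actual algorithm, and all names and thresholds below are illustrative assumptions. It assumes each detection has already been lifted to a 3D centroid in a fixed map frame (e.g., via depth and robot pose), and keeps only detections re-observed consistently across frames.

    # Illustrative sketch only: geometry-based pseudo-label filtering for
    # self-supervised adaptation (hypothetical names and thresholds).
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Detection:
        frame_id: int         # camera frame the detection came from
        label: str            # predicted class
        score: float          # network confidence
        centroid: np.ndarray  # (3,) 3D position in the map frame

    def filter_pseudo_labels(dets, radius=0.3, min_views=3, min_score=0.5):
        """Keep detections whose 3D centroid is re-observed, with the same
        class, in at least `min_views` distinct frames within `radius` meters."""
        kept = []
        for d in dets:
            if d.score < min_score:
                continue
            support = {
                o.frame_id for o in dets
                if o.label == d.label
                and np.linalg.norm(o.centroid - d.centroid) < radius
            }
            if len(support) >= min_views:
                kept.append(d)  # consistent across views -> usable pseudo-label
        return kept

    # Example: a chair seen in three frames survives; a one-off ghost does not.
    dets = [
        Detection(0, "chair", 0.8, np.array([1.00, 2.00, 0.40])),
        Detection(1, "chair", 0.7, np.array([1.05, 2.02, 0.41])),
        Detection(2, "chair", 0.9, np.array([0.98, 1.97, 0.39])),
        Detection(3, "plant", 0.6, np.array([4.00, 0.50, 0.30])),  # spurious
    ]
    print([d.frame_id for d in filter_pseudo_labels(dets)])  # -> [0, 1, 2]

The surviving detections would then serve as supervision for fine-tuning the network on the target environment, which is the self-training step the abstract refers to.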
9-Dec-2025
English
BASILICO, NICOLA
SASSI, ROBERTO
Università degli Studi di Milano
152
Files in this item:
phd_unimi_R13700.pdf
Open access
License: All rights reserved
Size: 44.98 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/352947
The NBN code of this thesis is URN:NBN:IT:UNIMI-352947