
Perception from the Robot’s Perspective: Unified Calibration for Human Understanding and Robot-Object Interaction in Human-Robot Collaboration

ALLEGRO, DAVIDE
2026

Abstract

Modern robotics is at a turning point: shifting from rigid, pre-programmed automation to dynamic human-robot collaboration, where adaptability and intelligence drive the next generation of industrial production. The goal is no longer to replace human labor but to augment it, fusing robotic precision and endurance with human creativity and intuition. To realize this vision, robots must not only move but also see, understand, and act efficiently within the ever-changing complexity of the real world. This thesis advances that frontier by proposing a unified framework that bridges perception and action, enabling robots to understand dynamic human environments and to interact within them. The foundation of this framework is a robot-centric perception model, which establishes a coherent and consistent geometric relationship between what the robot perceives and how it moves. A core contribution of this work is a set of novel camera-to-robot calibration techniques that guarantee consistent spatial alignment across heterogeneous setups. These methods generalize to fixed external camera networks observing robotic arms, multi-camera systems mounted on manipulators, and mobile robots operating in unstructured environments, providing the robot with a unified visual reference, as if multiple eyes were coherently aligned within a single coordinate frame. To enable effective collaboration, the framework equips robots with robust human-understanding capabilities through a novel multi-view 3D human pose estimation method and a real-time skeleton-based action recognition module. Together, they allow the robot to interpret and anticipate human behavior, turning motion into meaningful collaboration. Finally, the thesis addresses intelligent robot-object interaction through two distinct applications. First, for robot manipulators, we develop a few-shot imitation learning algorithm, guided by a world model, for complex manipulation tasks.
Second, for mobile robots, we propose an active mapping strategy for autonomous navigation and reconstruction in unknown environments. This strategy integrates a Fisher-information-driven radiance field (Gaussian Splatting) model to select the most informative viewpoints, while motion prediction optimizes the path to those views. This allows the robot to explore efficiently, focusing attention where learning is most critical. Together, these contributions unify geometry, semantics, and dynamics into a single perception-action loop. The resulting system advances the development of robots that can reason, adapt, and collaborate effectively in unstructured and dynamic environments. All proposed methods are released as open-source software to encourage transparency, reproducibility, and collective progress toward perceptually grounded, human-aware robotic intelligence.
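To make the "single coordinate frame" idea concrete: once camera-to-robot calibration has produced an extrinsic transform for each camera, any detection can be expressed in the robot-base frame, and detections of the same physical point from different cameras must agree. The sketch below is not the thesis's calibration method itself; it only illustrates, with made-up transforms, how calibrated extrinsics unify a multi-camera setup.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def to_base_frame(T_base_cam, p_cam):
    """Map a 3D point observed in a camera frame into the robot-base frame."""
    p_h = np.append(p_cam, 1.0)  # homogeneous coordinates
    return (T_base_cam @ p_h)[:3]

# Illustrative extrinsics (in practice, outputs of camera-to-robot calibration):
# camera 1 is shifted 1 m along the base x-axis; camera 2 is rotated 90 degrees
# about z and shifted 1 m along y.
T_base_cam1 = make_transform(np.eye(3), [1.0, 0.0, 0.0])
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T_base_cam2 = make_transform(Rz90, [0.0, 1.0, 0.0])

# The same physical point, as observed from each camera:
p_cam1 = np.array([0.0, 0.5, 0.0])
p_cam2 = np.array([-0.5, -1.0, 0.0])

q1 = to_base_frame(T_base_cam1, p_cam1)
q2 = to_base_frame(T_base_cam2, p_cam2)
print(np.allclose(q1, q2))  # True: both cameras agree in the shared base frame
```

Accurate extrinsics are what make this agreement hold; miscalibration shows up directly as disagreement between `q1` and `q2`, which is why consistent spatial alignment across heterogeneous setups matters.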
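The active-mapping idea of "selecting the most informative viewpoints" can be illustrated with a greedy next-best-view loop. The thesis's actual criterion is built on the Fisher information of a Gaussian Splatting model; the sketch below replaces that with a deliberately simple stand-in (summed per-point uncertainty over the points a candidate view observes, a proxy for the trace of the information the view would contribute), so all names and numbers here are hypothetical.

```python
import numpy as np

def view_information(visibility_row, point_variance):
    """Score a candidate view by the total uncertainty of the scene points
    it observes -- a simple stand-in for a Fisher-information criterion."""
    return float(np.sum(visibility_row * point_variance))

def next_best_view(visibility, point_variance):
    """Greedy next-best-view: return the index of the highest-scoring candidate."""
    scores = [view_information(v, point_variance) for v in visibility]
    return int(np.argmax(scores)), scores

# Illustrative setup: 3 candidate viewpoints, 4 scene points.
# visibility[i, j] = 1 if candidate view i observes scene point j.
visibility = np.array([[1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1]])
# Current reconstruction uncertainty per point (high = poorly mapped).
point_variance = np.array([0.1, 0.2, 0.9, 0.8])

best, scores = next_best_view(visibility, point_variance)
print(best)  # 2 -- the view covering the most uncertain region
```

In the full system this scoring is only half of the loop: motion prediction then optimizes the path to the chosen view, so the robot trades off information gain against the cost of actually reaching it.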
Date: 12 February 2026
Language: English
Supervisor: GHIDONI, STEFANO
University: Università degli studi di Padova
File: tesi_Davide_Allegro.pdf (Adobe PDF, 108.77 MB)
Access: open access
License: All rights reserved

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/359537
The NBN code of this thesis is URN:NBN:IT:UNIPD-359537