Perception from the Robot’s Perspective: Unified Calibration for Human Understanding and Robot-Object Interaction in Human-Robot Collaboration
ALLEGRO, DAVIDE
2026
Abstract
Modern robotics is at a turning point: shifting from rigid, pre-programmed automation to dynamic human-robot collaboration, where adaptability and intelligence drive the next generation of industrial production. The goal is no longer to replace human labor but to augment it, fusing robotic precision and endurance with human creativity and intuition. To realize this vision, robots must not only move but also see, understand, and act efficiently within the ever-changing complexity of the real world. This thesis advances that frontier by proposing a unified framework that bridges perception and action, enabling robots to comprehend and interact with dynamic human environments. The foundation of this framework is a robot-centric perception model, which establishes a coherent geometric relationship between what the robot perceives and how it moves. A core contribution of this work is a set of novel camera-to-robot calibration techniques that guarantee consistent spatial alignment across heterogeneous setups. These methods generalize to fixed external camera networks observing robotic arms, multi-camera systems mounted on manipulators, and mobile robots operating in unstructured environments, providing the robot with a unified visual reference, as if multiple eyes were coherently aligned within a single coordinate frame. To enable effective collaboration, the framework equips robots with robust human-understanding capabilities through a novel multi-view 3D human pose estimation method and a real-time skeleton-based action recognition module. Together, they allow the robot to interpret and anticipate human behavior, turning motion into meaningful collaboration. Finally, the thesis addresses intelligent robot-object interaction through two distinct applications. First, for robot manipulators, we develop a few-shot imitation learning algorithm, guided by a world model, for complex manipulation tasks.
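Camera-to-robot calibration of the kind described above is classically formulated as the hand-eye equation AX = XB, where A is a relative robot motion, B is the same motion observed by the camera, and X is the unknown camera-to-robot transform. The sketch below is only a minimal self-consistency check of that constraint on rotations, not the thesis's calibration method; all variable names are illustrative.

```python
import math

def rot_z(t):
    """3x3 rotation about the z-axis, as nested lists."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_x(t):
    """3x3 rotation about the x-axis, as nested lists."""
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    """Transpose of a 3x3 matrix (inverse, for rotations)."""
    return [[A[j][i] for j in range(3)] for i in range(3)]

# X: the unknown camera-to-robot rotation the calibration would recover.
# B: a motion seen in the camera frame; A = X B X^T is the same motion
# expressed in the robot frame, so the pair satisfies A X = X B.
X = rot_z(0.7)
B = rot_x(0.3)
A = matmul(matmul(X, B), transpose(X))

lhs = matmul(A, X)
rhs = matmul(X, B)
residual = max(abs(lhs[i][j] - rhs[i][j])
               for i in range(3) for j in range(3))
```

In practice a solver estimates X from many noisy (A_i, B_i) pairs by minimizing exactly this kind of residual over all of them.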
Second, for mobile robots, we propose an active mapping strategy for autonomous navigation and reconstruction in unknown environments. This strategy integrates a Fisher Information-driven radiance field (Gaussian Splatting) model to select the most informative viewpoints, while motion prediction optimizes the path to those views. This allows the robot to explore efficiently, focusing attention where learning is most critical. Together, these contributions unify geometry, semantics, and dynamics into a single perception-action loop. The resulting system advances the development of robots that can reason, adapt, and collaborate effectively in unstructured and dynamic environments. All proposed methods are released as open-source software to encourage transparency, reproducibility, and collective progress toward perceptually grounded, human-aware robotic intelligence.
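The "select the most informative viewpoint" step can be caricatured as greedy next-best-view selection. The toy score below (inverse squared distance to still-uncertain points in 2D) is only an illustrative stand-in for the Fisher Information computed over a Gaussian Splatting model in the thesis; every name and number here is hypothetical.

```python
def info_score(view, uncertain_points):
    """Toy information proxy: views closer to poorly reconstructed
    points score higher. Points are (x, y) tuples; the small epsilon
    avoids division by zero when a view coincides with a point."""
    return sum(1.0 / (1e-6 + (view[0] - p[0]) ** 2 + (view[1] - p[1]) ** 2)
               for p in uncertain_points)

def select_next_view(candidates, uncertain_points):
    """Greedy next-best-view: pick the candidate maximizing the score."""
    return max(candidates, key=lambda v: info_score(v, uncertain_points))

# Hypothetical candidate camera positions and uncertain map points.
candidates = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
uncertain = [(2.0, 1.0), (2.5, 0.5)]
best = select_next_view(candidates, uncertain)  # the view nearest the uncertain region
```

A real system would additionally fold in the cost of reaching each view, which is the role the motion-prediction component plays in the proposed strategy.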
| File | Size | Format | Access | License |
|---|---|---|---|---|
| tesi_Davide_Allegro.pdf | 108.77 MB | Adobe PDF | Open access | All rights reserved |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359537
URN:NBN:IT:UNIPD-359537