
Perception from the Robot’s Perspective: Unified Calibration for Human Understanding and Robot-Object Interaction in Human-Robot Collaboration

ALLEGRO, DAVIDE
2026

Abstract

Modern robotics is at a turning point: shifting from rigid, pre-programmed automation to dynamic human-robot collaboration, where adaptability and intelligence drive the next generation of industrial production. The goal is no longer to replace human labor but to augment it, fusing robotic precision and endurance with human creativity and intuition. To realize this vision, robots must not only move but also see, understand, and act efficiently within the ever-changing complexity of the real world. This thesis advances that frontier by proposing a unified framework that bridges perception and action, enabling robots to understand dynamic human environments and to interact within them. The foundation of this framework is a robot-centric perception model, which establishes a coherent and consistent geometric relationship between what the robot perceives and how it moves. A core contribution of this work is a set of novel camera-to-robot calibration techniques that guarantee consistent spatial alignment across heterogeneous setups. These methods generalize to fixed external camera networks observing robotic arms, multi-camera systems mounted on manipulators, and mobile robots operating in unstructured environments, providing the robot with a unified visual reference, as if multiple eyes were coherently aligned within a single coordinate frame. To enable effective collaboration, the framework equips robots with robust human-understanding capabilities through a novel multi-view 3D human pose estimation method and a real-time skeleton-based action recognition module. Together, they allow the robot to interpret and anticipate human behavior, turning motion into meaningful collaboration. Finally, the thesis addresses intelligent robot-object interaction through two distinct applications. First, for robot manipulators, we develop a few-shot imitation learning algorithm, guided by a world model, for complex manipulation tasks.
Second, for mobile robots, we propose an active mapping strategy for autonomous navigation and reconstruction in unknown environments. This strategy integrates a Fisher-information-driven radiance field (Gaussian Splatting) model to select the most informative viewpoints, while motion prediction optimizes the path to those views. This allows the robot to explore efficiently, focusing attention where learning is most critical. Together, these contributions unify geometry, semantics, and dynamics into a single perception-action loop. The resulting system advances the development of robots that can reason, adapt, and collaborate effectively in unstructured and dynamic environments. All proposed methods are released as open-source software to encourage transparency, reproducibility, and collective progress toward perceptually grounded, human-aware robotic intelligence.
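To make the "single coordinate frame" idea concrete: once camera-to-robot calibration has produced an extrinsic transform for each camera, any detection can be expressed in the robot-base frame, and detections of the same physical point from different cameras must agree. The sketch below is not the thesis's calibration method itself; it only illustrates, with made-up transforms, how calibrated extrinsics unify a multi-camera setup.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def to_base_frame(T_base_cam, p_cam):
    """Map a 3D point observed in a camera frame into the robot-base frame."""
    p_h = np.append(p_cam, 1.0)  # homogeneous coordinates
    return (T_base_cam @ p_h)[:3]

# Illustrative extrinsics (in practice, outputs of camera-to-robot calibration):
# camera 1 is shifted 1 m along the base x-axis; camera 2 is rotated 90 degrees
# about z and shifted 1 m along y.
T_base_cam1 = make_transform(np.eye(3), [1.0, 0.0, 0.0])
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
T_base_cam2 = make_transform(Rz90, [0.0, 1.0, 0.0])

# The same physical point, as observed from each camera:
p_cam1 = np.array([0.0, 0.5, 0.0])
p_cam2 = np.array([-0.5, -1.0, 0.0])

q1 = to_base_frame(T_base_cam1, p_cam1)
q2 = to_base_frame(T_base_cam2, p_cam2)
print(np.allclose(q1, q2))  # True: both cameras agree in the shared base frame
```

Accurate extrinsics are what make this agreement hold; miscalibration shows up directly as disagreement between `q1` and `q2`, which is why consistent spatial alignment across heterogeneous setups matters.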
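The active-mapping idea of "selecting the most informative viewpoints" can be illustrated with a greedy next-best-view loop. The thesis's actual criterion is built on the Fisher information of a Gaussian Splatting model; the sketch below replaces that with a deliberately simple stand-in (summed per-point uncertainty over the points a candidate view observes, a proxy for the trace of the information the view would contribute), so all names and numbers here are hypothetical.

```python
import numpy as np

def view_information(visibility_row, point_variance):
    """Score a candidate view by the total uncertainty of the scene points
    it observes -- a simple stand-in for a Fisher-information criterion."""
    return float(np.sum(visibility_row * point_variance))

def next_best_view(visibility, point_variance):
    """Greedy next-best-view: return the index of the highest-scoring candidate."""
    scores = [view_information(v, point_variance) for v in visibility]
    return int(np.argmax(scores)), scores

# Illustrative setup: 3 candidate viewpoints, 4 scene points.
# visibility[i, j] = 1 if candidate view i observes scene point j.
visibility = np.array([[1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1]])
# Current reconstruction uncertainty per point (high = poorly mapped).
point_variance = np.array([0.1, 0.2, 0.9, 0.8])

best, scores = next_best_view(visibility, point_variance)
print(best)  # 2 -- the view covering the most uncertain region
```

In the full system this scoring is only half of the loop: motion prediction then optimizes the path to the chosen view, so the robot trades off information gain against the cost of actually reaching it.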
Date: 12 February 2026
Language: English
Supervisor: GHIDONI, STEFANO
University: Università degli studi di Padova
File: tesi_Davide_Allegro.pdf (Adobe PDF, 108.77 MB)
Access: open access
License: All rights reserved

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/359537
The NBN code of this thesis is URN:NBN:IT:UNIPD-359537