Neural Representation for Robotics
Bortolon, Matteo
2025
Abstract
Modern robotic systems often exhibit limited flexibility, constraining their range of applications. Consequently, they are restricted to particular tasks, where even minor modifications, such as changes in object dimensions, can require considerable reconfiguration effort. A major bottleneck in achieving adaptable robotic systems is the perception component, which provides critical information about the objects to manipulate, primarily through vision cameras. Accurate perception is especially crucial in assembly tasks, where robots must precisely localize objects for manipulation. The localization process is traditionally divided into three interdependent subtasks: geometry reconstruction, pose estimation, and grasp prediction. Geometry reconstruction has conventionally relied on depth cameras. However, these cameras are susceptible to numerous limitations, including occlusions, reflections, challenging lighting conditions, and difficulties with small objects, all of which lead to inaccurate or incomplete geometric representations. Neural representations have recently demonstrated exceptional capabilities in Novel View Synthesis (NVS), effectively handling many scenarios where conventional depth cameras struggle. Moreover, unlike traditional methods requiring pre-existing CAD models, neural representations can be generated directly from object images, facilitating the development of more adaptable and flexible robotic systems. While applying NVS techniques to improve robotic manipulation appears promising, their practical integration into functional robotic systems remains challenging and computationally demanding. This thesis establishes a framework that addresses fundamental challenges limiting the application of neural representation techniques to robotics. First, we develop a method that reduces the number of views required for accurate neural object representation by combining traditional view morphing with neural rendering, thereby enhancing practicality for real-world robotic applications. Second, we introduce two complementary 6D pose estimation methods utilizing NVS representations that enable robots to accurately localize objects while preserving the advantages of neural representations. These methods facilitate direct pose validation by comparing query images and synthesized viewpoints. Finally, we enhance grasping precision for anthropomorphic robotic hands by incorporating novel view synthesis during grasp planning, allowing the system to assess potential grasps before execution. Extensive experimental evaluations substantiate the efficacy of our proposed methods across diverse localization and manipulation scenarios, demonstrating their potential to advance robotic perception capabilities.
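As a rough illustration of the render-and-compare idea behind the pose validation step described in the abstract (a minimal sketch, not the thesis implementation): given a query image and any function that renders the neural object representation at a candidate 6D pose, candidate poses can be ranked by how well the synthesized view agrees with the query. The names `render_fn`, `photometric_score`, and `validate_poses` below are hypothetical.

```python
# Minimal render-and-compare sketch for NVS-based pose validation.
# Assumption: `render_fn` is any callable that renders the neural object
# representation at a given pose into an RGB array. All names here are
# illustrative, not an API from the thesis.
import numpy as np


def photometric_score(query: np.ndarray, rendered: np.ndarray) -> float:
    """Negative mean squared RGB error (higher means a better match)."""
    diff = query.astype(np.float32) - rendered.astype(np.float32)
    return -float(np.mean(diff ** 2))


def validate_poses(query_image, candidate_poses, render_fn):
    """Rank candidate poses by how well the synthesized view at each pose
    matches the query image; return the best pose and its score."""
    scores = [photometric_score(query_image, render_fn(pose))
              for pose in candidate_poses]
    best = int(np.argmax(scores))
    return candidate_poses[best], scores[best]


if __name__ == "__main__":
    # Toy demo with a constant-color "renderer": the pose (a scalar here)
    # sets the gray level, and the query is a mid-gray image.
    query = np.full((64, 64, 3), 128, dtype=np.uint8)
    render_fn = lambda pose: np.full((64, 64, 3), pose, dtype=np.uint8)
    best_pose, score = validate_poses(query, [0, 64, 128, 192], render_fn)
    print(best_pose, score)  # -> 128 0.0
```

In practice the scoring function would likely be more robust than raw pixel error (e.g., feature-based similarity), but the validation loop keeps the same structure: synthesize a view per candidate pose and keep the best-matching one.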
| File | Size | Format | |
|---|---|---|---|
| bortolon_matteo_phd_thesis.pdf (open access; License: All rights reserved) | 97.11 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/307039
URN:NBN:IT:UNITN-307039