Neural Representation for Robotics
Bortolon, Matteo
2025
Abstract
Modern robotic systems often exhibit limited flexibility, constraining their range of applications. Consequently, they are restricted to particular tasks, where even minor modifications, such as changes in object dimensions, can require considerable reconfiguration effort. A major bottleneck in achieving adaptable robotic systems is the perception component, which provides critical information about the objects to manipulate, primarily through vision cameras. Accurate perception is especially crucial in assembly tasks, where robots must precisely localize objects for manipulation. The localization process is traditionally divided into three interdependent subtasks: geometry reconstruction, pose estimation, and grasp prediction. Geometry reconstruction has conventionally relied on depth cameras. However, these cameras are susceptible to numerous limitations, including occlusions, reflections, challenging lighting conditions, and difficulties with small objects, all of which lead to inaccurate or incomplete geometric representations. Neural representations have recently demonstrated exceptional capabilities in Novel View Synthesis (NVS), effectively handling many scenarios where conventional depth cameras struggle. Moreover, unlike traditional methods requiring pre-existing CAD models, neural representations can be generated directly from object images, facilitating the development of more adaptable and flexible robotic systems. While applying NVS techniques to improve robotic manipulation appears promising, their practical integration into functional robotic systems remains challenging and computationally demanding. This thesis establishes a framework that addresses fundamental challenges limiting the application of neural representation techniques to robotics. First, we develop a method that reduces the number of views required for accurate neural object representation by combining traditional view morphing with neural rendering, thereby enhancing practicality for real-world robotic applications. Second, we introduce two complementary 6D pose estimation methods utilizing NVS representations that enable robots to accurately localize objects while preserving the advantages of neural representations. These methods facilitate direct pose validation by comparing query images and synthesized viewpoints. Finally, we enhance grasping precision for anthropomorphic robotic hands by incorporating novel view synthesis during grasp planning, allowing the system to assess potential grasps before execution. Extensive experimental evaluations substantiate the efficacy of our proposed methods across diverse localization and manipulation scenarios, demonstrating their potential to advance robotic perception capabilities.
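As a rough illustration of the render-and-compare idea behind the pose validation step described in the abstract (a minimal sketch, not the thesis implementation): given a query image and any function that renders the neural object representation at a candidate 6D pose, candidate poses can be ranked by how well the synthesized view agrees with the query. The names `render_fn`, `photometric_score`, and `validate_poses` below are hypothetical.

```python
# Minimal render-and-compare sketch for NVS-based pose validation.
# Assumption: `render_fn` is any callable that renders the neural object
# representation at a given pose into an RGB array. All names here are
# illustrative, not an API from the thesis.
import numpy as np


def photometric_score(query: np.ndarray, rendered: np.ndarray) -> float:
    """Negative mean squared RGB error (higher means a better match)."""
    diff = query.astype(np.float32) - rendered.astype(np.float32)
    return -float(np.mean(diff ** 2))


def validate_poses(query_image, candidate_poses, render_fn):
    """Rank candidate poses by how well the synthesized view at each pose
    matches the query image; return the best pose and its score."""
    scores = [photometric_score(query_image, render_fn(pose))
              for pose in candidate_poses]
    best = int(np.argmax(scores))
    return candidate_poses[best], scores[best]


if __name__ == "__main__":
    # Toy demo with a constant-color "renderer": the pose (a scalar here)
    # sets the gray level, and the query is a mid-gray image.
    query = np.full((64, 64, 3), 128, dtype=np.uint8)
    render_fn = lambda pose: np.full((64, 64, 3), pose, dtype=np.uint8)
    best_pose, score = validate_poses(query, [0, 64, 128, 192], render_fn)
    print(best_pose, score)  # -> 128 0.0
```

In practice the scoring function would likely be more robust than raw pixel error (e.g., feature-based similarity), but the validation loop keeps the same structure: synthesize a view per candidate pose and keep the best-matching one.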
| File | Size | Format | |
|---|---|---|---|
| bortolon_matteo_phd_thesis.pdf (open access; License: All rights reserved) | 97.11 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/307039
URN:NBN:IT:UNITN-307039