Embodied Active Perception for Spatial and Semantical Reasoning
SCARPELLINI, GIANLUCA
2024
Abstract
This thesis investigates the embodied AI paradigm and its introduction into active perception and semantic reasoning pipelines, and explores its practical applications in robotics. Central to my research is the action-perception loop, in which an agent follows a policy to explore an environment while improving its perception of that environment; both exploration and perception are key in real-world applications. Embodied AI raises two challenges: (i) evaluating the performance of a policy in a real-world scenario is risky, and (ii) complex tasks, e.g., rearranging objects in a room, require an understanding of the environment that goes beyond simple perception. First, I tackle the issue of deploying an agent on a real-world robotic platform. I propose a novel approach for evaluating agent performance through efficient offline policy evaluation, without the need for direct deployment. This method is particularly relevant when deploying in the target scenario is unethical (e.g., healthcare), expensive (e.g., robotics), or unsafe (e.g., self-driving cars). Second, I address spatial and semantic reasoning. Here, I introduce a novel diffusion model formulation, explicitly designed for tasks involving spatial and semantic reasoning, such as rearranging a room or solving puzzles. In summary, this thesis presents contributions in the domains of active exploration, offline policy evaluation, and spatial reasoning, with implications for the development of real-world robotic applications.

| File | Size | Format |
|---|---|---|
| phdunige_4965929.pdf (open access) | 33.47 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/125963
URN:NBN:IT:UNIGE-125963