Static and dynamic approaches for Embodied Social Navigation from the perspective of an autonomous agent

CANCELLI, ENRICO
2025

Abstract

Interacting with humans is a challenging task for an artificial intelligence (AI) agent, and it is fundamental to a number of applications. AI can be employed for understanding and predicting human behavior, as well as in tasks involving natural language. The main focus of this research project is understanding human behavior and interactions from the perspective of AI agents moving in a human-populated environment. AI has a number of real-world applications, including self-driving vehicles, robotics, and industrial automation. For instance, a self-driving car system must predict how pedestrians will act in the near future in order to avoid collisions. Furthermore, mobile robots should know how to react to the presence of humans in a real environment in order to complete their assigned tasks efficiently and without causing harm and, optionally, to receive, understand, and execute tasks directly from humans. The contribution of this research project consists of methods and models that aim to interpret human behavior and make predictions from two different perspectives: that of a static agent, whose task is only to make predictions about future behavior given static visual data (such as a video recorded by a vehicle's onboard camera), and that of an embodied agent that actively participates in the decision-making and navigation process. To address the core objectives of this study, we formulated three key research questions. First, we sought to determine whether it is possible to predict a person's intentions and future movements from the perspective of a moving agent (RQ1). Second, we explored how to effectively train and evaluate an agent's performance in navigating unseen environments populated by humans (RQ2). Lastly, we investigated whether an AI agent can interpret natural language instructions and relate them to its past observations of the environment (RQ3). Specifically, in the context of active embodied agents, this research project made several contributions that advance the state of the art, and proposed a new task and a new protocol for evaluating navigation performance in social contexts.
14 March 2025
English
BALLAN, LAMBERTO
Università degli studi di Padova
Files in this item:
File: tesi_Enrico_Cancelli.pdf
Access: open access
Size: 21.83 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/200539
The NBN code of this thesis is URN:NBN:IT:UNIPD-200539