Revealing the Hidden Mind: Mindreading at the Intersection of Language, Gaze, and Action in Humans and AI
PANSARDI, ORIANA
2026
Abstract
This thesis investigates how hidden mental states are encoded in—and can be inferred from—language, gaze, and action in both humans and Large Language Models (LLMs). The first two studies compare humans and LLMs in their ability to infer mental states from language and gaze, whereas the third study examines whether social preferences are encoded in human action. In the first study, we used a comprehensive battery of five language-based tests to systematically compare the performance of humans and several LLMs across diverse mentalistic inference tasks requiring the recognition of false belief, irony, indirect request, faux pas, and higher-order mental states such as manipulation, persuasion, and deception. We found that some LLMs not only passed most tests but even outperformed humans in certain tasks, while still exhibiting systematic deviations from human-like cognition in their responses. In the second study, we extended this comparison beyond text-based tasks by testing whether GPT-4o—a multimodal LLM—can infer complex mental states from people’s eyes, a powerful channel of information for human mindreading. In both the original and the newer multiracial version of the Reading the Mind in the Eyes Test, GPT-4o performed above human level for upright faces. However, this ability was disrupted by face inversion to a greater extent than in humans and was accompanied by a qualitatively distinct error structure. This further supports the finding that, while LLMs can successfully mirror some aspects of human mindreading behavior, consistent deviations in their underlying computations still emerge upon systematic investigation. Having examined language and gaze in the first two studies, in the third and final study we turned to the role of action as a channel of information about others’ minds.
Specifically, we investigated whether movement vigor can serve as a continuous proxy for the processes underlying social decision-making, using altruistic punishment as a model of complex social choice. We found that movement vigor increased with offer magnitude when fair offers were accepted but decreased when unfair offers were rejected as a form of punishment for the unfair proposers. In punishment decisions, the reduced vigor was driven not by self- or other-costs alone but by their ratio—the efficiency of the punishment—demonstrating that social preferences are indeed encoded in movement kinematics. Together, these studies advance our understanding of mindreading in both humans and artificial systems and represent initial steps toward more natural human-AI interactions, in which AI systems can access, integrate, and use the same channels of information that humans rely on when reading one another’s minds.

| File | Size | Format | |
|---|---|---|---|
| Tesi-Pansardi-Oriana.pdf (open access; license: all rights reserved) | 3.2 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359968
URN:NBN:IT:UNITO-359968