Human in Neural Language Models: Interpreting Encoder-Based Language Models with Cognitive Signals and New Evaluation Strategies

Dini, Luca
2026

Abstract

This thesis investigates interpretability in Neural Language Models (NLMs), addressing the fundamental question of how these systems represent and process linguistic information. Interpretability is framed as the study of how linguistic representations in NLMs relate to mechanisms underlying human language understanding, examined through two complementary perspectives: cognitively inspired evaluation and cognitively grounded modeling. The first perspective explores how evaluation benchmarks can function as diagnostic instruments by isolating phenomena central to human comprehension, such as temporal reasoning and discourse coherence, thus exposing the depth, structure, and limitations of the linguistic knowledge encoded in NLMs. The second perspective integrates human reading signals, in particular eye-tracking data, into model training to assess whether cognitive supervision can guide neural attention toward more human-like processing strategies. Taken together, these approaches advance interpretability beyond post hoc explanation, positioning it as a guiding principle for the design and analysis of models that are both cognitively informed and computationally effective.
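To make the second perspective concrete, the following is a minimal sketch of one common way to implement cognitive supervision of attention: an auxiliary loss that pulls the attention mass a transformer encoder assigns to each token toward a distribution derived from token-level eye-tracking measures (here, total fixation duration). The function name, tensor shapes, and the choice of a KL objective are illustrative assumptions for exposition, not the thesis's actual implementation.

    import torch
    import torch.nn.functional as F

    def attention_supervision_loss(attn_probs: torch.Tensor,
                                   fixations: torch.Tensor,
                                   mask: torch.Tensor) -> torch.Tensor:
        # attn_probs: (batch, heads, seq, seq) softmaxed self-attention weights
        # fixations:  (batch, seq) per-token total fixation duration (hypothetical input)
        # mask:       (batch, seq) 1.0 for real tokens, 0.0 for padding
        # Average over heads, then over query positions, to get the
        # attention mass each token receives from the rest of the sentence.
        received = attn_probs.mean(dim=1).mean(dim=1) * mask
        received = received / received.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        # Turn fixation durations into a per-sentence probability distribution.
        target = fixations * mask
        target = target / target.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        # KL(target || model): penalizes attention that diverges from human reading.
        return F.kl_div(received.clamp_min(1e-9).log(), target, reduction="batchmean")

In such a setup, this term would typically be added to the task loss with a weighting coefficient, e.g. loss = task_loss + lam * attention_supervision_loss(...), where lam controls how strongly human reading behavior constrains the model's attention.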
16 Feb 2026
English
natural language processing
neural language models
cognitive grounding
interpretability
eye-tracking
discourse coherence
temporal relations
Dell'Orletta, Felice
Brunato, Dominique
Files in this item:

File: tesi_luca_dini.pdf
Access: open access
License: Creative Commons
Size: 8.09 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/359112
The NBN code of this thesis is URN:NBN:IT:UNIPI-359112