Human in Neural Language Models: Interpreting Encoder-Based Language Models with Cognitive Signals and New Evaluation Strategies
DINI, LUCA
2026
Abstract
This thesis investigates interpretability in Neural Language Models (NLMs), addressing the fundamental question of how these systems represent and process linguistic information. Interpretability is framed as the study of how linguistic representations in NLMs relate to mechanisms underlying human language understanding, examined through two complementary perspectives: cognitively inspired evaluation and cognitively grounded modeling. The first perspective explores how evaluation benchmarks can function as diagnostic instruments by isolating phenomena central to human comprehension, such as temporal reasoning and discourse coherence, thus exposing the depth, structure, and limitations of the linguistic knowledge encoded in NLMs. The second perspective integrates human reading signals, in particular eye-tracking data, into model training to assess whether cognitive supervision can guide neural attention toward more human-like processing strategies. Taken together, these approaches advance interpretability beyond post hoc explanation, positioning it as a guiding principle for the design and analysis of models that are both cognitively informed and computationally effective.

| File | Size | Format | License |
|---|---|---|---|
| tesi_luca_dini.pdf (open access) | 8.09 MB | Adobe PDF | Creative Commons |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/359112
URN:NBN:IT:UNIPI-359112