Memory, explainability, and brain alignment: towards brain-inspired explainable continual learning
PROIETTI, MICHELA
2026
Abstract
Modern neural networks achieve human-level performance on individual tasks, but they tend to forget previously learned knowledge when trained on sequences of tasks. As a model learns subsequent tasks in the sequence, it loses the ability to accurately perform the previously learned ones: a phenomenon known as catastrophic forgetting. Continual learning (CL) methods aim to mitigate this issue by balancing the network's plasticity and stability, thus limiting interference between tasks. However, most of these approaches focus on improving model performance without providing insights into what is happening internally. Our work addresses this gap by structuring and contributing to the field of eXplainable Artificial Intelligence (XAI)-guided CL, with a focus on self-interpretable approaches. First, we provide a survey of existing XAI-guided CL methods, with the goals of encouraging research on the topic, unifying benchmarks and terminology, and identifying potential research avenues. Second, we introduce new self-interpretable architectures and develop novel XAI-guided CL approaches. The presented architectures rely on human-understandable concepts or prototypes, shedding light on the networks' inner workings and providing insights into how old and new information is aggregated during CL. Third, we show how to gain insights into how new and past information is integrated in both artificial and biological neural networks. This objective is achieved by directly analyzing the alignment between the representations of the two systems through XAI. We additionally explore diverse application domains, including images, text, graphs, and reinforcement learning. Our findings demonstrate that XAI can serve a dual function: it can be applied to brain alignment to identify gaps in current computational models of cognition, and to CL to enhance performance and interpretability. Taken together, these applications lay the foundation for developing continual learners that are both interpretable and neuro-inspired. Empirically, our methods consistently outperform existing baselines in both class- and task-incremental learning, improve replay strategies in reinforcement learning, and provide novel insights into explanation drift and the role of long-range dependencies in brain–language model alignment.
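The abstract refers to prototype-based self-interpretable architectures and to replay-style continual learning without giving implementation details. The sketch below is a minimal, hypothetical illustration of those two ideas (a per-class prototype head plus a small replay buffer); it is not the architecture proposed in the thesis, and all names, dimensions, and hyperparameters are assumptions.

```python
# Illustrative sketch only (not the thesis architecture): a prototype-based
# classifier head combined with a naive experience-replay buffer. All class
# names, dimensions, and hyperparameters below are hypothetical.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrototypeHead(nn.Module):
    """Classifies by (negative squared) distance to one learned prototype per class.

    Because each class is an explicit point in feature space, a prediction can be
    explained by inspecting which prototype the input is closest to, and forgetting
    can be tracked by watching how prototypes drift as new tasks arrive.
    """

    def __init__(self, feature_dim: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feature_dim) * 0.01)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Squared Euclidean distance between each feature vector and each prototype.
        dists = torch.cdist(features, self.prototypes, p=2) ** 2
        return -dists  # higher logit = closer prototype


def train_task(encoder, head, loader, buffer, optimizer, replay_batch=32):
    """One task of class-incremental training with a simple replay strategy."""
    encoder.train()
    head.train()
    for x, y in loader:
        loss = F.cross_entropy(head(encoder(x)), y)

        # Stability: interleave a mini-batch of stored old examples, if any.
        if len(buffer) >= replay_batch:
            bx, by = zip(*random.sample(buffer, replay_batch))
            bx, by = torch.stack(bx), torch.stack(by)
            loss = loss + F.cross_entropy(head(encoder(bx)), by)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Plasticity vs. stability trade-off: keep a small sample of the new task
        # (a real method would use reservoir sampling or herding instead).
        for xi, yi in zip(x, y):
            if len(buffer) < 500:  # hypothetical buffer size
                buffer.append((xi.detach(), yi.detach()))
```

Because each class is summarized by an explicit prototype vector, one can inspect which prototype a prediction relies on and how prototypes move across tasks, which is the kind of transparency the abstract attributes to self-interpretable CL.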
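The abstract also mentions directly analyzing the alignment between artificial and biological neural representations. One common, generic way to quantify such alignment is representational similarity analysis (RSA); the snippet below is a standard RSA sketch on synthetic data, assumed for illustration, and not the alignment pipeline used in the thesis.

```python
# Illustrative sketch only: representational similarity analysis (RSA) between a
# model's layer activations and (synthetic) brain responses to the same stimuli.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rsa_score(model_acts: np.ndarray, brain_resps: np.ndarray) -> float:
    """Spearman correlation between the two representational dissimilarity matrices.

    model_acts:  (n_stimuli, n_features)  activations for each stimulus
    brain_resps: (n_stimuli, n_voxels)    measured responses to the same stimuli
    """
    rdm_model = pdist(model_acts, metric="correlation")   # condensed RDM
    rdm_brain = pdist(brain_resps, metric="correlation")
    rho, _ = spearmanr(rdm_model, rdm_brain)
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.standard_normal((50, 256))    # hypothetical layer activations
    brain = rng.standard_normal((50, 1000))  # hypothetical voxel responses
    print(f"RSA alignment: {rsa_score(acts, brain):.3f}")
```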
| File | Size | Format | License |
|---|---|---|---|
| Tesi_dottorato_Proietti.pdf (open access) | 21.92 MB | Adobe PDF | Creative Commons |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/358418
URN:NBN:IT:UNIROMA1-358418