Continual Incremental Language Learning for Neural Machine Translation

Resta, Michele
2024

Abstract

Since the inception of the Artificial Intelligence (AI) field, one of its main long-term objectives has been to understand and replicate intelligence in order to create systems capable of learning and behaving in a human-like manner. However, this task has proven extremely difficult for both traditional AI systems and neural approaches, largely due to the phenomenon of catastrophic forgetting: when exposed to new data, neural networks tend to erase previously learned knowledge. In this context, Continual Learning (CL) has emerged as a research field aimed at mitigating this behavior and moving towards AI systems that mimic human learning capabilities in lifelong learning tasks and environments. With the shift of Machine Translation (MT) and Natural Language Processing (NLP) systems to deep learning, these capabilities have become even more desirable, given the substantial resources required to train such models, particularly with respect to training efficiency and the transferability of knowledge. In this dissertation, we provide a practical contribution to this research area. We begin by reviewing fundamental concepts and theoretical aspects of Neural Machine Translation (NMT) and then survey prominent CL methodologies. Building on this foundation, we propose a Continual Learning framework for NMT aimed at incrementally learning multilingual translation systems. We introduce the Continual Incremental Language Learning setting as a starting point for exploring data selection strategies that enhance training efficiency when effective continual learning strategies, such as replay buffers, are used. Furthermore, we demonstrate that employing an NMT model both as a learner and as a generator of replay data is effective in mitigating performance loss during continued training, relaxing several requirements related to training data storage. Within this incremental language learning context, we empirically evaluate, through quantitative and qualitative analyses, both the classical training paradigm and the pre-training and fine-tuning paradigm, and we discuss the distinctive aspects of each when classical data-based rehearsal strategies are employed. We further extend our analysis to non-autoregressive NMT models and compare them to state-of-the-art autoregressive NMT systems. Through this work, we aim to provide a comprehensive framework and practical insights into continual learning for NMT, ultimately highlighting the needs and benefits of this learning paradigm.
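
To make the self-generated replay idea mentioned above more concrete, the following minimal Python sketch (illustrative only, not code from the dissertation; all names such as ToyNMTModel, build_replay_buffer, and continual_training are assumptions) shows a learner that, before training on a new language pair, uses its own translations of previously seen source sentences as pseudo-parallel replay data mixed with the new task's data.

"""
Minimal sketch of self-generated replay for continual NMT.
Hypothetical names; a real system would plug in an actual NMT model.
"""
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


class ToyNMTModel:
    """Stand-in for a real NMT model (e.g. a Transformer); hypothetical API."""

    def translate(self, src: str, tgt_lang: str) -> str:
        # A real model would decode a translation; here we just tag the input.
        return f"<{tgt_lang}> {src}"

    def train_step(self, batch: List[Pair]) -> None:
        # A real model would compute the loss and update its parameters.
        pass


def build_replay_buffer(model: ToyNMTModel,
                        seen_sources: List[Tuple[str, str]],
                        size: int) -> List[Pair]:
    """Generate pseudo-parallel replay pairs with the model itself,
    so no parallel data from earlier tasks needs to be stored."""
    sample = random.sample(seen_sources, min(size, len(seen_sources)))
    return [(src, model.translate(src, tgt_lang)) for src, tgt_lang in sample]


def continual_training(model: ToyNMTModel,
                       tasks: List[Tuple[str, List[Pair]]],
                       replay_size: int = 2) -> None:
    """Learn language pairs one after another, mixing each new task's data
    with self-generated replay from previously learned tasks."""
    seen_sources: List[Tuple[str, str]] = []
    for tgt_lang, new_data in tasks:
        replay = build_replay_buffer(model, seen_sources, replay_size)
        for pair in new_data + replay:
            model.train_step([pair])
        # Keep only source sentences (plus their target language)
        # for future replay generation.
        seen_sources += [(src, tgt_lang) for src, _ in new_data]


if __name__ == "__main__":
    tasks = [
        ("de", [("the cat sleeps", "die Katze schläft")]),
        ("fr", [("the cat sleeps", "le chat dort")]),
    ]
    continual_training(ToyNMTModel(), tasks)

In a real setup the stand-in model would be an actual NMT network and train_step would perform a gradient update; the point of the sketch is that only source sentences from earlier language pairs need to be retained, since their targets are regenerated on the fly by the model itself.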
Date: 6 October 2024
Language: Italian

Keywords:
catastrophic forgetting
continual learning
incremental language learning
lifelong learning
neural machine translation

Supervisor: Bacciu, Davide
Files in this record:
phd_activities_pdfa.pdf: not available, 85.41 kB, Adobe PDF
resta_thesis_final_1.pdf: open access, 4.37 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216396
The NBN code of this thesis is URN:NBN:IT:UNIPI-216396