Memorization in Recurrent Neural Networks
CARTA, ANTONIO
2021
Abstract
Rich sources of data such as text, video, or time series are composed of sequences of elements. Traditionally, recurrent neural networks have been used to process sequences by keeping a trace of the past in a recursively updated hidden state. The ability of recurrent networks to memorize the past is fundamental to their success. In this thesis, we study recurrent networks and their short-term memory, with the objective of maximizing it. In the literature, most models either do not optimize the short-term memory or they do so in a data-independent way. We propose a conceptual framework that splits recurrent networks into two separate components: a feature extractor and a memorization component. Following this separation, we show how to optimize the short-term memory of recurrent networks. This is a challenging problem that is hard to solve by end-to-end backpropagation. We propose several solutions that allow us to efficiently optimize the memorization component. Finally, we apply our approach to two application domains: sentence embeddings for natural language processing and continual learning on sequential data. Overall, we find that optimizing the short-term memory improves the ability of recurrent models to learn long-range dependencies, helps the training process, and provides features that generalize well to unseen data. The findings of this thesis provide a better understanding of short-term memory in recurrent networks and suggest general principles that may be useful for designing novel recurrent models with expressive memorization components.
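The separation described in the abstract can be pictured with a minimal sketch: a nonlinear feature extractor encodes the current input together with the previous memory state, and a linear memorization component recursively stores the extracted features. This is an illustrative assumption about the shape of such a model, not the thesis's actual implementation; all names, dimensions, and update rules below are hypothetical.

```python
import numpy as np

# Illustrative split of a recurrent model into two components (assumed layout):
#   1) a nonlinear feature extractor over the input and the previous memory
#   2) a linear memorization component that recursively accumulates features
rng = np.random.default_rng(0)
n_in, n_feat, n_mem = 8, 16, 32          # input, feature, and memory sizes (arbitrary)

W_xf = rng.normal(scale=0.1, size=(n_feat, n_in))    # input  -> features
W_mf = rng.normal(scale=0.1, size=(n_feat, n_mem))   # memory -> features
W_fm = rng.normal(scale=0.1, size=(n_mem, n_feat))   # features -> memory
W_mm = rng.normal(scale=0.1, size=(n_mem, n_mem))    # memory -> memory (linear recurrence)

def step(x_t, m_prev):
    """One time step: extract features, then update the linear memory."""
    h_t = np.tanh(W_xf @ x_t + W_mf @ m_prev)   # feature extractor (nonlinear)
    m_t = W_fm @ h_t + W_mm @ m_prev            # memorization component (linear)
    return h_t, m_t

# Run the sketch over a random sequence of 20 input vectors.
m = np.zeros(n_mem)
for x in rng.normal(size=(20, n_in)):
    h, m = step(x, m)
print(h.shape, m.shape)   # (16,) (32,)
```

Keeping the memory update linear in this sketch is what would make it possible, in principle, to optimize the memorization component with tools other than end-to-end backpropagation, which is the kind of separation the abstract motivates.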
File | Access | Size | Format
---|---|---|---
main.pdf | open access | 2.34 MB | Adobe PDF
relazione_finale.pdf | open access | 134.46 kB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/215849
URN:NBN:IT:UNIPI-215849