Memorization in Recurrent Neural Networks
CARTA, ANTONIO
2021
Abstract
Rich sources of data such as text, video, or time series are composed of sequences of elements. Traditionally, recurrent neural networks have been used to process sequences by keeping a trace of the past in a recursively updated hidden state. The ability of recurrent networks to memorize the past is fundamental to their success. In this thesis, we study recurrent networks and their short-term memory, with the objective of maximizing it. In the literature, most models either do not optimize the short-term memory or they do so in a data-independent way. We propose a conceptual framework that splits recurrent networks into two separate components: a feature extractor and a memorization component. Following this separation, we show how to optimize the short-term memory of recurrent networks. This is a challenging problem that is hard to solve by end-to-end backpropagation. We propose several solutions that allow us to efficiently optimize the memorization component. Finally, we apply our approach to two application domains: sentence embeddings for natural language processing and continual learning on sequential data. Overall, we find that optimizing the short-term memory improves the ability of recurrent models to learn long-range dependencies, helps the training process, and provides features that generalize well to unseen data. The findings of this thesis provide a better understanding of short-term memory in recurrent networks and suggest general principles that may be useful for designing novel recurrent models with expressive memorization components.
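The separation described in the abstract can be pictured with a minimal sketch: a nonlinear feature extractor encodes the current input together with the previous memory state, and a linear memorization component recursively stores the extracted features. This is an illustrative assumption about the shape of such a model, not the thesis's actual implementation; all names, dimensions, and update rules below are hypothetical.

```python
import numpy as np

# Illustrative split of a recurrent model into two components (assumed layout):
#   1) a nonlinear feature extractor over the input and the previous memory
#   2) a linear memorization component that recursively accumulates features
rng = np.random.default_rng(0)
n_in, n_feat, n_mem = 8, 16, 32          # input, feature, and memory sizes (arbitrary)

W_xf = rng.normal(scale=0.1, size=(n_feat, n_in))    # input  -> features
W_mf = rng.normal(scale=0.1, size=(n_feat, n_mem))   # memory -> features
W_fm = rng.normal(scale=0.1, size=(n_mem, n_feat))   # features -> memory
W_mm = rng.normal(scale=0.1, size=(n_mem, n_mem))    # memory -> memory (linear recurrence)

def step(x_t, m_prev):
    """One time step: extract features, then update the linear memory."""
    h_t = np.tanh(W_xf @ x_t + W_mf @ m_prev)   # feature extractor (nonlinear)
    m_t = W_fm @ h_t + W_mm @ m_prev            # memorization component (linear)
    return h_t, m_t

# Run the sketch over a random sequence of 20 input vectors.
m = np.zeros(n_mem)
for x in rng.normal(size=(20, n_in)):
    h, m = step(x, m)
print(h.shape, m.shape)   # (16,) (32,)
```

Keeping the memory update linear in this sketch is what would make it possible, in principle, to optimize the memorization component with tools other than end-to-end backpropagation, which is the kind of separation the abstract motivates.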
File | Access | Size | Format
---|---|---|---
main.pdf | open access | 2.34 MB | Adobe PDF
relazione_finale.pdf | open access | 134.46 kB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/215849
URN:NBN:IT:UNIPI-215849