Retrieving mismatched memory patterns in the Hopfield model of neural networks
PATTI, ALBERTO
2023
Abstract
The main objective of this thesis is to explore the physical mechanisms of the so-called Hopfield model of neural networks and to obtain quantitative results for a few of its possible variations. The Hopfield model is a paradigm of theoretical physics and computational neuroscience: it is a model of interacting neurons that shows, within a statistical-mechanical framework, how such simple units can store and retrieve memories. Although the structure of the Hopfield model, like that of neural networks in general, was inspired by neuroscience, its applications are varied, and the model has also been simulated in the context of random optical systems. A modern application was found by Hopfield himself, who built an associative memory model related to the original Hopfield model but with multi-neuron interactions. He then mapped it onto a neural network with one hidden layer and an unusual activation function, and its utility was illustrated on two test cases: the XOR logic gate and the recognition of handwritten digits from the MNIST data set. The Hopfield model is also useful for inference problems and, in the context of neural networks, it has been proven to be equivalent to the so-called attention mechanism used by Transformers, provided the update rule is replaced with a particular one. In recent decades, with the growth of available computational power, neural networks have proven very useful and have been widely adopted beyond the academic world. For this reason, scientists have increasingly asked which physical reasons and laws allow such networks to work.
To address this question, we start, in Section 2, by studying a class of Hopfield models in which the memories are represented by a mixture of Gaussian and binary variables while the neurons are Ising spins; for this reason we say that the continuous pattern variables are partially mismatched with the binary system variables. We study the properties of this family of models as the relative weight of the two kinds of variables in the patterns varies. We quantitatively determine how the maximum number of memories that can be stored without the system losing its ability to act as an associative memory shrinks towards zero as the memory patterns contain a larger fraction of mismatched variables. When the memories are purely Gaussian, retrieval is lost for any positive storage capacity. We show that this comes about because of the spherical symmetry of the free energy in the Gaussian case. By introducing two separate overlaps between the spin configuration and the contributions to each pattern from the two kinds of variables, one observes that the Gaussian parts of the patterns act as noise, making retrieval more difficult. The basins of attraction of the states, the accuracy of the retrieval and the storage capacity are studied by means of Monte Carlo numerical simulations. We find that even in the limit where the network capacity shrinks to zero, the (few) retrieval states maintain a large basin of attraction and large overlaps with the mismatched patterns. The network can therefore still be used for retrieval, but with a very small capacity. In Section 3, we consider other variations of the Hopfield model, namely the spherical Hopfield model with Gaussian and with bimodal memorized patterns. In both cases, the same spherical symmetry that causes the aforementioned lack of retrieval persists, and this time we add a higher-order term to the Hamiltonian (four-body interactions) to recover retrieval.
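As a rough illustration of the Section 2 setting, the following minimal sketch stores patterns with mixed Gaussian and binary entries in a standard Hopfield network of Ising spins and runs zero-temperature asynchronous dynamics. It is not the thesis code. The mixing rule (each entry is Gaussian with probability `gaussian_fraction`, otherwise ±1), the Hebbian couplings J_ij = (1/N) Σ_μ ξ_i^μ ξ_j^μ, and all function names are assumptions made for the example; the thesis may weight the two kinds of variables differently and studies finite-temperature Monte Carlo dynamics.

```python
import random


def make_patterns(num_patterns, n, gaussian_fraction, rng):
    """Assumed mixing rule: each entry is N(0, 1) with probability
    gaussian_fraction, otherwise a binary +/-1 variable."""
    return [
        [rng.gauss(0.0, 1.0) if rng.random() < gaussian_fraction
         else rng.choice((-1.0, 1.0)) for _ in range(n)]
        for _ in range(num_patterns)
    ]


def overlap(xi, spins):
    """Overlap m = (1/N) sum_i xi_i * s_i between a pattern and the spins."""
    return sum(x * s for x, s in zip(xi, spins)) / len(spins)


def corrupted_sign(xi, flip_fraction, rng):
    """Ising configuration aligned with the sign of the pattern,
    with each spin flipped independently with probability flip_fraction."""
    spins = [1 if x >= 0 else -1 for x in xi]
    return [-s if rng.random() < flip_fraction else s for s in spins]


def retrieve(patterns, spins, sweeps, rng):
    """Zero-temperature asynchronous dynamics with Hebbian couplings.
    The local field is h_i = sum_mu xi_i^mu * (m_mu - xi_i^mu * s_i / N),
    which equals sum_{j != i} J_ij s_j (self-coupling removed)."""
    n = len(spins)
    spins = list(spins)
    m = [overlap(xi, spins) for xi in patterns]
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # random sequential update order
            h = sum(xi[i] * (m_mu - xi[i] * spins[i] / n)
                    for xi, m_mu in zip(patterns, m))
            new = 1 if h >= 0 else -1
            if new != spins[i]:
                # Keep the overlaps in sync with the flipped spin.
                for mu, xi in enumerate(patterns):
                    m[mu] += xi[i] * (new - spins[i]) / n
                spins[i] = new
                changed = True
        if not changed:  # fixed point reached
            break
    return spins


if __name__ == "__main__":
    rng = random.Random(0)
    for fraction in (0.0, 0.5, 1.0):
        patterns = make_patterns(3, 400, fraction, rng)
        start = corrupted_sign(patterns[0], 0.1, rng)
        final = retrieve(patterns, start, 50, rng)
        print(fraction, round(overlap(patterns[0], final), 3))
```

Note that for a Gaussian entry the best a ±1 spin can do is match its sign, so under this assumed mixing rule the overlap with a mixed pattern stays below 1 even for a perfectly aligned configuration.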
In Section 4, we obtain the mean-field dynamical equation of the spherical (2+4)-Hopfield model for the macroscopic observables and present a gradient-descent analysis of the out-of-equilibrium dynamics of this model, in order to better understand its retrieval properties.

File | Size | Format | |
---|---|---|---|
Tesi_dottorato_Patti.pdf (open access) | 1.71 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/100102
URN:NBN:IT:UNIROMA1-100102