A Representation Learning Perspective on some Emergent Properties of Neural Networks
BASILE, LORENZO
2025
Abstract
At the core of modern Artificial Intelligence (AI) are Neural Networks, computational models inspired by the interconnected structure of neurons in the brain, which underpins natural intelligence. Neural networks extract meaningful patterns from data by first projecting them into high-dimensional latent representations, which can then be leveraged to solve a variety of downstream tasks. In doing so, mimicking the brain, they rely on the interplay of many interconnected computational units that, taken individually, perform simple operations. However, when observed at the scale of the full model, their collective behavior can give rise to unforeseen emergent properties. Understanding these properties could improve the performance and reliability of neural networks, and it remains an open research problem in AI interpretability. In this thesis, our aim is to investigate some of these properties by analyzing the latent representations learned by neural networks.
https://hdl.handle.net/20.500.14242/188298
URN:NBN:IT:UNITS-188298