Learning Beyond the Gaussian Approximation: Algorithmic Hardness, Neural Network Dynamics and Independent Components

BARDONE, LORENZO
2025

Abstract

Neural networks excel at discovering statistical patterns in high-dimensional data sets. In practice, higher-order cumulants, which quantify the non-Gaussian correlations between three or more variables, are particularly important for the performance of neural networks. In this thesis, we study the efficiency of neural networks at extracting features from higher-order cumulants. We first discuss the fundamental statistical and computational limits of recovering a single non-Gaussian direction by computing the number of samples n required to reliably distinguish between inputs from the spiked cumulant model and isotropic Gaussian inputs. Our findings confirm the existence of a wide statistical-to-computational gap: statistical distinguishability requires n > d samples, while distinguishing the two distributions in polynomial time requires n > d^2 samples for the class of algorithms covered by the low-degree conjecture. Numerical experiments show that neural networks do indeed learn to distinguish the two distributions with quadratic sample complexity. Then, to further bridge the gap between theory and practice, we show that correlations between latent variables along the directions encoded in different input cumulants speed up learning from higher-order cumulants, reaching linear sample complexity. We establish this effect analytically by deriving nearly sharp thresholds for the number of samples required by a single neuron to weakly recover these directions using online SGD from a random start in high dimensions. Our analytical results are confirmed in simulations of two-layer neural networks and unveil a new mechanism for hierarchical learning in neural networks. Finally, we connect to feature learning in deep networks via the proxy of independent component analysis (ICA), a simple unsupervised method that seeks the most non-Gaussian projections of its inputs. The motivation is that ICA finds features that are similar to the first-layer filters learned by deep convolutional networks on classification tasks. This similarity suggests that ICA provides a simple yet principled model for studying feature learning. Here, we prove that FastICA, the most popular ICA algorithm, requires at least n > d^4 samples to recover a single non-Gaussian direction from d-dimensional inputs on a simple synthetic data model. We show that vanilla online SGD outperforms FastICA, and prove that the optimal sample complexity n > d^2 can be reached by smoothing the loss, albeit in a data-dependent way. We conclude by discussing how this picture extends to different distributions and architectures, such as generative diffusion models.
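To make the objects in the abstract concrete, the sketch below generates inputs with a single planted non-Gaussian direction and recovers it with the standard single-unit FastICA fixed-point iteration on whitened data. The data model (a Rademacher latent along a planted spike plus isotropic Gaussian noise, with an assumed signal strength beta) is only an illustrative assumption, not necessarily the exact spiked cumulant model analysed in the thesis; the FastICA update itself is the usual kurtosis-based fixed point.

```python
import numpy as np

# Illustrative sketch (not the thesis's exact model): inputs with a single
# planted non-Gaussian direction, recovered by the standard single-unit
# FastICA fixed-point iteration on whitened data.

rng = np.random.default_rng(0)
d, n = 64, 40_000
beta = 5.0                                  # assumed signal strength

u = rng.standard_normal(d)
u /= np.linalg.norm(u)                      # planted non-Gaussian direction

nu = rng.choice([-1.0, 1.0], size=n)        # Rademacher latent (non-Gaussian)
z = rng.standard_normal((n, d))             # isotropic Gaussian background
X = np.sqrt(beta) * nu[:, None] * u[None, :] + z

# Symmetric (ZCA) whitening: FastICA assumes centred, whitened inputs.
X -= X.mean(axis=0)
cov = X.T @ X / n
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Xw = X @ W

# Single-unit FastICA with the kurtosis contrast g(t) = t^3:
#   w  <-  E[x g(w.x)] - E[g'(w.x)] w,   then normalise.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(200):
    proj = Xw @ w
    w_new = (Xw * (proj ** 3)[:, None]).mean(axis=0) - 3.0 * (proj ** 2).mean() * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1.0 - 1e-9
    w = w_new
    if converged:
        break

# With symmetric whitening the planted direction is unchanged (W u is
# proportional to u), so the overlap should be close to 1 after convergence.
print(f"overlap |<w, u>| = {abs(w @ u):.3f}")
```

This toy instance only illustrates the setting; the sample-complexity results quoted in the abstract concern how n must scale with d in the thesis's specific model.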
25 Sep 2025
English
Goldt, Sebastian Dominik
SISSA
Trieste
Files in this item:
PhD_thesis-Bardone.pdf (open access, 8.53 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/295811
The NBN code of this thesis is URN:NBN:IT:SISSA-295811