
Statistical Physics of Generative Diffusion

ACHILLI, BEATRICE
2026

Abstract

This thesis investigates diffusion models from a statistical physics point of view, focusing on phase transitions and symmetry-breaking events. First, we analyze the reverse diffusion process under the empirical score function for structured data, obtaining a description of the dynamical landscape and a characterization of the key transition times, notably the memorization time. We also give a definition of generalization in this context and observe that, interestingly, it always occurs after the model starts memorizing. Then, we give a geometric description, exploring the spectral properties of the score function with tools from random matrix theory. By analyzing the Jacobian spectra of the score, we identify the emergence of geometric phases linked to spectral gaps. We also study the phenomenon of geometric memorization, showing that it is characterized by a loss of dimensionality in which some features of the data are memorized without a full collapse onto any individual training point. Finally, we investigate the speciation transition of diffusion models in a case where the data are not spatially separated, obtaining a general criterion for the speciation time as well as its scaling. The thesis thus provides both theoretical insights and empirical analyses that bridge deep learning and statistical physics, contributing to a deeper understanding of how generative models learn and represent data.
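To make the first setting concrete, reverse diffusion under the exact empirical score can be sketched numerically. The 2D toy dataset, the Ornstein-Uhlenbeck (variance-preserving) noise schedule, and the function names below are illustrative assumptions for this sketch, not the thesis's actual setup.

```python
import numpy as np

def empirical_score(x, data, alpha, sigma):
    """Exact score of the Gaussian-smoothed empirical distribution
    p_t(x) = (1/n) sum_i N(x; alpha * x_i, sigma^2 I)."""
    d2 = ((x - alpha * data) ** 2).sum(axis=1)       # squared distances to scaled data
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))    # numerically stable softmax weights
    w /= w.sum()
    return (w[:, None] * (alpha * data - x)).sum(axis=0) / sigma**2

def reverse_diffusion(data, T=1.0, steps=500, seed=0):
    """Euler-Maruyama integration of the reverse SDE for the OU forward
    process dx = -x dt + sqrt(2) dW, i.e. x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(data.shape[1])           # start from the Gaussian prior
    for k in range(steps):
        t = T - k * dt                               # integrate from t = T down to t = dt
        alpha = np.exp(-t)
        sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
        drift = x + 2.0 * empirical_score(x, data, alpha, sigma)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    # under the exact empirical score, the trajectory ends near a training point
    return x
```

Run on a few well-separated training points, the sample collapses onto a neighborhood of one of them: this is the memorization phenomenon the abstract refers to, since the score of the empirical distribution can only ever regenerate the training set.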
29 Jan 2026
English
LUCIBELLO, CARLO
MEZARD, MARC JEAN MARCEL
Università Bocconi
Files in this item:
File: thesis_achilli_final.pdf (open access)
License: All rights reserved
Size: 16.77 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/355870
The NBN code of this thesis is URN:NBN:IT:UNIBOCCONI-355870