Statistical Physics of Generative Diffusion
ACHILLI, BEATRICE
2026
Abstract
This thesis investigates diffusion models from a statistical physics point of view, focusing on phase transitions and symmetry-breaking events. First, we analyze the reverse diffusion process under the empirical score function for structured data, obtaining a description of the dynamical landscape and a characterization of the key transition times, most notably the memorization time. We also define generalization in this context and observe that, interestingly, it always occurs after the model begins to memorize. We then give a geometric description, exploring the spectral properties of the score function with tools from random matrix theory. By analyzing the Jacobian spectra of the score, we identify the emergence of geometric phases linked to spectral gaps. We also study the phenomenon of geometric memorization, showing that it is characterized by a loss of dimensionality in which some features of the data are memorized without a full collapse onto any individual training point. Finally, we investigate the speciation transition of diffusion models in a setting where the data classes are not spatially separated, obtaining a general criterion for the speciation time as well as its scaling. The thesis thus provides both theoretical insights and empirical analyses that bridge deep learning and statistical physics, contributing to a deeper understanding of how generative models learn and represent data.
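The empirical score function mentioned above has a known closed form: it is the score of the training distribution smoothed by the forward noising kernel, a softmax-weighted pull toward the training points. The sketch below is a minimal illustration, assuming a variance-preserving forward process x_t = x_0·e^(−t) + √(1 − e^(−2t))·noise; the function name and conventions are my own, not taken from the thesis.

```python
import numpy as np

def empirical_score(x, t, data):
    """Score of the noise-smoothed empirical distribution at time t.

    Assumes a variance-preserving forward process (illustrative convention):
        x_t = x_0 * exp(-t) + sqrt(1 - exp(-2t)) * standard Gaussian noise.
    `data` has shape (n_points, dim); `x` has shape (dim,).
    """
    alpha = np.exp(-t)
    delta = 1.0 - alpha**2                    # noise variance at time t
    diffs = alpha * data - x                  # (n, d) pulls toward each point
    logits = -np.sum(diffs**2, axis=1) / (2 * delta)
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    w /= w.sum()                              # softmax responsibilities
    return (w[:, None] * diffs).sum(axis=0) / delta
```

At large t the score reduces to −x (pure Gaussian prior), while as t → 0 the softmax weights concentrate on the nearest training point, which is the collapse underlying memorization.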
| File | Access | License | Size | Format |
|---|---|---|---|---|
| thesis_achilli_final.pdf | Open access | All rights reserved | 16.77 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/355870
URN:NBN:IT:UNIBOCCONI-355870