Unraveling the role of topology in complex long range systems and deep neural networks
Riccardo Aiudi
2024
Abstract
This thesis investigates two primary domains: the critical properties of long-range systems and the feature-learning mechanisms of deep neural networks. In the study of long-range systems, we examined the ferromagnetic Ising model in one and two dimensions, characterized by interactions of the form $J_{ij} \propto r_{ij}^{-(d+\sigma)}$. Using a novel local dynamics on a dynamical Lévy lattice (DLL), we reproduced static critical exponents consistent with the established literature across the range of the interaction parameter $\sigma$. This localized approach offers a versatile methodology for probing the dynamical properties of long-range models. Notably, our analysis of the relaxation time at the critical temperature revealed subtleties in the relationship between the dynamical exponent $z$ and the decay parameter $\sigma$, suggesting a possible disparity between dynamical and equilibrium critical properties. Moreover, the versatility of the DLL strategy allowed us to carry out preliminary work on the critical properties of the long-range $XY$ model.

Turning to deep neural networks, we explored the disparities in feature learning between fully connected networks (FCNs) and convolutional architectures (CNNs). Empirical studies of FCNs in the infinite-width regime revealed a plateau in performance, attributable to the static nature of their kernel during training; this suggests that any feature learning inherent in such FCN structures has limited impact on generalization. Conversely, CNNs, particularly at finite width, showed superior performance. Our theoretical framework for single-hidden-layer networks elucidates this disparity: while the performance of an infinite-width FCN can be replicated by its finite-width counterpart using suitable Gaussian priors, CNNs with a single convolutional hidden layer undergo a different kernel renormalization. Unlike the global rescaling seen in FC networks, CNNs experience a localized renormalization, enabling adaptive selection of data-dependent components for prediction. This distinction highlights a feature-learning capability present in overparametrized shallow CNNs that is absent in equivalent FC architectures. Collectively, these studies shed light on the profound influence of topology in diverse systems, from the behavior of long-range physical models to the feature-extraction processes of neural architectures.
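To make the DLL idea concrete, the sketch below implements the kind of local update the method relies on: at each step a spin interacts with a single neighbour whose distance is freshly drawn from a power-law distribution $P(r) \propto r^{-(d+\sigma)}$, so the long-range couplings are realised statistically by a resampled sparse graph rather than by summing over all pairs. This is a minimal illustration for the 1D case, assuming one neighbour per update, $J = 1$ on the sampled bond, and Metropolis acceptance; the actual DLL dynamics in the thesis (number of neighbours drawn per update, discretisation of the step distribution) may differ. At criticality, the relaxation time of such dynamics is expected to scale as $\tau \sim L^{z}$, which is how a dynamical exponent $z$ is extracted from simulations at $T_c$.

```python
import numpy as np

rng = np.random.default_rng(0)

def levy_distance(sigma, L, rng):
    """Draw an integer distance r >= 1 with tail P(r) ~ r^{-(1+sigma)}
    (the d = 1 case), by inverse-transform sampling of the power law."""
    r = int(rng.random() ** (-1.0 / sigma))
    return min(max(r, 1), L // 2)  # cap at half the chain length

def dll_sweep(spins, beta, sigma, rng):
    """One Monte Carlo sweep of local dynamics on a dynamical Levy
    lattice: the long-range neighbour of each updated spin is redrawn
    at every step, so the interaction graph itself is dynamical."""
    L = spins.size
    for _ in range(L):
        i = rng.integers(L)
        r = levy_distance(sigma, L, rng) * rng.choice((-1, 1))
        j = (i + r) % L
        dE = 2.0 * spins[i] * spins[j]  # ferromagnetic bond, J = 1
        if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
            spins[i] = -spins[i]

# usage: equilibrate a chain and track the magnetisation
L, beta, sigma = 1024, 1.0, 0.6
spins = rng.choice(np.array([-1, 1]), size=L)
for _ in range(200):
    dll_sweep(spins, beta, sigma, rng)
print("magnetisation per spin:", spins.mean())
```

Note that each update touches a single sampled bond, so the cost per spin flip is O(1) regardless of the interaction range; this is what makes the approach versatile enough to transfer to other long-range models such as the $XY$ model mentioned above.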
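The contrast between global and local kernel renormalization can also be made explicit. In the framework described above, the finite-width predictor is still a kernel predictor, but the kernel of a single-hidden-layer CNN decomposes into per-patch components, $K(x, x') = \sum_i Q_i\, K_i(x, x')$, with one renormalization scalar $Q_i$ per patch, whereas the FC case admits only a single global scalar. The sketch below shows this structure for non-overlapping 1D patches and linear per-patch kernels; the function names and the free choice of the $Q_i$ are illustrative assumptions, since in the actual theory the $Q_i$ are fixed self-consistently by the training data rather than set by hand.

```python
import numpy as np

def patch_kernels(X, Y, patch):
    """Per-patch linear kernels K_i(x, y) = x_i . y_i / patch for
    non-overlapping 1D patches; X has shape (N, D), Y has (M, D)."""
    n = X.shape[1] // patch
    return [X[:, k*patch:(k+1)*patch] @ Y[:, k*patch:(k+1)*patch].T / patch
            for k in range(n)]

def renormalized_kernel(Ks, Q):
    """Local renormalization: K_R = sum_i Q_i K_i. A constant vector Q
    recovers the single global rescaling of the fully connected case."""
    return sum(q * K for q, K in zip(Q, Ks))

def gp_mean(X_tr, y_tr, X_te, Q, patch, ridge=1e-3):
    """Posterior-mean prediction with the renormalized kernel."""
    K_tr = renormalized_kernel(patch_kernels(X_tr, X_tr, patch), Q)
    K_te = renormalized_kernel(patch_kernels(X_te, X_tr, patch), Q)
    alpha = np.linalg.solve(K_tr + ridge * np.eye(len(y_tr)), y_tr)
    return K_te @ alpha

# toy usage: the target depends on patch 0 only, and a patch-dependent Q
# can concentrate capacity on that informative patch
rng = np.random.default_rng(1)
X_tr = rng.normal(size=(32, 64))
y_tr = X_tr[:, :16].mean(axis=1)
X_te = rng.normal(size=(100, 64))
y_te = X_te[:, :16].mean(axis=1)
Q_local = np.array([1.0, 0.01, 0.01, 0.01])  # CNN-like: per-patch scalars
Q_global = np.full(4, 0.25)                  # FCN-like: one uniform scale
for name, Q in [("local", Q_local), ("global", Q_global)]:
    err = np.mean((gp_mean(X_tr, y_tr, X_te, Q, patch=16) - y_te) ** 2)
    print(name, "renormalization, test MSE:", round(float(err), 4))
```

With the uniform $Q$ the predictor spreads its capacity over all patches, while the patch-dependent $Q$ effectively selects the data-relevant component, which is the adaptive selection mechanism referred to in the abstract.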
File | Size | Format
---|---|---
Thesis_phd_aiudi.pdf (under embargo until 01/04/2025) | 9.35 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/196724
URN:NBN:IT:UNIPR-196724