
Theory of Transformers and their application to Neural Quantum States

RENDE, RICCARDO
2025

Abstract

My PhD research focused on the Transformer architecture, a powerful deep neural network model that has emerged as a cornerstone for solving complex problems in natural language processing, image analysis, signal processing, and beyond. In particular, we studied the learning dynamics of this architecture and its application to the representation of many-body wavefunctions, the so-called Neural Quantum States. Initially, we investigated the representational capabilities of Transformers by characterizing the statistical structures that a simplified Transformer layer, using so-called factored attention, is capable of learning. Building on these results, we used factored attention in deep Transformers to develop an accurate ansatz for approximating the ground states of quantum many-body Hamiltonians within the variational Monte Carlo framework. In this application, factored attention is crucial for achieving accurate results, outperforming the standard attention mechanism used in most other applications of Transformers, in particular natural language processing. Alongside the development of an efficient optimization method for large-scale neural networks, we achieved state-of-the-art results on the most popular benchmark in Neural Quantum States and addressed complex physical problems that are the subject of ongoing debate. Finally, we developed a framework to train Foundation Neural Quantum States, versatile neural network models that approximate the quantum wave functions of multiple systems simultaneously, enabling accurate estimates of challenging quantities such as disorder averages and the fidelity susceptibility. We envision numerous future directions for this approach, including its extension to quantum dynamics by explicitly modeling time-dependent variational states, as well as its application to the design of novel materials in fermionic systems.
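
The abstract contrasts factored attention with the standard self-attention used in most Transformer applications. As a rough illustration of the difference (a minimal single-head NumPy sketch of my own, not code from the thesis, with toy shapes and hypothetical parameter names), the snippet below implements both variants: in standard attention the token-mixing weights are computed from the input through queries and keys, whereas in factored attention they form a learned matrix over positions that does not depend on the input configuration.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(x, W_Q, W_K, W_V):
    # Single-head self-attention: the (N, N) mixing weights depend on the input x.
    # x: (N, d) sequence of N token embeddings of width d.
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # input-dependent attention scores
    return softmax(scores, axis=-1) @ V         # (N, d) mixed representation

def factored_attention(x, A, W_V):
    # Factored attention: the (N, N) mixing matrix A is itself a learned
    # parameter over positions i, j and is independent of the tokens.
    return A @ (x @ W_V)                        # (N, d) mixed representation

# Toy shapes for illustration only (not the thesis hyper-parameters).
rng = np.random.default_rng(0)
N, d = 6, 4
x = rng.normal(size=(N, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
A = softmax(rng.normal(size=(N, N)), axis=-1)   # row-normalised positional weights

print(standard_attention(x, W_Q, W_K, W_V).shape)   # (6, 4)
print(factored_attention(x, A, W_V).shape)           # (6, 4)

In the sketch, the positional matrix A would be optimized together with the value weights; according to the abstract, this input-independent simplification is crucial for accuracy when the architecture is used as a variational ansatz for many-body ground states.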
11 September 2025
English
Laio, Alessandro
Goldt, Sebastian Dominik
SISSA
Trieste
Files in this item:
phd_thesis.pdf (Adobe PDF, 14.45 MB), open access
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/220401
The NBN code of this thesis is URN:NBN:IT:SISSA-220401