
Theory of Transformers and their application to Neural Quantum States

RENDE, RICCARDO
2025

Abstract

My PhD research focused on the Transformer architecture, a powerful deep neural network model that has emerged as a cornerstone for solving complex problems in natural language processing, image analysis, signal processing, and beyond. In particular, we studied the learning dynamics of this architecture and its application to the representation of many-body wavefunctions, the so-called Neural Quantum States. Initially, we investigated the representational capabilities of Transformers by characterizing the statistical structures that a simplified Transformer layer, using so-called factored attention, is capable of learning. Building on these results, we used factored attention in deep Transformers to develop an accurate ansatz for approximating the ground states of quantum many-body Hamiltonians within the variational Monte Carlo framework. In this application, factored attention is crucial for achieving accurate results, outperforming the standard attention mechanism used in most other applications of Transformers, in particular natural language processing. Alongside the development of an efficient optimization method for large-scale neural networks, we achieved state-of-the-art results on the most popular benchmark in Neural Quantum States and addressed complex physical problems that are the subject of ongoing debate. Finally, we developed a framework to train Foundation Neural Quantum States, versatile neural network models that approximate the quantum wave functions of multiple systems simultaneously, enabling accurate estimates of challenging quantities such as disorder averages and the fidelity susceptibility. We envision numerous future directions for this approach, including its extension to quantum dynamics by explicitly modeling time-dependent variational states, as well as its application to the design of novel materials in fermionic systems.
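
The abstract contrasts factored attention with the standard self-attention used in most Transformer applications. As a rough illustration of the difference (a minimal single-head NumPy sketch of my own, not code from the thesis, with toy shapes and hypothetical parameter names), the snippet below implements both variants: in standard attention the token-mixing weights are computed from the input through queries and keys, whereas in factored attention they form a learned matrix over positions that does not depend on the input configuration.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(x, W_Q, W_K, W_V):
    # Single-head self-attention: the (N, N) mixing weights depend on the input x.
    # x: (N, d) sequence of N token embeddings of width d.
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # input-dependent attention scores
    return softmax(scores, axis=-1) @ V         # (N, d) mixed representation

def factored_attention(x, A, W_V):
    # Factored attention: the (N, N) mixing matrix A is itself a learned
    # parameter over positions i, j and is independent of the tokens.
    return A @ (x @ W_V)                        # (N, d) mixed representation

# Toy shapes for illustration only (not the thesis hyper-parameters).
rng = np.random.default_rng(0)
N, d = 6, 4
x = rng.normal(size=(N, d))
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))
A = softmax(rng.normal(size=(N, N)), axis=-1)   # row-normalised positional weights

print(standard_attention(x, W_Q, W_K, W_V).shape)   # (6, 4)
print(factored_attention(x, A, W_V).shape)           # (6, 4)

In the sketch, the positional matrix A would be optimized together with the value weights; according to the abstract, this input-independent simplification is crucial for accuracy when the architecture is used as a variational ansatz for many-body ground states.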
11 September 2025
English
Laio, Alessandro
Goldt, Sebastian Dominik
SISSA
Trieste
Files in this item:
phd_thesis.pdf (Adobe PDF, 14.45 MB), open access
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/220401
The NBN code of this thesis is URN:NBN:IT:SISSA-220401