
Multiple Latent Variable Generative Models: an Information-Theoretic Perspective and Applications

SEREZ, DARIO
2025

Abstract

In recent years, the field of image generation has made significant advances, as modern deep learning systems have become capable of producing photo-realistic pictures of stunning quality. Among the many architectures that can perform this task, we recognize an emerging subclass, which we term Multiple Latent Variable Generative Models (MLVGMs). These systems, developed as an improvement over traditional latent variable frameworks such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), employ multiple latent variables that enter the generative process at successive stages, gradually refining image features from coarse, global aspects to finer, local details. Nevertheless, these properties have so far been observed only empirically, and independently for individual models. In this work, we formally recognize MLVGMs as a distinct category of generative models and propose the first theoretical interpretation of the reasons behind their "global-to-local" subdivision of image information. More specifically, our study is grounded in Information Theory, drawing connections between the generative process of MLVGMs and the framework known as Successive Refinement of Information. Furthermore, to better understand how the gradual refinement of image features operates in real models, we develop an algorithm to quantitatively measure this phenomenon, estimating the contribution of each latent variable from the Mutual Information shift it produces on real images. The proposed procedure enhances control over the entire generative process, allowing MLVGMs to be used in new and unexplored ways. On this basis, we study several potential applications of MLVGMs beyond powerful image generation, specifically in tasks such as Self-Supervised Contrastive Representation Learning (SSCRL) and Adversarial Robustness.
In SSCRL, we leverage the generative process to sample better positive views, which lead to more discriminative feature learning. In the context of Adversarial Robustness, we employ MLVGMs as foundation models to purify adversarial examples, offering a novel defense mechanism. Finally, we investigate how our findings extend from continuous to discrete variables, allowing the consideration of other types of generative models, in particular those operating with Vector Quantized Variational Autoencoders (VQ-VAEs). We find that the Successive Refinement framework also suits this kind of architecture well, and we propose a new variant that shows interesting results, broadening the impact of our study.
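The per-latent contribution idea described in the abstract can be illustrated, in spirit, with a toy sketch. Everything below is a hypothetical illustration, not the thesis's actual algorithm: a two-latent generator where one latent sets a smooth global gradient and the other adds fine perturbations, and a simple variance-based proxy (resample one latent while freezing the other) standing in for the Mutual Information shift estimated on real images.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(z_coarse, z_fine, size=16):
    """Toy two-latent generator (hypothetical): z_coarse controls a
    smooth global gradient, z_fine adds small high-frequency detail."""
    xs = np.linspace(0.0, 1.0, size)
    coarse = z_coarse[0] * xs[:, None] + z_coarse[1] * xs[None, :]
    fine = 0.1 * z_fine.reshape(size, size)
    return coarse + fine

def contribution(latent_name, n=500, size=16):
    """Variance-based proxy for a latent's contribution: resample the
    chosen latent while freezing the other, then average the per-pixel
    variance of the resulting images."""
    base_c = rng.standard_normal(2)
    base_f = rng.standard_normal(size * size)
    imgs = []
    for _ in range(n):
        zc = rng.standard_normal(2) if latent_name == "coarse" else base_c
        zf = rng.standard_normal(size * size) if latent_name == "fine" else base_f
        imgs.append(generate(zc, zf, size))
    return float(np.var(np.stack(imgs), axis=0).mean())

# The coarse latent should dominate the output variability,
# mirroring the "global-to-local" ordering discussed above.
print(contribution("coarse"), contribution("fine"))
```

In this sketch the coarse latent's score is much larger than the fine latent's, since global structure dominates pixel variability; a proper Mutual Information estimate, as in the thesis, would make the same comparison in information-theoretic rather than variance terms.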
13-mar-2025
English
MURINO, VITTORIO
VALLE, MAURIZIO
Università degli studi di Genova
Files in this item:

File                     Size      Format     Access
phdunige_5191725_1.pdf   11.5 MB   Adobe PDF  open access
phdunige_5191725_2.pdf   10.81 MB  Adobe PDF  open access
phdunige_5191725_3.pdf   13.02 MB  Adobe PDF  open access
phdunige_5191725_4.pdf   6.08 MB   Adobe PDF  open access

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/197281
The NBN code of this thesis is URN:NBN:IT:UNIGE-197281