Multiple Latent Variable Generative Models: an Information-Theoretic Perspective and Applications
SEREZ, DARIO
2025
Abstract
In recent years, the field of image generation has made significant advances, as modern deep learning systems have become capable of producing photo-realistic pictures of stunning quality. Among the many architectures that can perform this task, we recognize an emerging subclass, which we term Multiple Latent Variable Generative Models (MLVGMs). These systems, developed as an improvement over traditional latent variable frameworks such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), employ multiple latent variables that enter the generative process at successive stages, gradually refining image features from coarse, global aspects to finer, local details. Nevertheless, these properties have so far been observed only empirically, and independently for individual models. In this work, we formally recognize MLVGMs as a distinct category of generative models and propose the first theoretical interpretation of the reasons behind their "global-to-local" subdivision of image information. More specifically, our study is grounded in Information Theory, drawing connections between the generative process of MLVGMs and the framework known as Successive Refinement of Information. Furthermore, to better understand how the gradual refinement of image features operates in real models, we develop an algorithm to quantitatively measure this phenomenon, estimating the contribution of each latent variable from the Mutual Information shift it produces on real images. The proposed procedure enhances control over the entire generative process, allowing MLVGMs to be used in new and unexplored ways. On this basis, we study potential applications of MLVGMs beyond powerful image generation, specifically in tasks such as Self-Supervised Contrastive Representation Learning (SSCRL) and Adversarial Robustness.
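The idea of ranking latent variables by the mutual information they carry about the output can be illustrated with a minimal toy sketch. Here, a hypothetical two-stage generator combines a "coarse" and a "fine" latent, and a simple histogram-based estimator compares how much information each latent shares with the generated signal. The generator, the `mutual_info_binned` helper, and all parameters are illustrative assumptions, not the actual estimator or models developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z_coarse, z_fine):
    # Toy stand-in for a two-stage MLVGM: the first latent sets a
    # large global component, the second adds a smaller refinement.
    return 2.0 * z_coarse + 0.5 * z_fine

def mutual_info_binned(x, y, bins=32):
    # Crude histogram-based mutual information estimate in nats
    # (an assumed helper for illustration only).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

n = 50_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = generator(z1, z2)

mi_z1 = mutual_info_binned(z1, x)
mi_z2 = mutual_info_binned(z2, x)
# The coarse latent carries more information about the output,
# mirroring the "global-to-local" subdivision discussed above.
print(mi_z1 > mi_z2)
```

On this toy model the ordering of the two estimates reflects each latent's contribution; the thesis applies the same principle to real MLVGMs and real images, where the mutual information shift must be estimated rather than read off a closed-form generator.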
In SSCRL, we leverage the generative process to sample better positive views, which leads to more discriminative feature learning. In the context of Adversarial Robustness, we employ MLVGMs as foundation models to purify adversarial examples, offering a novel defense mechanism. Finally, we investigate how our findings can be extended from continuous to discrete variables, allowing us to consider other types of generative models, in particular those operating with Vector Quantized Variational Autoencoders (VQ-VAEs). We find that the Successive Refinement framework also suits this kind of architecture well, and we propose a new variant that shows interesting results, broadening the impact of our study.

File | Size | Format | Access
---|---|---|---
phdunige_5191725_1.pdf | 11.5 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_2.pdf | 10.81 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_3.pdf | 13.02 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_4.pdf | 6.08 MB | Adobe PDF | open access (View/Open)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/197281
URN:NBN:IT:UNIGE-197281