Multiple Latent Variable Generative Models: an Information-Theoretic Perspective and Applications
SEREZ, DARIO
2025
Abstract
In recent years, the field of image generation has made significant advances, as modern deep learning systems have become capable of producing photo-realistic pictures of stunning quality. Among the many architectures that can perform this task, we recognize an emerging subclass, which we term Multiple Latent Variable Generative Models (MLVGMs). These systems, developed as an improvement over traditional latent variable frameworks such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), employ multiple latent variables that enter the generative process at successive stages, gradually refining image features from coarse, global aspects to finer, local details. Nevertheless, these properties have so far been observed only empirically, and independently for individual models. In this work, we formally recognize MLVGMs as a distinct category of generative models and propose the first theoretical interpretation of the reasons behind their "global-to-local" subdivision of image information. More specifically, our study is grounded in Information Theory, drawing connections between the generative process of MLVGMs and the framework known as Successive Refinement of Information. Furthermore, to better understand how the gradual refinement of image features operates in real models, we develop an algorithm to quantitatively measure this phenomenon, estimating the contribution of each latent variable from the Mutual Information shift it produces on real images. The proposed procedure enhances control over the entire generative process, allowing MLVGMs to be used in new and unexplored ways. On this basis, we study potential applications of MLVGMs beyond powerful image generation, specifically in tasks such as Self-Supervised Contrastive Representation Learning (SSCRL) and Adversarial Robustness.
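The idea of ranking latent variables by the mutual information they carry about the output can be illustrated with a minimal toy sketch. Here, a hypothetical two-stage generator combines a "coarse" and a "fine" latent, and a simple histogram-based estimator compares how much information each latent shares with the generated signal. The generator, the `mutual_info_binned` helper, and all parameters are illustrative assumptions, not the actual estimator or models developed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z_coarse, z_fine):
    # Toy stand-in for a two-stage MLVGM: the first latent sets a
    # large global component, the second adds a smaller refinement.
    return 2.0 * z_coarse + 0.5 * z_fine

def mutual_info_binned(x, y, bins=32):
    # Crude histogram-based mutual information estimate in nats
    # (an assumed helper for illustration only).
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

n = 50_000
z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = generator(z1, z2)

mi_z1 = mutual_info_binned(z1, x)
mi_z2 = mutual_info_binned(z2, x)
# The coarse latent carries more information about the output,
# mirroring the "global-to-local" subdivision discussed above.
print(mi_z1 > mi_z2)
```

On this toy model the ordering of the two estimates reflects each latent's contribution; the thesis applies the same principle to real MLVGMs and real images, where the mutual information shift must be estimated rather than read off a closed-form generator.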
In SSCRL, we leverage the generative process to sample better positive views, which leads to more discriminative feature learning. In the context of Adversarial Robustness, we employ MLVGMs as foundation models to purify adversarial examples, offering a novel defense mechanism. Finally, we investigate how our findings can be extended from continuous to discrete variables, allowing us to consider other types of generative models, in particular those operating with Vector Quantized Variational Autoencoders (VQ-VAEs). We find that the Successive Refinement framework also suits this kind of architecture well, and we propose a new variant that shows interesting results, broadening the impact of our study.

File | Size | Format | Access
---|---|---|---
phdunige_5191725_1.pdf | 11.5 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_2.pdf | 10.81 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_3.pdf | 13.02 MB | Adobe PDF | open access (View/Open)
phdunige_5191725_4.pdf | 6.08 MB | Adobe PDF | open access (View/Open)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/197281
URN:NBN:IT:UNIGE-197281