
Improving neural networks efficiency via representation similarities

Irene Cannistraci
2025

Abstract

As large-scale Neural Networks (NNs) continue to push the boundaries of performance in fields ranging from drug discovery to climate science, their computational demands have become a major bottleneck. These models require extensive resources, which limits their accessibility and raises concerns about sustainability. Moreover, their reusability is constrained by the need for costly retraining or fine-tuning when adapting them to new tasks or data. This dissertation presents novel approaches that address these challenges by exploiting similarities between and within NNs, reducing computational and data requirements without compromising performance. The core contribution of this research lies in leveraging the latent space representations of NNs to enable model reuse and reduce computational complexity. First, we introduce a framework for combining latent spaces from different models, unifying these neural representations in a meaningful way and allowing existing neural components to be reused without further training. We then exploit the aggregation of latent spaces, which may partially overlap or be entirely disjoint, to unify them efficiently and meaningfully. Additionally, we develop an optimization method to align neural representations across diverse domains, addressing a limitation of existing methods that often depend on large sets of parallel samples to unify different latent spaces, an impractical requirement in many real-world scenarios. Finally, we investigate intra-network similarities to simplify large pretrained models: by identifying redundant computational blocks within individual NNs and approximating them with simpler transformations, our approach reduces the number of parameters and speeds up inference while maintaining the model's integrity. Our findings demonstrate that leveraging similarities in latent spaces can simplify large-scale models through representation alignment and approximation, making them more efficient, accessible, and sustainable while maintaining their effectiveness. These methods apply to various architectures, such as transformers and convolutional networks, and support a wide range of tasks and data modalities. By enabling the reuse and simplification of NNs, this research contributes to the democratization of Machine Learning technologies and the development of more sustainable and efficient models.
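To give a concrete flavour of what "aligning latent spaces" means, the following minimal NumPy sketch shows one standard building block from this literature, orthogonal Procrustes alignment, on a synthetic example. It illustrates the general idea of representation alignment only; it is not the exact procedure developed in the dissertation, and the data, dimensions, and the `procrustes_align` helper are hypothetical.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal map R minimizing ||X R - Y||_F subject to R^T R = I."""
    # SVD of the cross-covariance between the two sets of representations.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy example: two "latent spaces" that differ by a rotation plus noise.
rng = np.random.default_rng(0)
Z1 = rng.normal(size=(100, 16))                      # representations from model A
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))       # random orthogonal transform
Z2 = Z1 @ Q + 0.01 * rng.normal(size=(100, 16))      # model B's view of the same inputs

R = procrustes_align(Z1, Z2)
err = np.linalg.norm(Z1 @ R - Z2) / np.linalg.norm(Z2)
print(f"relative alignment error: {err:.3f}")
```

In practice such a map is typically estimated from a small set of anchor samples seen by both models and then applied to all other representations, which is what makes reuse of pretrained components possible without retraining.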
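Similarly, the idea of replacing a redundant computational block with a simpler transformation can be illustrated by fitting a linear (affine) surrogate to cached input/output activations of that block. This too is a hedged sketch on synthetic data with hypothetical names such as `fit_linear_surrogate`; the dissertation's actual approximation method may differ.

```python
import numpy as np

def fit_linear_surrogate(H_in, H_out):
    """Least-squares affine map (W, b) such that H_in @ W + b ≈ H_out."""
    H1 = np.hstack([H_in, np.ones((H_in.shape[0], 1))])   # append a bias column
    Wb, *_ = np.linalg.lstsq(H1, H_out, rcond=None)
    return Wb[:-1], Wb[-1]

# Toy "block": a small ReLU MLP whose input/output activations we pretend
# to have collected from a pretrained model.
rng = np.random.default_rng(1)
A, B = rng.normal(size=(32, 64)), rng.normal(size=(64, 32))
block = lambda h: np.maximum(h @ A, 0.0) @ B

H_in = rng.normal(size=(512, 32))                          # cached activations
H_out = block(H_in)

W, b = fit_linear_surrogate(H_in, H_out)
approx_err = np.linalg.norm(H_in @ W + b - H_out) / np.linalg.norm(H_out)
print(f"relative error of the linear surrogate: {approx_err:.3f}")
```

When the approximation error for a block is small enough, the surrogate can stand in for the original block, trading a negligible accuracy loss for fewer parameters and faster inference.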
Date: 15 January 2025
Language: English
Advisors: Emanuele Rodolà, Maurizio Mancini
University: Università degli Studi di Roma "La Sapienza"
Pages: 111
Files in this item:
Tesi_dottorato_Cannistraci.pdf (open access, Adobe PDF, 19.17 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/189685
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-189685