Efficient deep learning for computer vision
FONTANA, FEDERICO
2026
Abstract
The advancement of modern computer vision, driven by the remarkable performance of Deep Neural Networks (DNNs), is increasingly constrained by the models' own escalating complexity. The prevailing trend toward ever-larger, more computationally demanding architectures has created a significant barrier to deploying sophisticated artificial intelligence on resource-constrained platforms such as mobile devices, wearables, and autonomous drones. This thesis confronts that challenge by investigating and advancing techniques for efficient deep learning, focusing on two aggressive model compression strategies: Binary Neural Networks (BNNs) \cite{courbariaux2016binarized} and neural network pruning. The central hypothesis is that extreme model compression can be achieved while maintaining high performance across a diverse set of real-world computer vision problems, thereby enabling the deployment of advanced AI in practical, resource-limited scenarios. Methodologically, this research first addresses foundational problems that have impeded the widespread adoption of these efficient models. It develops CycleBNN \cite{cyclebnn}, a cyclic precision training methodology that significantly reduces the computational overhead of training BNNs. In a complementary investigation into sparsity, it introduces Distilled Gradual Pruning with Pruned Fine-tuning (DG2PF) \cite{gradual_pruning}, an algorithm that combines network pruning with knowledge distillation to achieve high compression ratios with minimal loss of accuracy. Having established these core techniques, the thesis demonstrates their practical viability on a series of challenging, latency-sensitive tasks.
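The weight binarization at the heart of BNNs is conceptually simple: the forward pass quantizes latent full-precision weights to {-1, +1}, while gradients flow back to the latent weights through a straight-through estimator. The NumPy sketch below illustrates one such update step in the generic scheme of \cite{courbariaux2016binarized}; it is not the CycleBNN training procedure itself, whose cyclic precision schedule is described in the cited paper.

```python
import numpy as np

def binarize(w):
    # Forward: sign quantization of latent weights to {-1, +1}.
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_out):
    # Backward: straight-through estimator -- pass the gradient
    # through unchanged where the latent weight lies in [-1, 1],
    # and zero it elsewhere.
    return grad_out * (np.abs(w) <= 1.0)

# One illustrative training step on latent full-precision weights.
w = np.array([0.3, -0.7, 1.5])
x = np.array([1.0, 2.0, 3.0])
y = binarize(w) @ x          # forward uses binary weights: 1*1 + (-1)*2 + 1*3 = 2
g = ste_grad(w, grad_out=x)  # dL/dw for L = y, masked by the STE
w -= 0.1 * g                 # the update hits the latent weights, not the binary ones
```

At inference time only the binarized weights are stored, which is what yields the memory and compute savings reported for BNNs.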
Highly efficient BNN-based solutions are presented for hand gesture recognition (BNNAction-Net) \cite{bnnaction}, deepfake detection (Faster Than Lies) \cite{lanzino2024faster}, and cross-view geolocalization for UAVs (BiCrossNet) \cite{bicrossnet}. In each case, the proposed models achieve performance competitive with their full-precision counterparts while reducing computational complexity and memory footprint by orders of magnitude. The research then extends deepfake detection beyond static classification by reframing it as a continual learning problem. A chronological evaluation framework that simulates the real-world evolution of generative technologies reveals a fundamental limitation in the generalization capabilities of current detectors. This finding motivates the Non-Universal Deepfake Distribution Hypothesis \cite{fontana2025revisiting}, which posits that each deepfake generator imprints a unique, non-transferable signature, underscoring the necessity of continuous model adaptation for any robust, long-term detection strategy. The collective results empirically validate that BNNs and neural network pruning are not merely theoretical concepts but practical tools for building high-performance, resource-efficient computer vision systems. By bridging the gap between state-of-the-art accuracy and real-world deployability, this work contributes to a more accessible, sustainable, and scalable future for artificial intelligence.
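The pruning-plus-distillation recipe summarized in the abstract typically rests on two generic ingredients: a schedule that ramps sparsity up gradually (the cubic ramp of Zhu and Gupta, 2017, is a common choice) and a temperature-softened distillation loss against the uncompressed teacher. The sketch below shows those generic ingredients under assumed names; DG2PF's actual schedule and losses are specified in \cite{gradual_pruning}.

```python
import numpy as np

def sparsity_at(step, total_steps, target):
    # Cubic ramp: sparsity grows from 0 to `target` over the pruning phase.
    t = min(step / total_steps, 1.0)
    return target * (1.0 - (1.0 - t) ** 3)

def magnitude_mask(w, sparsity):
    # Zero out the smallest-magnitude fraction of weights, keep the rest.
    k = int(round(sparsity * w.size))
    if k == 0:
        return np.ones_like(w)
    thresh = np.sort(np.abs(w).ravel())[k - 1]
    return (np.abs(w) > thresh).astype(w.dtype)

def kd_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 as in Hinton et al.'s distillation loss.
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    p, q = softmax(teacher_logits / T), softmax(student_logits / T)
    return (T ** 2) * float(np.sum(p * (np.log(p) - np.log(q))))

w = np.array([0.05, -0.8, 0.2, 1.1, -0.02, 0.4])
mask = magnitude_mask(w, sparsity_at(step=100, total_steps=100, target=0.5))
# Half the weights survive; training would continue on w * mask,
# guided by kd_loss against the unpruned teacher's logits.
```

Pruning gradually rather than in one shot gives the surviving weights time to absorb the function of the removed ones, while the distillation term keeps the shrinking student anchored to the teacher's soft predictions.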
File: Tesi_dottorato_Fontana.pdf (open access; license: Creative Commons; size: 10.32 MB; format: Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/356958
URN:NBN:IT:UNIROMA1-356958