Efficient deep learning for computer vision
FONTANA, FEDERICO
2026
Abstract
The advancement of modern computer vision, driven by the remarkable performance of Deep Neural Networks (DNNs), is increasingly constrained by the models' own escalating complexity. The prevailing trend toward ever-larger, more computationally demanding architectures has created a significant barrier to deploying sophisticated artificial intelligence on resource-constrained platforms such as mobile devices, wearables, and autonomous drones. This thesis confronts that challenge by investigating and advancing techniques for efficient deep learning, focusing on two aggressive model compression strategies: Binary Neural Networks (BNNs) \cite{courbariaux2016binarized} and neural network pruning. The central hypothesis is that extreme model compression can be achieved while maintaining high performance across a diverse set of real-world computer vision problems, thereby enabling the deployment of advanced AI in practical, resource-limited scenarios. Methodologically, this research first addresses foundational problems that have impeded the widespread adoption of these efficient models. It develops CycleBNN \cite{cyclebnn}, a cyclic precision training methodology that significantly reduces the computational overhead of training BNNs. In a complementary investigation into sparsity, it introduces Distilled Gradual Pruning with Pruned Fine-tuning (DG2PF) \cite{gradual_pruning}, an algorithm that combines network pruning with knowledge distillation to achieve high compression ratios with minimal loss of accuracy. Having established these core techniques, the thesis demonstrates their practical viability on a series of challenging, latency-sensitive tasks.
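The weight binarization at the heart of BNNs is conceptually simple: the forward pass quantizes latent full-precision weights to {-1, +1}, while gradients flow back to the latent weights through a straight-through estimator. The NumPy sketch below illustrates one such update step in the generic scheme of \cite{courbariaux2016binarized}; it is not the CycleBNN training procedure itself, whose cyclic precision schedule is described in the cited paper.

```python
import numpy as np

def binarize(w):
    # Forward: sign quantization of latent weights to {-1, +1}.
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_out):
    # Backward: straight-through estimator -- pass the gradient
    # through unchanged where the latent weight lies in [-1, 1],
    # and zero it elsewhere.
    return grad_out * (np.abs(w) <= 1.0)

# One illustrative training step on latent full-precision weights.
w = np.array([0.3, -0.7, 1.5])
x = np.array([1.0, 2.0, 3.0])
y = binarize(w) @ x          # forward uses binary weights: 1*1 + (-1)*2 + 1*3 = 2
g = ste_grad(w, grad_out=x)  # dL/dw for L = y, masked by the STE
w -= 0.1 * g                 # the update hits the latent weights, not the binary ones
```

At inference time only the binarized weights are stored, which is what yields the memory and compute savings reported for BNNs.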
Highly efficient BNN-based solutions are presented for hand gesture recognition (BNNAction-Net) \cite{bnnaction}, deepfake detection (Faster Than Lies) \cite{lanzino2024faster}, and cross-view geolocalization for UAVs (BiCrossNet) \cite{bicrossnet}. In each case, the proposed models achieve performance competitive with their full-precision counterparts while reducing computational complexity and memory footprint by orders of magnitude. The research then extends deepfake detection beyond static classification by reframing it as a continual learning problem. A chronological evaluation framework that simulates the real-world evolution of generative technologies reveals a fundamental limitation in the generalization capabilities of current detectors. This finding motivates the Non-Universal Deepfake Distribution Hypothesis \cite{fontana2025revisiting}, which posits that each deepfake generator imprints a unique, non-transferable signature, underscoring the necessity of continuous model adaptation for any robust, long-term detection strategy. The collective results empirically validate that BNNs and neural network pruning are not merely theoretical concepts but practical tools for building high-performance, resource-efficient computer vision systems. By bridging the gap between state-of-the-art accuracy and real-world deployability, this work contributes to a more accessible, sustainable, and scalable future for artificial intelligence.
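The pruning-plus-distillation recipe summarized in the abstract typically rests on two generic ingredients: a schedule that ramps sparsity up gradually (the cubic ramp of Zhu and Gupta, 2017, is a common choice) and a temperature-softened distillation loss against the uncompressed teacher. The sketch below shows those generic ingredients under assumed names; DG2PF's actual schedule and losses are specified in \cite{gradual_pruning}.

```python
import numpy as np

def sparsity_at(step, total_steps, target):
    # Cubic ramp: sparsity grows from 0 to `target` over the pruning phase.
    t = min(step / total_steps, 1.0)
    return target * (1.0 - (1.0 - t) ** 3)

def magnitude_mask(w, sparsity):
    # Zero out the smallest-magnitude fraction of weights, keep the rest.
    k = int(round(sparsity * w.size))
    if k == 0:
        return np.ones_like(w)
    thresh = np.sort(np.abs(w).ravel())[k - 1]
    return (np.abs(w) > thresh).astype(w.dtype)

def kd_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between temperature-softened distributions,
    # scaled by T^2 as in Hinton et al.'s distillation loss.
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    p, q = softmax(teacher_logits / T), softmax(student_logits / T)
    return (T ** 2) * float(np.sum(p * (np.log(p) - np.log(q))))

w = np.array([0.05, -0.8, 0.2, 1.1, -0.02, 0.4])
mask = magnitude_mask(w, sparsity_at(step=100, total_steps=100, target=0.5))
# Half the weights survive; training would continue on w * mask,
# guided by kd_loss against the unpruned teacher's logits.
```

Pruning gradually rather than in one shot gives the surviving weights time to absorb the function of the removed ones, while the distillation term keeps the shrinking student anchored to the teacher's soft predictions.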
File: Tesi_dottorato_Fontana.pdf (open access; license: Creative Commons; size: 10.32 MB; format: Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/356958
URN:NBN:IT:UNIROMA1-356958