On Improving Efficiency/Effectiveness trade-offs with Neural Network Compression

Rulli, Cosimo

Deep Neural Networks (DNNs) deliver state-of-the-art performance in various fields at the price of huge computational requirements. In this thesis, we propose three solutions to reduce the computational requirements of DNNs in Learning to Rank (LtR), Image Classification, and multi-term Dense Retrieval (DR). LtR is the field of machine learning employed to rank candidate documents in a search engine. We propose a methodology to train efficient and effective neural networks for LtR by employing pruning and cross-modal knowledge distillation. Furthermore, we develop analytic time predictors that estimate the execution time of sparse and dense neural networks, thus easing the design of neural models that match the desired time requirements. In Image Classification, we propose Automatic Prune Binarization (APB), a novel compression framework that enriches the expressiveness of binary networks with a few full-precision weights. Moreover, we design two innovative matrix multiplication algorithms for extremely low-bit configurations, based on highly efficient bitwise and logical CPU instructions. In multi-term DR, we propose two contributions, working with uncompressed and compressed vector representations, respectively. The former merges query terms and document terms to speed up the search phase while also reducing the memory footprint. The latter introduces Product Quantization during the document scoring phase and presents a highly efficient filtering step implemented using bit vectors.
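As a rough illustration of why bitwise CPU instructions suit low-bit networks, the sketch below computes the dot product of two {-1, +1} vectors with XNOR and a population count. It shows only the standard identity behind binary arithmetic and is not the matrix multiplication algorithms proposed in the thesis; the function names `pack_signs`, `binary_dot`, and `binary_matvec` are hypothetical and chosen for this example.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int, one bit per value (bit = 1 for +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Positions where the bits agree contribute +1 and the others -1,
    so the result is 2 * (number of agreeing positions) - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the n valid bits
    matches = bin(agree).count("1")    # population count
    return 2 * matches - n


def binary_matvec(rows_bits, x_bits, n):
    """Multiply a binary matrix (one packed int per row) by a packed binary vector."""
    return [binary_dot(row, x_bits, n) for row in rows_bits]


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(binary_dot(pack_signs(a), pack_signs(b), len(a)))
```

On a CPU the same idea works on machine words, so one XNOR and one popcount instruction handle 64 weights at a time; the Python integers here only stand in for those words.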

File	Access	License	Size	Format
main.pdf	open access	All rights reserved	6.14 MB	Adobe PDF
ReportFinaleDottorato.pdf	not available	All rights reserved	72.48 kB	Adobe PDF

2023
