Computation-friendly compression of matrices and tries

TOSONI, FRANCESCO
2024

Abstract

In this thesis, we continue the research on repetitive data compression by investigating novel general compression schemes that are data-independent. Although we focus specifically on machine learning and key-value systems, we believe that our methods provide insights applicable to a wider range of application domains. Our proposed methods adapt one-dimensional general-purpose compression tools to handle complex data structures such as matrices, graphs, and tries. These schemes effectively capture redundancies and interdependencies in the data, enabling compression beyond what sparsity alone can achieve, without compromising quality metrics of the resulting models such as precision or recall. Following the “computation-friendly” paradigm, our compressed representations support operations directly on the compressed data, in time comparable to that of the same operations on uncompressed data.
4 May 2024
Italian
green computing
key-value stores
lossless compression
matrix-vector multiplications
repetitive data compression
string dictionaries
trie
tries
Ferragina, Paolo
Manzini, Giovanni
Files in this item:
File | Size | Format | Availability
20240418_thesis.pdf | 1.18 MB | Adobe PDF | embargoed until 6 May 2027
resoconto_tosoni.pdf | 276.42 kB | Adobe PDF | not available

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216698
The NBN code of this thesis is URN:NBN:IT:UNIPI-216698