Learning-based compressed data structures

Vinciguerra, Giorgio

This thesis revisits two fundamental problems in data structure design: predecessor search and rank/select primitives. These problems are pervasive in applications, particularly in areas such as database systems, search engines, bioinformatics, and Internet routing. We show that real data present a peculiar kind of regularity that can be explained in terms of geometric considerations. We name it "approximate linearity" and analyse its algorithmic effectiveness across a variety of possible input data distributions. We then expand the horizon of compressed data structures by presenting solutions for the problems above that discover, or "learn", in a rigorous and efficient algorithmic way, the approximate linearities present in the data. In addition, we show how to combine this new form of compressibility with the classic repetition-aware approaches, thus introducing a new class of compressed indexes. We accompany our theoretical results with implementations and experiments on large amounts of data, and we show that, compared to several well-engineered known compressed indexes, our data structures provide improvements in time, in space, or in both (often by orders of magnitude).

2022

	Publication date
	
				18 Feb 2022
			
	Language
	
				Italian
			
	Keywords
	
				algorithm engineering
				algorithms
				compressed data structures
				data compression
			
	Supervisor, Advisor or Tutor
	
				Ferragina, Paolo
			
	Collection
	
				Università degli Studi di Pisa

Files in this item:

File	Access	Size	Format
report.pdf	open access	141.01 kB	Adobe PDF
thesis.pdf	open access	2.25 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/216283

The NBN code of this thesis is URN:NBN:IT:UNIPI-216283