This thesis focuses on the design of efficient (in terms of time and space) and efficacious (in terms of precision and recall) algorithms and data structures that leverage upon the information encoded in big labeled graphs, with the ultimate goal of deploying them in the solution of problems emerging in the realm of Search Engines and Social Networks. In the first part of the dissertation, we focus on Entity Linking and its applications. The procedure can be thought as a semantic augmentation of documents with entities drawn from a large knowledge base, such as Wikipedia, that are linked to meaningful sequences of terms found in those documents. Our contributions are the following ones: first, we introduce a new and easy to use benchmarking platform that can be deployed to compare entity linkers with minimum effort; second, we propose a scalable disambiguation algorithm that hinges on the theory of word embeddings and language models in order to provide state-of-the-art or near state-of-the-art entity-linking performance at Web-scale; third, we show the application of entity-linking in a noisy and difficult domain like Twitter’s messages and derive a graph-based representation of hashtags’ meaning, which, in turn, can be employed to tackle the challenging problems of hashtag relatedness and hashtag classification. In the second part of the dissertation, we deal with the problem of compressed representation of knowledge graphs and large labeled graphs in general. We propose a novel and efficient encoding of labeled graphs that need to support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes.

Algorithms and data structures for big labeled graphs

2017

Abstract

This thesis focuses on the design of efficient (in terms of time and space) and efficacious (in terms of precision and recall) algorithms and data structures that leverage upon the information encoded in big labeled graphs, with the ultimate goal of deploying them in the solution of problems emerging in the realm of Search Engines and Social Networks. In the first part of the dissertation, we focus on Entity Linking and its applications. The procedure can be thought as a semantic augmentation of documents with entities drawn from a large knowledge base, such as Wikipedia, that are linked to meaningful sequences of terms found in those documents. Our contributions are the following ones: first, we introduce a new and easy to use benchmarking platform that can be deployed to compare entity linkers with minimum effort; second, we propose a scalable disambiguation algorithm that hinges on the theory of word embeddings and language models in order to provide state-of-the-art or near state-of-the-art entity-linking performance at Web-scale; third, we show the application of entity-linking in a noisy and difficult domain like Twitter’s messages and derive a graph-based representation of hashtags’ meaning, which, in turn, can be employed to tackle the challenging problems of hashtag relatedness and hashtag classification. In the second part of the dissertation, we deal with the problem of compressed representation of knowledge graphs and large labeled graphs in general. We propose a novel and efficient encoding of labeled graphs that need to support sophisticated search operations which involve both the linked structure of the graph and the string content of its nodes.
4-apr-2017
Italiano
Ferragina, Paolo
Venturini, Rossano
Università degli Studi di Pisa
File in questo prodotto:
File Dimensione Formato  
relazione.pdf

accesso aperto

Tipologia: Altro materiale allegato
Dimensione 428.56 kB
Formato Adobe PDF
428.56 kB Adobe PDF Visualizza/Apri
thesis.pdf

accesso aperto

Tipologia: Altro materiale allegato
Dimensione 5.14 MB
Formato Adobe PDF
5.14 MB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/150549
Il codice NBN di questa tesi è URN:NBN:IT:UNIPI-150549