Thinking Like A Vertex (TLAV) is a popular computational paradigm suitable to express many distributed and iterative graph algorithms. It has been adopted as base computational paradigm for many of the currently available distributed frameworks and endorsed by numerous industries and academias. Also, it has been exploited to define algorithms to extract useful information from the nowadays increasing production of data which can be modeled as graphs. These facts strengthen the idea that exploiting distributed frameworks for graph analysis is an hot topic of research. As a matter of fact, we found that a solution for several algorithms is not always available or state-of-art algorithms are unsatisfactory, under many points of view. This thesis aims at providing guidelines for defining distributed graph algorithms structured according to TLAV and showing their applicability to real applications. We show how approximation, simplification and versatility can be combined to define novel distributed algorithms to improve the currently available solutions with the goal to enhance the functionalities of the algorithms. We show also how algorithms may be combined to define complex solutions and how they can be employed to solve relevant applications. In particular, we present novel algorithms for computing betweenness centrality, connected components and clustering. Such algorithms are exploited for Spam campaign detection, population estimation and hashtag centrality. To this end, we make use of real large dataset provided from our collaborations, Symantec for Spam emails, a large Italian Mobile Phone provider, mobile calls, and ISTI, CNR for a two years collection of real tweets from the Twitter social network.

Distributed Graph Processing: Algorithms And Applications

2017

Abstract

Thinking Like A Vertex (TLAV) is a popular computational paradigm suitable to express many distributed and iterative graph algorithms. It has been adopted as base computational paradigm for many of the currently available distributed frameworks and endorsed by numerous industries and academias. Also, it has been exploited to define algorithms to extract useful information from the nowadays increasing production of data which can be modeled as graphs. These facts strengthen the idea that exploiting distributed frameworks for graph analysis is an hot topic of research. As a matter of fact, we found that a solution for several algorithms is not always available or state-of-art algorithms are unsatisfactory, under many points of view. This thesis aims at providing guidelines for defining distributed graph algorithms structured according to TLAV and showing their applicability to real applications. We show how approximation, simplification and versatility can be combined to define novel distributed algorithms to improve the currently available solutions with the goal to enhance the functionalities of the algorithms. We show also how algorithms may be combined to define complex solutions and how they can be employed to solve relevant applications. In particular, we present novel algorithms for computing betweenness centrality, connected components and clustering. Such algorithms are exploited for Spam campaign detection, population estimation and hashtag centrality. To this end, we make use of real large dataset provided from our collaborations, Symantec for Spam emails, a large Italian Mobile Phone provider, mobile calls, and ISTI, CNR for a two years collection of real tweets from the Twitter social network.
6-apr-2017
Italiano
Ricci, Laura Emilia Maria
Dazzi, Patrizio
Università degli Studi di Pisa
File in questo prodotto:
File Dimensione Formato  
AlessandroLulli_PhDThesis.pdf

accesso aperto

Tipologia: Altro materiale allegato
Dimensione 5.22 MB
Formato Adobe PDF
5.22 MB Adobe PDF Visualizza/Apri
reportborsista_finale.pdf

accesso aperto

Tipologia: Altro materiale allegato
Dimensione 346.23 kB
Formato Adobe PDF
346.23 kB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/133896
Il codice NBN di questa tesi è URN:NBN:IT:UNIPI-133896