Label Propagation Algorithms are a class of heuristics for the problem of graph clustering, i.e., the problem of detecting groups of nodes whose connections are dense within each group and sparse between the groups. At the onset, a label is assigned to each node of the graph; then, each node iteratively updates its label according to a function of the labels of its neighbors. Empirical studies show that, after only a few rounds, nodes in the same cluster share the same label while nodes in different clusters have different labels. Although they are widely used in practice given their simplicity, efficiency, and effectiveness, there is no theoretical foundation to explain why such simple algorithms are able to perform such a hard task. The absence of theoretical progress in the analysis of Label Propagation Algorithms is due to the lack of mathematical techniques for handling the interplay between the non-linearity of their update rule and the topology of the underlying graph. In this thesis we contextualize Label Propagation Algorithms in the framework of computational dynamics, simple dynamical processes on networks whose behavior has been formally characterized on some classes of graphs. The analyses of computational dynamics were mainly focused on graphs with good connectivity properties, such as cliques or expanders, and on the problem of consensus, showing that they naturally converge to a configuration in which all the nodes agree on some value. We move a step forward in this direction by rigorously analyzing two simple dynamics, the 2-Choices dynamics and the Averaging dynamics, reaching a more fine-grained comprehension of their consensus behavior in classes of graphs that exhibit a clustered structure. In particular we formally prove that, with non-negligible probability, the two dynamics quickly bring the graph in a configuration where each cluster reaches an internal consensus on a value that is different among the clusters, and then enters a long metastable phase in which the internal consensus are maintained. We show how to exploit such metastable behavior to design simple randomized distributed algorithms for graph clustering.

Simple Randomized Distributed Algorithms for Graph Clustering

CRUCIANI, EMILIO
2019

Abstract

Label Propagation Algorithms are a class of heuristics for the problem of graph clustering, i.e., the problem of detecting groups of nodes whose connections are dense within each group and sparse between the groups. At the onset, a label is assigned to each node of the graph; then, each node iteratively updates its label according to a function of the labels of its neighbors. Empirical studies show that, after only a few rounds, nodes in the same cluster share the same label while nodes in different clusters have different labels. Although they are widely used in practice given their simplicity, efficiency, and effectiveness, there is no theoretical foundation to explain why such simple algorithms are able to perform such a hard task. The absence of theoretical progress in the analysis of Label Propagation Algorithms is due to the lack of mathematical techniques for handling the interplay between the non-linearity of their update rule and the topology of the underlying graph. In this thesis we contextualize Label Propagation Algorithms in the framework of computational dynamics, simple dynamical processes on networks whose behavior has been formally characterized on some classes of graphs. The analyses of computational dynamics were mainly focused on graphs with good connectivity properties, such as cliques or expanders, and on the problem of consensus, showing that they naturally converge to a configuration in which all the nodes agree on some value. We move a step forward in this direction by rigorously analyzing two simple dynamics, the 2-Choices dynamics and the Averaging dynamics, reaching a more fine-grained comprehension of their consensus behavior in classes of graphs that exhibit a clustered structure. In particular we formally prove that, with non-negligible probability, the two dynamics quickly bring the graph in a configuration where each cluster reaches an internal consensus on a value that is different among the clusters, and then enters a long metastable phase in which the internal consensus are maintained. We show how to exploit such metastable behavior to design simple randomized distributed algorithms for graph clustering.
19-dic-2019
Inglese
D'ANGELO, GIANLORENZO
Gran Sasso Science Institute
L'Aquila
File in questo prodotto:
File Dimensione Formato  
2019_Cruciani.pdf

accesso aperto

Dimensione 15.51 MB
Formato Adobe PDF
15.51 MB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/62195
Il codice NBN di questa tesi è URN:NBN:IT:GSSI-62195