Entropy-based methods to tackle missing information in complex networks

2019

Abstract

The work presented in this thesis focuses on an issue that very commonly arises when studying a network: missing information. Many phenomena can cause such a lack of knowledge, but before any attempt at studying the data it is desirable to have as complete a knowledge of the network at hand as possible. Here I specifically address two types of missing-information problems, namely network reconstruction and link prediction. In the former case, the network structure is hidden: the only information we have access to is the size of the network and some aggregate node-specific quantity. In link prediction we face a different issue: there is a real underlying network that represents the phenomenon we want to study, but we can only observe an incomplete version of it, in which some links are not present. Our goal will be to identify the most likely candidates for the missing links and, for weighted networks, their intensity. Both problems will be tackled using entropy-based methods, which guarantee that the results are unbiased.

The thesis presents advancements on three major fronts. First, it generalizes the formalism for network reconstruction, proposing a flexible methodology that makes it possible to include any prior topological knowledge and to derive a compatible, unbiased weighted distribution. Second, it proposes a new approach to link prediction, whose key idea is to tune reconstruction models on the accessible portion of the network in order to infer the partially observed portion, i.e. the most likely missing links. Finally, in the case of weighted prediction, unlike the vast majority of alternative methods, it provides an explicit recipe to estimate the link weights, together with their confidence intervals.
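The key idea of the link-prediction approach (tune an entropy-based reconstruction model on the observed portion of the network, then rank the unobserved node pairs by their probability) can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the network is undirected and binary, the reconstruction model is the undirected binary configuration model (the maximum-entropy ensemble constrained on the observed degrees, with p_ij = x_i x_j / (1 + x_i x_j)), and the hidden variables x_i are found by a plain fixed-point iteration. The function names and the toy graph are hypothetical; this is not the thesis's implementation, and it does not cover link weights or confidence intervals.

```python
from itertools import combinations


def fit_ubcm(degrees, tol=1e-10, max_iter=10_000):
    """Find hidden variables x_i of the undirected binary configuration model (UBCM).

    Solves sum_{j != i} x_i x_j / (1 + x_i x_j) = k_i for every node i with the
    fixed-point update x_i <- k_i / sum_{j != i} x_j / (1 + x_i x_j).
    """
    n = len(degrees)
    two_l = sum(degrees)  # twice the number of observed links
    # Starting guess x_i = k_i / sqrt(2L), the sparse-graph approximation.
    x = [k / two_l ** 0.5 if two_l else 0.0 for k in degrees]
    for _ in range(max_iter):
        new_x = []
        for i in range(n):
            denom = sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
            new_x.append(degrees[i] / denom if denom > 0 else 0.0)
        converged = max((abs(a - b) for a, b in zip(new_x, x)), default=0.0) < tol
        x = new_x
        if converged:
            break
    return x


def ubcm_probabilities(x):
    """Link probabilities p_ij = x_i x_j / (1 + x_i x_j) for every pair i < j."""
    return {
        (i, j): x[i] * x[j] / (1.0 + x[i] * x[j])
        for i, j in combinations(range(len(x)), 2)
    }


def rank_missing_links(n, observed_edges):
    """Rank the unobserved pairs of an undirected graph by UBCM probability (highest first)."""
    observed = {tuple(sorted(edge)) for edge in observed_edges}
    degrees = [0] * n
    for i, j in observed:
        degrees[i] += 1
        degrees[j] += 1
    # Tune the model on the accessible (observed) portion of the network only.
    probs = ubcm_probabilities(fit_ubcm(degrees))
    candidates = [(pair, p) for pair, p in probs.items() if pair not in observed]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy observed graph on 6 nodes (hypothetical data).
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 4), (3, 5)]
    for pair, p in rank_missing_links(6, edges)[:3]:
        print(pair, round(p, 3))
```

On this toy graph the unobserved pair (2, 3), which joins two degree-3 nodes, comes out on top, while the pair of degree-1 nodes (4, 5) ranks last.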
14 March 2019
English
HB Economic Theory
Caldarelli, Prof. Guido
Scuola IMT Alti Studi di Lucca
File in this record: Parisi_phdthesis.pdf (open access; type: other attached material; Adobe PDF, 3.09 MB)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/130355
The NBN code of this thesis is URN:NBN:IT:IMTLUCCA-130355