Entropy-based methods to tackle missing information in complex networks
2019
Abstract
The work presented in this thesis focuses on an issue that very commonly arises when studying a network: missing information. Many phenomena can cause such a lack of knowledge, but before any attempt at studying the data it is desirable to have knowledge of the network at hand that is as complete as possible. Here I specifically address two types of missing-information problems, namely network reconstruction and link prediction. In the former case the network structure is hidden: the only information we have access to is the size of the network and some aggregate node-specific quantity. In the context of link prediction we face a different issue: there is a real underlying network that represents the phenomenon we want to study, of which we can only observe an incomplete version where some links are not present. Our goal is to identify the most likely candidates for the missing links and, for weighted networks, their intensity. Both problems are tackled using entropy-based methods, which guarantee that the results are unbiased. The thesis presents advancements on three major fronts. It generalizes the formalism for network reconstruction, proposing a flexible methodology that allows any prior topological knowledge to be included and a compatible, unbiased weighted distribution to be derived. It proposes a new approach to link prediction, whose key idea is to tune reconstruction models on the accessible portion of the network in order to infer the partially observed portion, i.e. the most likely missing links. Finally, in the case of weighted prediction, unlike the vast majority of alternative methods, it provides an explicit recipe to estimate the link weights, together with their confidence intervals.

File | Size | Format |
---|---|---|
Parisi_phdthesis.pdf (open access; type: other attached material) | 3.09 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/130355
URN:NBN:IT:IMTLUCCA-130355