The role of geographic knowledge in sub-city level geolocation algorithms

DI ROCCO, LAURA
2019

Abstract

Geolocation of microblog messages has been widely investigated in the literature. Many solutions have been proposed that achieve good results at the city level. Existing approaches are mainly data-driven (i.e., they rely on a training phase). However, developing algorithms for geolocation at sub-city level remains an open problem, partly because good training datasets are lacking. In this thesis, we investigate the role that external geographic knowledge can play in geolocation approaches. We show how different geographical data sources can be combined with a semantic layer to achieve reasonably accurate sub-city level geolocation. Moreover, we propose a knowledge-based method, called Sherloc, to accurately geolocate messages at sub-city level by exploiting the presence in the message of toponyms that possibly refer to specific places in the target geographical area. Sherloc exploits the semantics associated with toponyms contained in gazetteers and embeds them into a metric space that captures the semantic distance among them. This allows toponyms to be represented as points and indexed by a spatial access method, so that we can identify the terms semantically closest to a microblog message that also form a cluster with respect to their spatial locations. In contrast to state-of-the-art methods, Sherloc requires no prior training, is not limited to geolocating on a fixed spatial grid, and has been shown experimentally to infer locations at sub-city level with higher accuracy.
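The idea outlined in the abstract (toponyms as points in a semantic metric space, nearest-neighbour lookup, then a spatial-cluster check) can be illustrated with a minimal sketch. This is not Sherloc's implementation: the gazetteer entries, the 2-D "semantic" vectors, the coordinates, and the function names are all hypothetical, a brute-force scan stands in for the spatial access method, and planar distance in degrees stands in for a proper geographic distance.

```python
import math

# Hypothetical toy gazetteer: name -> (semantic vector, (lat, lon)).
# Vectors and coordinates are made up for illustration only.
GAZETTEER = {
    "old harbour": ((0.90, 0.10), (44.4100, 8.9260)),
    "aquarium":    ((0.80, 0.20), (44.4102, 8.9266)),
    "crane":       ((0.85, 0.15), (44.4090, 8.9270)),
    "stadium":     ((0.10, 0.90), (44.4165, 8.9525)),
    "airport":     ((0.50, 0.50), (44.4133, 8.8375)),
}


def nearest_toponyms(query_vec, k):
    """Return the k toponyms closest to query_vec in the semantic space.

    A brute-force scan is used here; a real system would query an index.
    """
    ranked = sorted(
        GAZETTEER, key=lambda name: math.dist(query_vec, GAZETTEER[name][0])
    )
    return ranked[:k]


def largest_spatial_cluster(names, radius_deg):
    """Return the largest group of toponyms within radius_deg of one seed."""
    best = []
    for seed in names:
        seed_loc = GAZETTEER[seed][1]
        group = [
            n for n in names
            if math.dist(seed_loc, GAZETTEER[n][1]) <= radius_deg
        ]
        if len(group) > len(best):
            best = group
    return best


def geolocate(query_vec, k=4, radius_deg=0.005):
    """Estimate a location as the centroid of the densest candidate cluster."""
    cluster = largest_spatial_cluster(nearest_toponyms(query_vec, k), radius_deg)
    lat = sum(GAZETTEER[n][1][0] for n in cluster) / len(cluster)
    lon = sum(GAZETTEER[n][1][1] for n in cluster) / len(cluster)
    return lat, lon, cluster


# A message whose (hypothetical) semantic vector lies near the harbour terms.
lat, lon, cluster = geolocate((0.88, 0.12))
print(cluster, round(lat, 5), round(lon, 5))
```

In this toy run the semantically close but spatially isolated "airport" entry is among the four nearest neighbours and is discarded by the cluster step, which is the role the abstract assigns to the spatial-clustering condition.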
14 March 2019
English
GUERRINI, GIOVANNA
DELZANNO, GIORGIO
Università degli studi di Genova
Files in this item:
File                      Access        Size      Format
phdunige_3348743_1.pdf    open access   12.56 MB  Adobe PDF
phdunige_3348743_2.pdf    open access   13.66 MB  Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/170337
The NBN code of this thesis is URN:NBN:IT:UNIGE-170337