Algorithms for Knowledge and Information Extraction in Text with Wikipedia

Ponza, Marco

This thesis focuses on the design of algorithms that extract knowledge (entities belonging to a knowledge graph) and information (open facts) from text, using Wikipedia as the main repository of world knowledge. The first part of the dissertation addresses research problems that lie specifically in the domain of knowledge and information extraction. Here we make three contributions to the scientific literature. First, we study the problem of computing the relatedness between Wikipedia entities: we introduce a new dataset of human judgements, study all the entity relatedness measures proposed in the recent literature, and propose a new, computationally lightweight two-stage framework for relatedness computation. Second, we study the problem of entity salience by designing and implementing a new system that identifies the salient Wikipedia entities occurring in an input text and that improves the state of the art over different datasets. Third, we introduce a new research problem, called fact salience, which addresses the task of detecting the salient open facts extracted from an input text, and we propose, design and implement the first system that effectively solves it.

In the second part of the dissertation, we study an application of knowledge extraction tools to the domain of expert finding. We propose a new system that hinges upon a novel profiling technique which models people (i.e., experts) through a small, labeled graph drawn from Wikipedia. This profiling technique is then used to design a novel suite of ranking algorithms that match experts against the user query, and whose effectiveness is shown by improvements over state-of-the-art solutions.

Year: 2019

Publication date: 21 Feb 2019
Language: Italian
Keywords: Information Extraction; Information Retrieval; Knowledge Extraction; Knowledge Graph; Machine Learning; Natural Language Processing; Natural Language Understanding; Wikipedia
Supervisor: Ferragina, Paolo
Collection: Università degli Studi di Pisa

Files in this record:

dissertation.pdf (open access; type: other attached material; license: all rights reserved; 6.83 MB, Adobe PDF)
report.pdf (open access; type: other attached material; license: all rights reserved; 29.44 kB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/134210

The NBN code of this thesis is URN:NBN:IT:UNIPI-134210