Barrier and Syntactic Features for Information Retrieval

Alicante, Anita

The goal of Information Retrieval (IR) is to retrieve all the documents in a collection that are relevant to a given query. A subtask of IR is Information Extraction (IE), which includes machine learning approaches that automatically extract from documents information about, for example, entities, relations, or events. This thesis introduces a novel type of feature, called barrier features, based on part-of-speech (PoS) tagging. We use these features to solve several IR and IE problems. In detail, we build several IR or IE systems that outperform both state-of-the-art methods and baseline systems built without these features. In the second part of the thesis we again exploit syntactic information, applying constituency and dependency parsing to two different areas: supporting Concept Location in Software Engineering, and studying the influence of constituent order on data-driven parsing in Computational Linguistics. In the former, we evaluate the use of off-the-shelf and trained natural language analyzers to parse identifier names, extract an ontology, and use it to support concept location; in the latter, we use two state-of-the-art data-driven parsers to study the influence of constituent order on the data-driven parsing of Italian.

2013

Publication date: 2013
Language: en
Collection: BNCF

Files in this item:

File	Size	Format
alicante_anita_25.pdf (access only from BNCF and BNCR; Type: Other attached material; License: All rights reserved)	1.7 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/315214

The NBN code of this thesis is URN:NBN:IT:BNCF-315214