Improving the accuracy of Natural Language Dependency Parsing

Dell'Orletta, Felice

The aim of this thesis is to improve the accuracy of Natural Language Dependency Parsing without increasing computational cost, using a linear-time Shift Reduce dependency parsing algorithm. We start by presenting the experimental results achieved during our participation in the multilingual dependency parsing shared task of the Conference on Computational Natural Language Learning (CoNLL) 2007. We carry out a detailed error analysis of the best parsers presented at the conference to reveal critical aspects of parsing systems. This leads us to introduce a new parsing method and a new parser combination algorithm, both aimed at improving the accuracy of the deterministic Shift Reduce parser. The new parsing method, called Reverse Revision Parsing, employs a Left-to-Right Shift Reduce parser that parses the sentence, followed by a second Right-to-Left Shift Reduce parser that scans the sentence in reverse using additional features obtained from the predictions of the first parser. The new parser combination algorithm, called Quasi-Linear Parser Combination, exploits the fact that its inputs are trees in order to avoid the quadratic cost of algorithms for computing the maximum spanning tree of a graph. We report the results of the experiments carried out during our participation in the CoNLL-2008 evaluation task. These results were achieved using Reverse Revision Parsing and a new combination algorithm presented in the course of this thesis. We then present a number of experiments meant to select the set of features that provides the greatest improvement to a Shift Reduce statistical dependency parser. We report the accuracy gains that such a parser can obtain using features from gold chunks, from chunks produced by a statistical chunker, and from approximate chunks obtained by detecting noun phrases through regular expression patterns.
A parser exploiting features from approximate chunks is applied to a chunking task, and its chunking accuracy is compared with that of a specialized statistical chunker. Finally, we investigate the performance achieved by parsers when they are applied to languages characterized by a relatively free word order and a rich morphology. To this end, we perform a detailed quantitative analysis of distributional language data, highlighting the relative contribution of a number of distributed grammatical and semantic factors to parsing. We therefore introduce Animacy, a semantic feature usually not present in available treebanks, and discuss its effect on parsing.
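
As an illustration of the approximate chunks mentioned in the abstract, the following minimal Python sketch detects noun phrases with a regular expression over part-of-speech tags and turns them into per-token labels that a parser could use as features. The Penn Treebank-style tag set, the pattern itself, and the function names `approximate_np_chunks` and `chunk_labels` are assumptions made for illustration; they are not taken from the thesis.

```python
import re

# Hypothetical noun-phrase pattern over Penn Treebank-style tags:
# optional determiner, any number of adjectives, one or more nouns.
# The lookbehind forces a match to begin at the start of a tag.
NP_PATTERN = re.compile(r"(?<!\S)(?:DT )?(?:JJ[RS]? )*(?:NNP?S? )+")


def approximate_np_chunks(tagged):
    """Return (start, end) token spans, end exclusive, of approximate NPs."""
    # Encode the tag sequence as one string, each tag followed by a space.
    tags = "".join(tag + " " for _, tag in tagged)
    # Map the character offset where each tag starts to its token index.
    offsets = {}
    pos = 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 1
    offsets[pos] = len(tagged)
    return [(offsets[m.start()], offsets[m.end()])
            for m in NP_PATTERN.finditer(tags)]


def chunk_labels(tagged):
    """Return one B-NP / I-NP / O label per token from the approximate chunks."""
    labels = ["O"] * len(tagged)
    for start, end in approximate_np_chunks(tagged):
        labels[start] = "B-NP"
        for i in range(start + 1, end):
            labels[i] = "I-NP"
    return labels


sentence = [("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
            ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dogs", "NNS")]
print(chunk_labels(sentence))
```

Labels of this kind are cheap to compute because they need only a tagger and a pattern, which is what makes them approximate compared with gold chunks or the output of a statistical chunker.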

2008
