Knowledge is information that has been contextualized in a certain domain, where it can be used and applied. Natural Language provides a most direct way to transfer knowledge at different levels of conceptual density. The opportunity provided by the evolution of the technologies of Natural Language Processing is thus of making more fluid and universal the process of knowledge transfer. Indeed, unfolding domain knowledge is one way to bring to larger audiences contents that would be otherwise restricted to specialists. This has been done so far in a totally manual way through the skills of divulgators and popular science writers. Technology provides now a way to make this transfer both less expensive and more widespread. Extracting knowledge and then generating from it suitably communicable text in natural language are the two related subtasks that need be fulfilled in order to attain the general goal. To this aim, two fields from information technology have achieved the needed maturity and can therefore be effectively combined. In fact, on the one hand Information Extraction and Retrieval (IER) can extract knowledge from texts and map it into a neutral, abstract form, hence liberating it from the stylistic constraints into which it was originated. From there, Natural Language Generation can take charge, by regenerating automatically, or semi-automatically, the extracted knowledge into texts targeting new communities. This doctoral thesis provides a contribution to making substantial this combination through the definition and implementation of a novel multidimensional model for the representation of conceptual knowledge and of a workflow that can produce strongly customized textual descriptions. By exploiting techniques for the generation of paraphrases and by profiling target users, applications and domains, a target-driven approach is proposed to automatically generate multiple texts from the same information core. An extended case study is described to demonstrate the effectiveness of the proposed model and approach in the Cultural Heritage application domain, so as to compare and position this contribution within the current state of the art and to outline future directions.

A Knowledge Multidimensional Representation Model for Automatic Text Analysis and Generation: Applications for Cultural Heritage

2016

Abstract

Knowledge is information that has been contextualized in a certain domain, where it can be used and applied. Natural Language provides a most direct way to transfer knowledge at different levels of conceptual density. The opportunity provided by the evolution of the technologies of Natural Language Processing is thus of making more fluid and universal the process of knowledge transfer. Indeed, unfolding domain knowledge is one way to bring to larger audiences contents that would be otherwise restricted to specialists. This has been done so far in a totally manual way through the skills of divulgators and popular science writers. Technology provides now a way to make this transfer both less expensive and more widespread. Extracting knowledge and then generating from it suitably communicable text in natural language are the two related subtasks that need be fulfilled in order to attain the general goal. To this aim, two fields from information technology have achieved the needed maturity and can therefore be effectively combined. In fact, on the one hand Information Extraction and Retrieval (IER) can extract knowledge from texts and map it into a neutral, abstract form, hence liberating it from the stylistic constraints into which it was originated. From there, Natural Language Generation can take charge, by regenerating automatically, or semi-automatically, the extracted knowledge into texts targeting new communities. This doctoral thesis provides a contribution to making substantial this combination through the definition and implementation of a novel multidimensional model for the representation of conceptual knowledge and of a workflow that can produce strongly customized textual descriptions. By exploiting techniques for the generation of paraphrases and by profiling target users, applications and domains, a target-driven approach is proposed to automatically generate multiple texts from the same information core. An extended case study is described to demonstrate the effectiveness of the proposed model and approach in the Cultural Heritage application domain, so as to compare and position this contribution within the current state of the art and to outline future directions.
2016
it
File in questo prodotto:
File Dimensione Formato  
MarulliPhDThesis02.pdf

accesso solo da BNCF e BNCR

Tipologia: Altro materiale allegato
Licenza: Tutti i diritti riservati
Dimensione 251 B
Formato Adobe PDF
251 B Adobe PDF

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/339821
Il codice NBN di questa tesi è URN:NBN:IT:BNCF-339821