Utilizing Wikipedia for Text Mining Applications

QURESCHI, MUHAMMAD ATIF
2015

Abstract

Text is a popular form of communication; however, because it is produced by humans, textual data is complex in nature. Given the huge amount of textual data currently available, it is essential to be able to mine this data automatically. Recent text mining efforts make extensive use of knowledge bases, and this thesis pursues a similar effort. Specifically, we use Wikipedia to solve complex text mining tasks, since current approaches do not make effective use of the category-article structure within Wikipedia. In particular, we address the problem of determining the various topical threads in a document, together with the contextualization of social media content to disambiguate its various aspects. Experimental evaluations demonstrate the superiority of our proposed methods when compared with the state of the art.
28 Oct 2015
English
PASI, GABRIELLA
Università degli Studi di Milano-Bicocca
Files in this item:
phd_unimib_761334.pdf (open access), 2.51 MB, Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/74815
The NBN code of this thesis is URN:NBN:IT:UNIMIB-74815