In this work, we present an automated unsupervised approach for retrieval/classification in eDiscovery. This approach is an ad hoc retrieval which creates a representative for each original document in the collection using latent dirichlet allocation (LDA) model with Gibbs sampling and explores word sense disambiguation (WSD) to give these representative documents and queries deeper meanings for distributional semantic similarity. The word sense disambiguation technique by itself is a hybrid algorithm derived from the modified version of the original Lesk algorithm and the Jiang & Conrath similarity measure. Evaluation was carried out on this technique using the TREC legal track. Results and observations are discussed in chapter 8. We conclude that WSD can improve ad hoc retrieval effectiveness. Finally, we suggest further on efficient algorithms for word sense disambiguation which can further improve retrieval effectiveness if applied to original document collections against using representative collections.
A Combined Unsupervised Technique for Automatic Classification in Electronic Discovery
2017
Abstract
In this work, we present an automated unsupervised approach for retrieval/classification in eDiscovery. This approach is an ad hoc retrieval which creates a representative for each original document in the collection using latent dirichlet allocation (LDA) model with Gibbs sampling and explores word sense disambiguation (WSD) to give these representative documents and queries deeper meanings for distributional semantic similarity. The word sense disambiguation technique by itself is a hybrid algorithm derived from the modified version of the original Lesk algorithm and the Jiang & Conrath similarity measure. Evaluation was carried out on this technique using the TREC legal track. Results and observations are discussed in chapter 8. We conclude that WSD can improve ad hoc retrieval effectiveness. Finally, we suggest further on efficient algorithms for word sense disambiguation which can further improve retrieval effectiveness if applied to original document collections against using representative collections.| File | Dimensione | Formato | |
|---|---|---|---|
|
AYETIRAN_ENIAFE_FESTUS_TESI.pdf
accesso solo da BNCF e BNCR
Tipologia:
Altro materiale allegato
Licenza:
Tutti i diritti riservati
Dimensione
1.05 MB
Formato
Adobe PDF
|
1.05 MB | Adobe PDF |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14242/332300
URN:NBN:IT:BNCF-332300