Automatic Document Classification: combining image and text information to enhance quality and performances
TOMAZZOLI, Claudio
2014
Abstract
Automatic document classification extracts information through automatic analysis of document content. It is an active research field of growing importance, owing to the large volume of electronic documents produced almost daily and made available worldwide by widespread technologies. Several application areas benefit from automatic document classification, such as document archiving, invoice processing in business environments, press releases, and search engines. Current tools classify or "tag" either text or images so they can be processed; by linking image and text-based content, a technology can improve fundamental document management tasks, such as retrieving information from a database or automatically routing documents, to achieve more complete searches and streamlined business processes. In this work, we first investigate a possible model for the conceptual space of the joint information from the text and the images that form complex documents. We present a formal definition of the concepts of pertinence and relevance that apply to the document types we call "multimodal", and we develop a computable algorithm. We then present the test dataset that will be used to validate and improve the model. Finally, we describe the experiments performed and the related results.

File | Size | Format
---|---|---
Tesi_PDH_Tomazzoli.pdf (access only from BNCF and BNCR) | 10.18 MB | Adobe PDF
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/115711
URN:NBN:IT:UNIVR-115711