True scene understanding: classification, semantic segmentation and retrieval

RAVI', DANIELE
2013

Abstract

The huge volume of images shared on websites and stored in personal archives poses challenges for large-scale multimedia management. Because of the well-known semantic gap between human-understandable high-level semantics and machine-generated low-level features, recent years have seen considerable research effort on multimedia content understanding and indexing. Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have achieved impressive results. The next challenge is to integrate these algorithms and address the problem of complete scene understanding, which involves explaining an image by recognizing all the objects of interest together with their spatial extent or shape. True semantic understanding of an image mainly involves scene classification and semantic segmentation. The former aims to determine the categories to which an image belongs. The latter instead provides, for each pixel, a semantic label describing the category of the object to which the pixel belongs. Solutions for the semantic interpretation and understanding of images will enable and enhance a large variety of computer vision applications. Although a human can perform these tasks easily, doing so manually is laborious, and the sheer quantity of data involved can make them prohibitive for a computer. This thesis proposes novel approaches for semantic scene categorization, segmentation and retrieval that enable a device with limited resources to understand images automatically. The proposed computer vision solutions use machine-learning algorithms to build robust and reusable systems. Since learning is a key component of biological vision systems, the design of automatic artificial systems that are capable of learning is one of the most important trends in modern computer vision research.
10 Dec 2013
English
BATTIATO, SEBASTIANO
CUTELLO, Vincenzo
Università degli studi di Catania
Catania
Files in this item:
File: main.pdf (open access)
Size: 11.97 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/75138
The NBN code of this thesis is URN:NBN:IT:UNICT-75138