
An FPGA-Based Architecture for Binocular Scene Understanding

MARTELLI, Samuele
2012

Abstract

The automatic classification of object categories in images and videos is one of the central goals of computer vision. Among object classes, pedestrians have attracted considerable attention as a key component of application domains such as video surveillance, navigation systems, and robot control. Despite continuous efforts in recent years to improve accuracy and processing speed, pedestrian detectors are not yet ready for real-world applications. Moreover, although hardware solutions have recently proved reliable for several computer vision problems, few object detection systems are designed to run on embedded devices. The aim of this thesis is an FPGA-based hardware implementation that achieves high performance on generic object class detection, customized for human detection. Arrays of covariance matrices, computed on basic image cues, are adopted to encode the local appearance of humans; they naturally capture the correlations between features and are robust to varying lighting conditions. The massive parallelism offered by the target platform is exploited at several levels, yielding a significant speed-up of the detection process. Furthermore, we propose new features based on visual and depth cues that would otherwise be infeasible on conventional processors. Detection performance is evaluated systematically across feature-classifier combinations to identify the best configuration, and all experiments are performed on challenging real-world data. The binocular approach outperforms the single-camera one, revealing the importance of the complementary information provided by awareness of scene geometry. Our experiments show that modular object models based on arrays of covariance matrices effectively encapsulate multiple features from different cues and are well suited to implementation on embedded devices.
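The region covariance descriptors the abstract refers to summarize an image patch by the covariance of per-pixel feature vectors, and such descriptors live on the manifold of symmetric positive-definite matrices, which is why distances between them are taken in a Riemannian sense. The thesis's exact feature set and metric are not stated in this record, so the sketch below is illustrative only: it assumes a five-dimensional feature vector [x, y, intensity, |Ix|, |Iy|] and uses the log-Euclidean distance as a common surrogate for the affine-invariant metric; the function names (`region_covariance`, `spd_log`, `log_euclidean_dist`) are hypothetical, not from the thesis.

```python
import numpy as np

def region_covariance(gray):
    """Covariance descriptor of a grayscale patch.

    Each pixel contributes an (assumed, illustrative) feature vector
    [x, y, intensity, |Ix|, |Iy|]; the patch is summarized by the
    5x5 covariance matrix of these vectors.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    Iy, Ix = np.gradient(gray.astype(float))  # first-order gradients
    feats = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                      gray.ravel().astype(float),
                      np.abs(Ix).ravel(), np.abs(Iy).ravel()])
    return np.cov(feats)  # 5x5 symmetric positive semi-definite

def spd_log(C, eps=1e-6):
    # Matrix logarithm of a symmetric PSD matrix via eigendecomposition;
    # a small ridge keeps the eigenvalues strictly positive.
    w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(C1, C2):
    # Log-Euclidean distance between two covariance descriptors:
    # Frobenius norm of the difference of their matrix logarithms.
    return np.linalg.norm(spd_log(C1) - spd_log(C2), ord="fro")

# Usage sketch: compare two random patches.
rng = np.random.default_rng(0)
C1 = region_covariance(rng.random((16, 16)) * 255)
C2 = region_covariance(rng.random((16, 16)) * 255)
d = log_euclidean_dist(C1, C2)
```

A detector in this style slides a window over the image, computes such a descriptor per window (or per cell of a modular grid, as the abstract's "array of covariance matrices" suggests), and classifies it using a manifold-aware distance like the one above.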
Year: 2012
Language: English
Keywords: Embedded; FPGA; object detection; covariance matrices; Riemannian Manifold
Pages: 203
Files in this record:
Samuele_Martelli_PhD_Thesis_low.pdf (5.71 MB, Adobe PDF; access only from BNCF and BNCR)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/115566
The NBN code of this thesis is URN:NBN:IT:UNIVR-115566