An FPGA-Based Architecture for Binocular Scene Understanding
MARTELLI, Samuele
2012
Abstract
Automatically classifying different categories of objects in images and videos is one of the main goals of computer vision. Among object classes, pedestrians have attracted considerable attention as a key component of application domains such as video surveillance, navigation systems, and robot control. Despite continuous efforts over recent years to improve accuracy and processing performance, pedestrian detectors are not yet ready for real-world applications. Moreover, although hardware solutions have recently proven reliable for several computer vision problems, few object detection systems are designed to run on embedded devices. The aim of this thesis is an FPGA-based hardware implementation that achieves high performance on generic object-class detection, customized for human detection. Arrays of covariance matrices, computed on basic image cues, are adopted to encode the local appearance of humans: they naturally capture correlations among features and are robust to changing lighting conditions. The massive parallelism offered by the target platform is exploited at several levels, yielding a significant speed-up of the detection process. Furthermore, we propose new features based on visual and depth cues that would not be feasible on conventional processors. Detection performance is evaluated systematically across feature-classifier combinations to identify the best configuration, with all experiments performed on challenging real-world data. The binocular approach improves on single-camera performance, revealing the value of the complementary information provided by awareness of scene geometry. Our experiments support the conclusion that modular object models based on arrays of covariance matrices effectively encapsulate multiple features from different cues and are well suited to implementation on embedded devices.
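For context, a minimal sketch of the region covariance descriptor the abstract refers to: each pixel in a detection window is mapped to a small vector of basic image cues, and the window is summarized by the covariance of those vectors, which captures the intra-feature correlations mentioned above. The cue set below (position, intensity, absolute gradients) follows the classic formulation of Tuzel et al. and is an assumption; the thesis's exact cues, its depth features, and its FPGA pipeline may differ, and `region_covariance` is an illustrative name, not code from the thesis.

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of a grayscale image patch (float array).

    Per-pixel feature vector: [x, y, I, |Ix|, |Iy|] (assumed cue set,
    after Tuzel et al.); the descriptor is the 5x5 covariance of these
    vectors over the patch.
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]               # pixel coordinates
    iy, ix = np.gradient(patch)               # first-order derivatives
    feats = np.stack([xs, ys, patch, np.abs(ix), np.abs(iy)], axis=-1)
    f = feats.reshape(-1, feats.shape[-1]).astype(np.float64)
    return np.cov(f, rowvar=False)            # 5x5 symmetric matrix

# Example: descriptor of a pedestrian-sized 64x32 patch
patch = np.random.rand(64, 32)
C = region_covariance(patch)
print(C.shape)  # (5, 5)
```

The descriptor's small, fixed size regardless of window dimensions, plus its invariance to mean intensity shifts, is what makes it attractive for lighting-robust detection and for parallel evaluation of many windows in hardware.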
| File | Size | Format |
|---|---|---|
| Samuele_Martelli_PhD_Thesis_low.pdf (access only from BNCF and BNCR) | 5.71 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/115566
URN:NBN:IT:UNIVR-115566