
An FPGA-Based Architecture for Binocular Scene Understanding

MARTELLI, Samuele
2012

Abstract

The automatic classification of object categories in images and videos is one of the central goals of computer vision. Among object classes, pedestrians have attracted considerable attention as a key component of application domains such as video surveillance, navigation systems, and robot control. Despite continuous efforts in recent years to improve accuracy and processing speed, pedestrian detectors are not yet ready for real-world applications. Moreover, although hardware solutions have recently proved reliable for several computer vision problems, few object detection systems are designed to run on embedded devices. The aim of this thesis is an FPGA-based hardware implementation that achieves high performance on generic object class detection, customized for human detection. Arrays of covariance matrices, computed on basic image cues, are adopted to encode the local appearance of humans; they naturally capture the correlations between features and are robust to varying lighting conditions. The massive parallelism offered by the target platform is exploited at several levels, yielding a significant speed-up of the detection process. Furthermore, we propose new features based on visual and depth cues that would otherwise be infeasible on conventional processors. Detection performance is evaluated systematically across feature-classifier combinations to identify the best configuration, and all experiments are performed on challenging real-world data. The binocular approach outperforms the single-camera one, revealing the importance of the complementary information provided by awareness of scene geometry. Our experiments show that modular object models based on arrays of covariance matrices effectively encapsulate multiple features from different cues and are well suited to implementation on embedded devices.
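The region covariance descriptors the abstract refers to summarize an image patch by the covariance of per-pixel feature vectors, and such descriptors live on the manifold of symmetric positive-definite matrices, which is why distances between them are taken in a Riemannian sense. The thesis's exact feature set and metric are not stated in this record, so the sketch below is illustrative only: it assumes a five-dimensional feature vector [x, y, intensity, |Ix|, |Iy|] and uses the log-Euclidean distance as a common surrogate for the affine-invariant metric; the function names (`region_covariance`, `spd_log`, `log_euclidean_dist`) are hypothetical, not from the thesis.

```python
import numpy as np

def region_covariance(gray):
    """Covariance descriptor of a grayscale patch.

    Each pixel contributes an (assumed, illustrative) feature vector
    [x, y, intensity, |Ix|, |Iy|]; the patch is summarized by the
    5x5 covariance matrix of these vectors.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    Iy, Ix = np.gradient(gray.astype(float))  # first-order gradients
    feats = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                      gray.ravel().astype(float),
                      np.abs(Ix).ravel(), np.abs(Iy).ravel()])
    return np.cov(feats)  # 5x5 symmetric positive semi-definite

def spd_log(C, eps=1e-6):
    # Matrix logarithm of a symmetric PSD matrix via eigendecomposition;
    # a small ridge keeps the eigenvalues strictly positive.
    w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(C1, C2):
    # Log-Euclidean distance between two covariance descriptors:
    # Frobenius norm of the difference of their matrix logarithms.
    return np.linalg.norm(spd_log(C1) - spd_log(C2), ord="fro")

# Usage sketch: compare two random patches.
rng = np.random.default_rng(0)
C1 = region_covariance(rng.random((16, 16)) * 255)
C2 = region_covariance(rng.random((16, 16)) * 255)
d = log_euclidean_dist(C1, C2)
```

A detector in this style slides a window over the image, computes such a descriptor per window (or per cell of a modular grid, as the abstract's "array of covariance matrices" suggests), and classifies it using a manifold-aware distance like the one above.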
Year: 2012
Language: English
Keywords: Embedded; FPGA; object detection; covariance matrices; Riemannian Manifold
Pages: 203
Files in this record:
Samuele_Martelli_PhD_Thesis_low.pdf (5.71 MB, Adobe PDF; access only from BNCF and BNCR)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/115566
The NBN code of this thesis is URN:NBN:IT:UNIVR-115566