Building vision applications through deep neural networks using data acquired by a robot platform

Youssef, Ali

Deep learning has proven to be a successful approach in many computer vision applications. The significant achievements of deep learning systems have always been coupled with the availability of powerful computation and large quantities of data. Moreover, bringing deep learning to robotics applications raises additional challenges that have not been widely addressed by the computer vision and machine learning communities. On the other hand, rule-based representation approaches have provided robust and effective solutions for different computer vision tasks in past years. Developing and designing solutions based on predefined models requires a strong understanding and knowledge of the problem structure. However, the difficulty of generalizing those solutions to complex real-world scenarios is one of their main drawbacks. This thesis tries to combine the knowledge obtained by rule-based representation approaches with the generic solution provided by deep learning based end-to-end techniques. This work aims to give a general review of both techniques, and proposes a pipeline, modular in its design, for image classification, object detection and recognition, and semantic image segmentation tasks, to be used on data coming from mobile robotic platforms. The main contributions of the thesis are:
• A combined approach based on rule-based representation and end-to-end techniques;
• A pipeline for object detection and recognition with mobile platforms;
• A multi-sensor approach for people detection with a social robot;
• Pixel-wise semantic image segmentation in challenging applications;
• An experimental evaluation of the proposed solutions.
In this thesis, the end-to-end architecture for computer vision is described, and different solutions for improving its performance and addressing its limitations in image classification, object detection, and semantic image segmentation are proposed. Quantitative and qualitative evaluations of the proposed solutions, based on experimental results in real-world applications, are shown, and future directions are discussed.

2019


Faculty/Department
	DIPARTIMENTO DI INGEGNERIA INFORMATICA, AUTOMATICA E GESTIONALE -ANTONIO RUBERTI-

Degree programme
	Ingegneria informatica (Computer Engineering)

Publication date
	9 Sep 2019

Language
	English

Keywords
	Robot vision; computer vision; deep learning

Supervisor, Advisor, or Tutor
	NARDI, Daniele
	BLOISI, Domenico Daniele

Publisher
	Università degli Studi di Roma "La Sapienza"

Collection
	Università degli Studi di Roma La Sapienza

Files in this item:

File	Size	Format
Tesi_dottorato_Youssef.pdf (open access; license: all rights reserved)	28.74 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/180823

The NBN code of this thesis is URN:NBN:IT:UNIROMA1-180823