Building vision applications through deep neural networks using data acquired by a robot platform

YOUSSEF, ALI
2019

Abstract

Deep learning has proven to be a successful approach in multiple computer vision applications. The significant achievements of deep learning systems have always depended on the availability of powerful computation and large quantities of data. Moreover, bringing deep learning to robotics applications raises additional challenges that have not been widely addressed by the computer vision and machine learning communities. On the other hand, rule-based representation approaches have provided robust and effective solutions for different computer vision tasks in past years. Developing and designing solutions based on predefined models requires a strong understanding of the problem structure, and generalizing those solutions to handle complex real-world scenarios is one of their main drawbacks. This thesis aims to combine the knowledge obtained by rule-based representation approaches with the generic solution provided by deep learning based end-to-end techniques. It gives a general review of both techniques and proposes a pipeline, modular in its design, for image classification, object detection and recognition, and semantic image segmentation, to be used on data coming from mobile robotic platforms. The main contributions of the thesis are:
• A combined approach based on rule-based representation and end-to-end techniques;
• A pipeline for object detection and recognition with mobile platforms;
• A multi-sensor approach for people detection with a social robot;
• An approach to pixel-wise semantic image segmentation in challenging applications;
• An experimental evaluation of the proposed solutions.
The thesis describes the end-to-end architecture for computer vision and proposes different solutions for improving its performance and addressing its limitations in image classification, object detection, and semantic image segmentation. Quantitative and qualitative evaluations of the proposed solutions, based on experimental results in real-world applications, are presented, and future directions are discussed.
9 Sep 2019
English
Robot vision; computer vision; deep learning
NARDI, DANIELE
BLOISI, Domenico Daniele
Università degli Studi di Roma "La Sapienza"
Files in this item:
File: Tesi_dottorato_Youssef.pdf (open access)
Size: 28.74 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/180823
The NBN code of this thesis is URN:NBN:IT:UNIROMA1-180823