Building vision applications through deep neural networks using data acquired by a robot platform
YOUSSEF, ALI
2019
Abstract
Deep learning has proven to be a successful approach in many computer vision applications. The significant achievements of deep learning systems have always depended on powerful computation and large quantities of data. Moreover, bringing deep learning to robotics applications raises additional challenges that have not been widely addressed by the computer vision and machine learning communities. On the other hand, rule-based representation approaches have provided robust and effective solutions for various computer vision tasks in past years. Designing and developing solutions based on predefined models requires a strong understanding of the problem structure, and generalizing those solutions to handle complex real-world scenarios is one of their main drawbacks. This thesis aims to combine the knowledge obtained from rule-based representation approaches with the generic solutions provided by deep learning based end-to-end techniques. It gives a general review of both techniques and proposes a pipeline, modular in its design, for image classification, object detection and recognition, and semantic image segmentation on data coming from mobile robotic platforms. The main contributions of the thesis are:
• A combined approach based on rule-based representation and end-to-end techniques;
• A pipeline for object detection and recognition with mobile platforms;
• A multi-sensor approach for people detection with a social robot;
• A treatment of pixel-wise semantic image segmentation in challenging applications;
• An experimental evaluation of the proposed solutions.
The thesis describes the end-to-end architecture for computer vision and proposes different solutions for improving its performance and addressing its limitations in image classification, object detection, and semantic image segmentation. Quantitative and qualitative evaluations of the proposed solutions, based on experimental results in real-world applications, are presented, and future directions are discussed.

File | Size | Format |
---|---|---|
Tesi_dottorato_Youssef.pdf (open access) | 28.74 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/180823
URN:NBN:IT:UNIROMA1-180823