
Deep learning-based object detection for autonomous driving

2021

Abstract

Object detection consists of identifying all objects of interest (e.g., other vehicles) inside the field of view of one or more sensors and describing their state. This state typically encompasses the category of the detected object as well as a description of its position, such as a 2D bounding box on the image plane or a 3D bounding box in the world. Object detection is therefore a crucial task for an autonomous vehicle, and one that must be performed as accurately as possible, since it greatly simplifies the subsequent tasks of tracking, planning and control. Deep learning-based pipelines currently show impressive performance on this class of tasks, markedly outperforming more traditional computer vision approaches while remaining time-efficient thanks to their high parallelism and the availability of powerful GPU hardware. For these reasons, this thesis studies the problem of object detection in autonomous driving scenarios and tackles it using deep learning techniques and tools. More specifically, three different systems are proposed, each dealing with a different detection task and different sensor inputs. The first system tackles the problem of parking slot detection from surround-view RGB images; it is framed as a variant of the widely adopted image-based 2D detector Faster R-CNN, whose logic is modified to allow the estimation of generic quadrilaterals. To optimize the proposed model, a small dataset depicting heterogeneous driving scenes and parking spaces was collected and annotated. The second work deals with the under-constrained problem of 3D object detection from a single pinhole image. Compared to most state-of-the-art approaches, the proposed model is very simple, consisting of a small multi-layer perceptron (MLP) that estimates the 3D boxes from the features and the 2D detections returned by a Faster R-CNN model.
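As a rough illustration of this design, the forward pass of such a head could be sketched as follows. All sizes (a 256-d RoI feature, 128 hidden units, 7 output box parameters) and the two-layer shape are illustrative assumptions, not the thesis architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_3d_head(roi_features, box_2d, weights):
    """Map one detection's RoI features plus its 2D box (x1, y1, x2, y2)
    to 3D box parameters (x, y, z, w, h, l, yaw).

    Illustrative sketch only: sizes and layer count are assumptions.
    """
    x = np.concatenate([roi_features, box_2d])  # fuse appearance + geometry cues
    w1, b1, w2, b2 = weights
    h = np.maximum(x @ w1 + b1, 0.0)            # hidden layer with ReLU
    return h @ w2 + b2                          # 7 regressed 3D box parameters

# Hypothetical sizes: 256-d RoI feature + 4-d 2D box in, 128 hidden, 7 out.
d_in, d_hid, d_out = 256 + 4, 128, 7
weights = (rng.standard_normal((d_in, d_hid)) * 0.01, np.zeros(d_hid),
           rng.standard_normal((d_hid, d_out)) * 0.01, np.zeros(d_out))

pred = mlp_3d_head(rng.standard_normal(256),
                   np.array([100.0, 80.0, 180.0, 160.0]), weights)
```

The appeal of this design is that the heavy lifting (feature extraction and 2D localization) is delegated to an off-the-shelf 2D detector, leaving only a lightweight regression problem for the 3D stage.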
To train the MLP, a novel cost function is proposed: an extension of the Generalized Intersection-over-Union (GIoU) loss to the 3D case. Experiments on the public KITTI dataset show the effectiveness of the method and of the cost function, leading to results that surpass contemporary monocular 3D detectors. Further experiments show that the proposed cost function yields significant improvements when applied to the completely different detector Frustum PointNets, which suggests its applicability to other existing pipelines and models. The third work tackles the problem of 3D detection from LiDAR point clouds. The proposed approach builds upon VoteNet, a point-based 3D detection model originally introduced to perform detection in controlled scenarios captured by RGB-D sensors. First, modifications to the original point sampling strategy and classifier are introduced, allowing for better performance on the noisier autonomous driving scenes. Then, the attention mechanism is studied as a means of explicitly modelling inter-point relationships, in order to strengthen the feature representations extracted by the model and, ultimately, improve performance. Experiments on the KITTI dataset show promising results, with performance comparable to the state of the art, validating the adoption of attention as a way of obtaining more discriminative feature representations.
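The idea behind a 3D GIoU loss can be illustrated, under a simplifying assumption, for axis-aligned boxes: replace areas with volumes and penalize the empty space of the smallest enclosing box. This is a sketch of the general technique, not the thesis implementation, which also has to handle box orientation:

```python
import numpy as np

def giou_3d(box_a, box_b):
    """Generalized IoU for axis-aligned 3D boxes.

    Boxes are (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns GIoU in [-1, 1]; the training loss would be 1 - GIoU.
    """
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)

    # Intersection volume (zero if the boxes are disjoint).
    lo = np.maximum(a[:3], b[:3])
    hi = np.minimum(a[3:], b[3:])
    inter = np.prod(np.clip(hi - lo, 0.0, None))

    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    union = vol_a + vol_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enc = np.prod(np.maximum(a[3:], b[3:]) - np.minimum(a[:3], b[:3]))

    # GIoU subtracts the fraction of the enclosing box not covered by the union,
    # so disjoint boxes still receive a useful (negative) gradient signal.
    return iou - (enc - union) / enc
```

Unlike plain IoU, which is identically zero for non-overlapping boxes, this formulation keeps the loss informative even when a predicted box does not intersect the ground truth, which is what makes it attractive as a training objective.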
English
Deep Learning
Computer Vision
Object Detection
3D
Autonomous Driving
Bertozzi, Massimo
Università degli Studi di Parma
Files in this item:

relazione_finale_Zinelli.pdf
Access only from BNCF and BNCR
Type: Other attached material
Size: 5.5 kB
Format: Adobe PDF

PhDThesis_Zinelli.pdf
Access only from BNCF and BNCR
Type: Other attached material
Size: 45.93 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/151654
The NBN identifier of this thesis is URN:NBN:IT:UNIPR-151654