The role of object instance re-identification in 3D object localization and semantic 3D reconstruction.
BANSAL, VAIBHAV
2020
Abstract
For an autonomous system to fully understand a scene, it needs a 3D reconstruction of the world that contains both geometric information, such as camera poses, and semantic information, such as the labels associated with objects (tree, chair, dog, etc.), mapped within the reconstruction. In this thesis, we study the problem of object-centric 3D reconstruction of a scene, in contrast with most previous work in the literature, which focuses on building a 3D point cloud that captures only structure and lacks semantic information. We study how crucial 3D object localization is for this problem and discuss the limitations of previous related methods. We present an approach for 3D object localization that uses only 2D detections observed in multiple views together with 3D object shape priors. Since this first approach relies on associating 2D detections across views, we also study the problem of re-identifying multiple instances of an object in rigid scenes and propose a novel method that jointly learns the foreground and background of an object instance with a triplet-based network in order to identify multiple instances of the same object across views. We also propose an Augmented Reality application built on Google's Tango that integrates both proposed approaches. Finally, we conclude with some open problems that might benefit from the suggested future work.
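To make the re-identification idea concrete, the following is a minimal sketch, not the thesis implementation, of a triplet-based network that embeds an object crop by jointly encoding its foreground and background and then applies a standard triplet margin loss so that the same instance seen in different views maps close together. The backbone, embedding size, masking scheme, and all names here are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the thesis code): joint foreground/background
# embedding of an object crop, trained with a triplet margin loss for
# multi-view instance re-identification.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FgBgEmbedding(nn.Module):
    """Encodes the foreground (masked object) and background (masked-out
    surroundings) of a crop with shared weights, then concatenates the two
    descriptors into one joint embedding."""
    def __init__(self, dim=128):
        super().__init__()
        # Tiny CNN as a stand-in for a real backbone (e.g. a ResNet).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, crop, mask):
        # mask is 1 inside the object's 2D detection, 0 outside.
        fg = self.encoder(crop * mask)        # object appearance
        bg = self.encoder(crop * (1 - mask))  # surrounding (rigid-scene) context
        return F.normalize(torch.cat([fg, bg], dim=1), dim=1)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Same instance in another view (positive) should be closer to the anchor
    # than a different instance (negative), by at least the margin.
    d_pos = (anchor - positive).pow(2).sum(1)
    d_neg = (anchor - negative).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()

if __name__ == "__main__":
    net = FgBgEmbedding()
    crops = torch.randn(3, 4, 3, 64, 64)        # anchor / positive / negative batches
    masks = torch.rand(3, 4, 1, 64, 64).round()
    a, p, n = (net(c, m) for c, m in zip(crops, masks))
    print(triplet_loss(a, p, n).item())
```

Including the background descriptor exploits the rigidity of the scene: in a rigid environment the surroundings of a given object instance stay consistent across views, which helps disambiguate visually identical instances (e.g. two identical chairs) that foreground appearance alone cannot separate.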
File | Size | Format | Access
---|---|---|---
phdunige_4317173_1.pdf | 8.16 MB | Adobe PDF | open access
phdunige_4317173_2.pdf | 19.28 MB | Adobe PDF | open access
phdunige_4317173_3.pdf | 4.73 MB | Adobe PDF | open access
phdunige_4317173_4.pdf | 150.94 kB | Adobe PDF | open access
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/63497
URN:NBN:IT:UNIGE-63497