Image localization and parsing using 3D structure
GARRO, Valeria
2013
Abstract
This thesis studies two fundamental problems in computer vision: localization from images and semantic image segmentation. The first contribution is a complete system that localizes a hand-held camera device accurately and quickly, leveraging not only a dataset of registered images but also the three-dimensional information obtained from a Structure from Motion (SfM) reconstruction. We exploit the 3D structure in two ways. First, it is directly involved in camera registration, making robust 2D-3D correspondences available instead of 2D-2D pairs of matched features. Second, we use the image clustering computed by the Structure from Motion algorithm during the retrieval step of the localization system, improving both the robustness and the efficiency of that stage. The second part of the thesis is an in-depth analysis of one of the main components of the localization system: camera pose estimation from 2D-3D correspondences. In particular, we present a novel formulation of the Perspective-n-Point problem, also known as exterior orientation, as an instance of the anisotropic orthogonal Procrustes problem. The last contribution is a new approach to semantic image segmentation in urban environments that relies heavily on the SfM 3D structure to transfer labels from a pre-labeled image to a query image. The query image can be either an image belonging to the SfM dataset that has no semantic information or an external image that has just been localized by the aforementioned localization system. The label assignment problem is modeled as a Markov random field whose nodes are the superpixels of the query image.

File | Size | Format | |
---|---|---|---|
GarroPhdThesisSmallLastVersion.pdf (access only from BNCF and BNCR) | 4.2 MB | Adobe PDF | |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/115372
URN:NBN:IT:UNIVR-115372