Abstract The automatic analysis of the content of a video sequence has captured the attention of the computer vision community for a very long time. Indeed, video understanding, which needs to incorporate both semantic and dynamic cues, may be trivial for humans, but it turned out to be a very complex task for a machine. Over the years the signal processing, computer vision, and machine learning communities contributed with algorithms that are today effective building blocks of more and more complex systems. In the meanwhile, theoretical analysis has gained a better understanding of this multifaceted type of data. Indeed, video sequences are not only high dimensional data, but they are also very peculiar, as they include spatial as well as temporal information which should be treated differently, but are both important to the overall process. The work of this thesis builds a new bridge between signal processing theory, and computer vision applications. It considers a novel approach to multi resolution signal processing, the so-called Shearlet Transform, as a reference framework for representing meaningful space-time local information in a video signal. The Shearlet Transform has been shown effective in analyzing multi-dimensional signals, ranging from images to x-ray tomographic data. As a tool for signal denoising, has also been applied to video data. However, to the best of our knowledge, the Shearlet Transform has never been employed to design video analysis algorithms. In this thesis, our broad objective is to explore the capabilities of the Shearlet Transform to extract information from 2D+T-dimensional data. We exploit the properties of the Shearlet decomposition to redesign a variety of classical video processing techniques (including space-time interest point detection and normal flow estimation) and to develop novel methods to better understand the local behavior of video sequences. We provide experimental evidence on the potential of our approach on synthetic as well as real data drawn from publicly available benchmark datasets. The results we obtain show the potential of our approach and encourages further investigations in the near future.
Spatio-Temporal Video Analysis and the 3D Shearlet Transform
MALAFRONTE, DAMIANO
2018
Abstract
Abstract The automatic analysis of the content of a video sequence has captured the attention of the computer vision community for a very long time. Indeed, video understanding, which needs to incorporate both semantic and dynamic cues, may be trivial for humans, but it turned out to be a very complex task for a machine. Over the years the signal processing, computer vision, and machine learning communities contributed with algorithms that are today effective building blocks of more and more complex systems. In the meanwhile, theoretical analysis has gained a better understanding of this multifaceted type of data. Indeed, video sequences are not only high dimensional data, but they are also very peculiar, as they include spatial as well as temporal information which should be treated differently, but are both important to the overall process. The work of this thesis builds a new bridge between signal processing theory, and computer vision applications. It considers a novel approach to multi resolution signal processing, the so-called Shearlet Transform, as a reference framework for representing meaningful space-time local information in a video signal. The Shearlet Transform has been shown effective in analyzing multi-dimensional signals, ranging from images to x-ray tomographic data. As a tool for signal denoising, has also been applied to video data. However, to the best of our knowledge, the Shearlet Transform has never been employed to design video analysis algorithms. In this thesis, our broad objective is to explore the capabilities of the Shearlet Transform to extract information from 2D+T-dimensional data. We exploit the properties of the Shearlet decomposition to redesign a variety of classical video processing techniques (including space-time interest point detection and normal flow estimation) and to develop novel methods to better understand the local behavior of video sequences. We provide experimental evidence on the potential of our approach on synthetic as well as real data drawn from publicly available benchmark datasets. The results we obtain show the potential of our approach and encourages further investigations in the near future.File | Dimensione | Formato | |
---|---|---|---|
phdunige_3268245.pdf
accesso aperto
Dimensione
8.01 MB
Formato
Adobe PDF
|
8.01 MB | Adobe PDF | Visualizza/Apri |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14242/108178
URN:NBN:IT:UNIGE-108178