Spatio-Temporal Video Analysis and the 3D Shearlet Transform

MALAFRONTE, DAMIANO
2018

Abstract

The automatic analysis of the content of a video sequence has long held the attention of the computer vision community. Video understanding, which must incorporate both semantic and dynamic cues, may be trivial for humans, but it has turned out to be a very complex task for a machine. Over the years, the signal processing, computer vision, and machine learning communities have contributed algorithms that today serve as effective building blocks of increasingly complex systems. Meanwhile, theoretical analysis has yielded a better understanding of this multifaceted type of data. Video sequences are not only high-dimensional data but also peculiar ones: they carry both spatial and temporal information, which should be treated differently yet are both important to the overall process. This thesis builds a new bridge between signal processing theory and computer vision applications. It considers a novel approach to multi-resolution signal processing, the so-called Shearlet Transform, as a reference framework for representing meaningful local space-time information in a video signal. The Shearlet Transform has been shown to be effective in analyzing multi-dimensional signals, ranging from images to X-ray tomographic data. It has also been applied to video data as a tool for signal denoising. However, to the best of our knowledge, the Shearlet Transform has never been employed to design video analysis algorithms. In this thesis, our broad objective is to explore the capabilities of the Shearlet Transform to extract information from 2D+T-dimensional data. We exploit the properties of the Shearlet decomposition to redesign a variety of classical video processing techniques (including space-time interest point detection and normal flow estimation) and to develop novel methods to better understand the local behavior of video sequences.
We provide experimental evidence of the potential of our approach on synthetic as well as real data drawn from publicly available benchmark datasets. The results we obtain confirm this potential and encourage further investigation in the near future.
22 May 2018
English
ODONE, FRANCESCA
DE VITO, ERNESTO
DELZANNO, GIORGIO
Università degli studi di Genova
Files in this item:
File: phdunige_3268245.pdf (open access)
Size: 8.01 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/165302
The NBN code of this thesis is URN:NBN:IT:UNIGE-165302