
Learning in Low Data Regimes for Image and Video Understanding

Puscas, Mihai-Marian
2019

Abstract

Deep Neural Networks, with their increased representational power, have enabled great progress in core areas of computer vision and in their applications to our day-to-day life. Unfortunately, the performance of these systems rests on the "big data" assumption: that large quantities of annotated data are freely and legally available for use. This assumption may fail for a variety of reasons — legal restrictions, difficulty in gathering samples, and the expense of annotation — hindering the broad applicability of deep learning methods. This thesis studies and provides solutions for three types of data scarcity: (i) the annotation task is prohibitively expensive, (ii) the gathered data follows a long-tail distribution, and (iii) data storage is restricted. For the first case, specifically for video understanding tasks, we developed a class-agnostic, unsupervised spatio-temporal proposal system learned in a transductive manner, and a more precise pixel-level unsupervised graph-based video segmentation method. In parallel, we developed a cycled, generative, unsupervised depth estimation system that can be used in downstream image understanding tasks, avoiding the use of expensive depth map annotations. Further, for cases where the gathered data is scarce, we developed two few-shot image classification systems: one that uses category-specific 3D models to generate novel samples, and one that increases the diversity of novel samples by exploiting textual data. Finally, data collection and annotation can be legally restricted, significantly impairing lifelong learning systems. To overcome catastrophic forgetting exacerbated by data storage limitations, we developed a deep generative memory network that operates in a strictly class-incremental setup.
Language: English
Supervisor: Sebe, Niculae
Università degli studi di Trento, Trento
Pages: 132
Files in this record:
2037_190422152740_001_(1).pdf (966.39 kB, Adobe PDF) — access restricted to BNCF and BNCR
Thesis_22.pdf (20.21 MB, Adobe PDF) — open access

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/107913
The NBN code of this thesis is URN:NBN:IT:UNITN-107913