Learning in Low Data Regimes for Image and Video Understanding
Puscas, Mihai-Marian
2019
Abstract
The increased representational power of Deep Neural Networks has enabled great progress in core areas of computer vision and in their applications to our day-to-day lives. Unfortunately, the performance of these systems rests on the "big data" assumption: that large quantities of annotated data are freely and legally available for use. This assumption may not hold due to a variety of factors, such as legal restrictions, difficulty in gathering samples, and the expense of annotation, which hinder the broad applicability of deep learning methods. This thesis studies and provides solutions for different types of data scarcity: (i) the annotation task is prohibitively expensive, (ii) the gathered data follows a long-tail distribution, (iii) data storage is restricted. For the first case, specifically for video understanding tasks, we have developed a class-agnostic, unsupervised spatio-temporal proposal system learned in a transductive manner, and a more precise, pixel-level, unsupervised graph-based video segmentation method. In addition, we have developed a cycled, generative, unsupervised depth estimation system that can be used in downstream image understanding tasks, avoiding the need for expensive depth-map annotations. Further, for cases where the gathered data is scarce, we have developed two few-shot image classification systems: a method that uses category-specific 3D models to generate novel samples, and one that increases novel-sample diversity by exploiting textual data. Finally, data collection and annotation can be legally restricted, significantly impacting the operation of lifelong learning systems. To overcome catastrophic forgetting, exacerbated by data storage limitations, we have developed a deep generative memory network that functions in a strictly class-incremental setup.
| File | Access | Size | Format |
|---|---|---|---|
| 2037_190422152740_001_(1).pdf | Access only from BNCF and BNCR | 966.39 kB | Adobe PDF |
| Thesis_22.pdf | Open access | 20.21 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/107913
URN:NBN:IT:UNITN-107913