Hybrid human-machine vision systems for automated object segmentation and categorization
PALAZZO, SIMONE
2017
Abstract
Emulating human perception is a foundational component of research towards artificial intelligence (AI). Computer vision, in particular, is now one of the most active and fastest-growing research topics in AI, and its practical applications range from video surveillance to robotics to ecological monitoring. However, in spite of all the recent progress, humans still greatly outperform machines in most visual tasks, and even competitive artificial models require thousands of examples to learn concepts that children learn easily. Hence, given the objective difficulty of emulating the human visual system, the question investigated in this thesis is how humans can support the advancement of computer vision techniques. More precisely, we investigated how the synergy between human visual expertise and automated methods can be shifted from a top-down paradigm, where direct user action or human perception principles explicitly guide the software component, to a bottom-up paradigm, where, instead of trying to copy the way our mind works, we exploit the by-product of its workings (i.e., some kind of measured feedback) to extract information on how visual tasks are performed. Starting from a purely top-down approach, where a fully automated video object segmentation algorithm is extended to encode principles of human perceptual organization, we moved to interactive methods, where the same task is performed with humans in the loop by means of gamification and eye-gaze analysis strategies, in an increasingly bottom-up fashion. Lastly, we pushed this trend to the limit by investigating brain-driven image classification approaches, where brain signals were used to extract compact representations of image contents. Performance evaluation of the tested approaches shows that involving people in automated vision methods can enhance their accuracy.
Our experiments, carried out at different levels of awareness and control of the generated human feedback, show that top-down approaches may achieve better accuracy than bottom-up ones, at the cost of higher user interaction time and effort. As for our most ambitious objective, the purely bottom-up image classification system based on brain pattern analysis, we were able to outperform the current state of the art with a method trained to extract brain-inspired visual content descriptors, thus removing the need for EEG recording on unseen images.

File | Size | Format |
---|---|---|
thesis.pdf (open access) | 14.21 MB | Adobe PDF |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/76544
URN:NBN:IT:UNICT-76544