
A STOCHASTIC FORAGING MODEL OF ATTENTIVE EYE GUIDANCE ON DYNAMIC STIMULI

D'AMELIO, ALESSANDRO
2021

Abstract

Understanding human behavioural signals is a key ingredient of effective human-human and human-computer interaction (HCI). In this respect, non-verbal communication plays a central role and comprises a variety of modalities acting jointly to convey a common message. Cues such as gesture, facial expression, and prosody carry as much weight as spoken words. Gaze behaviour is no exception, being one of the most common, yet unobtrusive, ways of communicating. Accordingly, many computational models of visual attention allocation have been proposed; although such models were originally conceived in psychology, over the last couple of decades the problem of predicting attention allocation on a visual stimulus has attracted growing interest from the computer vision and pattern recognition community, driven by the fast-growing number of possible applications (e.g. autonomous driving, image/video compression, robotics). In this renaissance of attention modelling, some of the key features characterizing eye movements have been overlooked at best; in particular, the explicit unfolding of eye movements in time (i.e. their dynamics) has seldom been taken into account. Moreover, the vast majority of the proposed models can only deal with static stimuli (images), with few notable exceptions. The main contribution of this work is a novel computational model of attentive eye guidance that derives gaze dynamics in a principled way by reformulating attention deployment as a stochastic foraging problem. We show that treating a virtual observer attending to a video as a stochastic composite forager, searching for valuable patches in a multimodal landscape, yields simulated gaze trajectories that are statistically indistinguishable from those produced by humans while free-viewing the same scene.
Model simulation and experiments are carried out on a publicly available dataset of eye-tracking recordings from subjects viewing videos of conversations and social interactions between humans.
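The "stochastic composite forager" idea can be illustrated with a toy two-mode random walk: short, Brownian-like steps while a patch is being exploited, interleaved with occasional long relocations to a new patch. The sketch below is a minimal illustration of this general scheme only; the function name, parameters, and step distributions are hypothetical choices made here for clarity, not the model developed in the thesis.

```python
import math
import random

def simulate_gaze(n_steps, width=640, height=480, relocate_p=0.1, seed=0):
    """Toy composite random walk over an image plane.

    With probability `relocate_p` the walker makes a long, saccade-like
    jump to a new patch; otherwise it takes a small local step, mimicking
    within-patch exploitation. Illustrative only -- all distributions and
    parameter values are assumptions, not the thesis model.
    """
    rng = random.Random(seed)
    x, y = width / 2.0, height / 2.0
    trajectory = [(x, y)]
    for _ in range(n_steps):
        if rng.random() < relocate_p:
            step = rng.uniform(80, 250)      # long relocation ("saccade")
        else:
            step = abs(rng.gauss(0, 8))      # local within-patch step
        angle = rng.uniform(0, 2 * math.pi)  # isotropic direction
        # Clamp to the stimulus frame so the gaze point stays on screen.
        x = min(max(x + step * math.cos(angle), 0.0), width)
        y = min(max(y + step * math.sin(angle), 0.0), height)
        trajectory.append((x, y))
    return trajectory
```

In a full model the relocation probability and step lengths would be driven by the value of the multimodal landscape (e.g. audio-visual saliency) rather than fixed constants, which is what distinguishes principled foraging models from a plain random walk.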
22 March 2021
English
Audio-visual attention; gaze models; social interaction; multimodal perception
GROSSI, GIULIANO
BOCCIGNONE, GIUSEPPE
BOLDI, PAOLO
Università degli Studi di Milano
File in this record:
phd_unimi_R11866.pdf (open access, 35.71 MB, Adobe PDF)

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/73564
The NBN code of this thesis is URN:NBN:IT:UNIMI-73564