Statistical Analysis of Interactional Patterns: a Social Signal Processing Perspective
PESARIN, Anna
2014
Abstract
In the last few years, there has been growing interest in the analysis of human-human communication, with the aim of devising artificial systems able to recognize the signals that speakers send through the body (consciously or unconsciously) to other speakers and that express emotions or personality traits; these are called social signals. Social signals can be defined as temporal co-occurrences of social cues, which in turn can be defined as sets of temporally sequenced changes in neuromuscular, neurocognitive and neurophysiological activity. Social cues are organized into five categories: physical appearance, gesture and posture, face and eyes behavior, vocal behavior, and space and environment. The analysis of social cues in the vocal behavior category is among the problems most closely related to pattern recognition and machine learning. In general, this analysis consists in evaluating all the spoken cues that surround the verbal message and influence its actual meaning, characterizing, for example, particular social roles (e.g., dominance). In this thesis, we present an automatic system based on a generative structure able to analyze conversational scenarios. The generative model integrates a Gaussian mixture model with the Observed Influence Model (OIM), and it is fed with a novel kind of simple, low-level auditory social signal, termed steady conversational periods (SCPs). These are built on the duration of continuous slots of silence or speech, also taking conversational turn-taking into account. The interactional dynamics built upon the transitions among SCPs provide a behavioral blueprint of conversational settings without relying on segmental or continuous phonetic features.
Our contribution is to show the effectiveness of our model when applied to dialog classification and clustering tasks, considering dialog scenarios characterized by several social situations, i.e., the age of the speakers, the conversational mood, and the presence or absence of a language disorder in the speakers (Asperger syndrome), with excellent performance also in comparison with state-of-the-art frameworks.
File: PhDTHESIS_AP.pdf (access only from BNCF and BNCR), 7.94 MB, Adobe PDF
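As a rough illustration of the idea behind the steady conversational periods mentioned in the abstract, the sketch below segments a two-speaker dialog into stretches during which neither speaker switches between silence and speech. This is only one possible reading of the abstract's description, not the thesis's actual definition or implementation: the function name `steady_periods`, the per-frame 0/1 voice activity input, and the `frame_s` frame duration are all assumptions made for this example.

```python
from itertools import groupby


def steady_periods(activity_a, activity_b, frame_s=0.01):
    """Split a two-speaker dialog into periods where neither speaker changes state.

    activity_a, activity_b: equal-length sequences of 0 (silence) / 1 (speech),
    one value per frame (hypothetical input format).
    frame_s: assumed duration of one frame, in seconds.
    Returns a list of ((state_a, state_b), duration_in_seconds) tuples.
    """
    if len(activity_a) != len(activity_b):
        raise ValueError("both activity sequences must have the same length")
    # Pair the two speakers frame by frame, then group consecutive identical pairs.
    joint = zip(activity_a, activity_b)
    return [(state, sum(1 for _ in run) * frame_s) for state, run in groupby(joint)]


# Toy example: speaker A talks, B overlaps briefly and takes the turn, then A resumes.
speaker_a = [1, 1, 1, 0, 0, 0, 1, 1]
speaker_b = [0, 0, 1, 1, 1, 0, 0, 0]
periods = steady_periods(speaker_a, speaker_b, frame_s=0.5)
print(periods)
# [((1, 0), 1.0), ((1, 1), 0.5), ((0, 1), 1.0), ((0, 0), 0.5), ((1, 0), 1.0)]
```

The resulting durations depend only on when each speaker is silent or speaking, which matches the abstract's point that no segmental or continuous phonetic features are required. A generative model such as the one described would then be fitted on these durations and the transitions between them.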
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/114950
URN:NBN:IT:UNIVR-114950