Constrained Affective Computing

2021

Abstract

Emotions play an important role in daily life, influencing decision-making, human interaction, perception, attention, and self-regulation. They have been studied since ancient times: philosophers have always been interested in analyzing human nature and bodily sensations, and psychologists in studying the physical and psychological changes that influence thought and behavior. In the early 1970s, the psychologist Paul Ekman defined six universal emotions, namely anger, disgust, fear, happiness, sadness, and surprise; this categorization has been adopted by many later studies. In the late 1990s, Affective Computing was born: a new discipline spanning computer science, psychology, and cognitive science, which aims at developing intelligent systems able to recognize, interpret, process, and simulate human emotions. It has a wide range of applications, such as healthcare, education, games, entertainment, marketing, automated driver assistance, robotics, and many others. Emotions can be detected from different channels, such as facial expressions, body gestures, speech, text, and physiological signals. To enrich human-machine interaction, a machine should be able to perform tasks similar to humans, such as recognizing facial expressions and detecting emotions from what is said (text) and from how it is said (audio), and it should also be able to express its own emotions.

With the great success of deep learning, deep architectures have also been employed for many Affective Computing tasks. In this thesis, with an emotional and intelligent agent in mind, a detailed study of emotions has been carried out using deep learning techniques for various tasks, such as facial expression recognition, text and speech emotion recognition, and facial expression generation. Nevertheless, deep learning methods generally require great computing power and large collections of labeled data to perform properly. To overcome these limitations, we exploit the framework of Learning from Constraints, which requires only a small amount of supervised data and makes it possible to exploit large quantities of unsupervised data, which are easier to collect. Furthermore, this approach integrates low-level tasks that process sensorial data with reasoning over higher-level semantic knowledge, allowing machines to behave intelligently in real, complex environments. These conditions are achieved by requiring the satisfaction of a set of constraints during the learning process, so that a task is translated into a constraint satisfaction problem. In our case, since the available knowledge may not always be perfect, the constraints are softly injected into the learning problem, allowing slight violations on some inputs.
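To make the idea of soft constraint injection concrete, here is a generic sketch of the kind of objective such a framework induces (the notation is ours, not necessarily the exact formulation adopted in the thesis): the supervised loss on the few labeled examples is augmented with weighted penalties measuring how much each constraint is violated on the many, possibly unlabeled, inputs,

    L(f) = \sum_{(x,y) \in \mathcal{S}} \ell(f(x), y) + \sum_{k} \lambda_k \sum_{x \in \mathcal{U}} \phi_k(f(x)),

where \mathcal{S} is the small labeled set, \mathcal{U} the larger unlabeled set, \phi_k \geq 0 vanishes when the k-th constraint is satisfied, and the weight \lambda_k keeps the constraint soft, tolerating slight violations where the encoded knowledge is imperfect. FOL rules can be translated into penalties of this form by relaxing the logical connectives with t-norms.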
In this work, different constraints have been employed in order to exploit the knowledge we have about each problem. In facial expression recognition, a predictor that detects emotions from the full face is reinforced by three coherence constraints: one exploits the temporal sequence of the expression, another relates different face sub-parts (eyes, nose, mouth, eyebrows, jaw), and the last relates two feature representations. In text emotion recognition, First Order Logic (FOL)-based constraints are used to exploit a large quantity of unlabeled data together with data labeled with Facebook reactions. In facial expression generation, cyclic-consistency FOL constraints are employed to translate a neutral face into a specific expression, and other logical rules are used to decide which emotion to generate by combining inputs from different channels. Finally, some logical constraints are proposed to develop a system that recognizes emotions from speech, and we built an Italian dataset that might be helpful for implementing such a model.
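As a purely illustrative sketch of the cyclic-consistency idea in the generation task, consider a CycleGAN-style pair of generators; the names below (G, Finv, cycle_consistency_loss) are ours and hypothetical, since the thesis expresses this requirement through FOL constraints rather than a hand-coded loss:

    import torch.nn.functional as F

    def cycle_consistency_loss(G, Finv, neutral, expressive, lam=10.0):
        # G maps a neutral face to the target expression;
        # Finv maps an expressive face back to neutral.
        # Forward cycle: neutral -> expression -> neutral should be the identity.
        forward = F.l1_loss(Finv(G(neutral)), neutral)
        # Backward cycle: expression -> neutral -> expression, symmetrically.
        backward = F.l1_loss(G(Finv(expressive)), expressive)
        # lam keeps the constraint soft: violations are penalized, not forbidden.
        return lam * (forward + backward)

Treated as one more penalty term in the objective sketched above, such a constraint pushes the translation to preserve the identity of the face while only the expression changes.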
English
Marco Gori
Università degli Studi di Firenze

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/132007
The NBN code of this thesis is URN:NBN:IT:UNIFI-132007