Spectator crowd: a social signal processing perspective
CONIGLIARO, Davide
2016
Abstract
This thesis proposes a new type of crowd analysis in computer vision, focused on the spectator crowd, that is, people "interested in watching something specific that they came to see". Typical spectator-crowd settings include stadiums, amphitheaters, and classrooms. These settings share some aspects with classical crowd monitoring: for instance, since many people are observed simultaneously, per-person analysis is hard. Here, however, human dynamics are more constrained by the architectural environment in which people are situated. People are expected to stay in a fixed location most of the time, and their activities are largely limited to applauding, watching, supporting or heckling the players, and chatting with their neighbors. We address this challenge by following a social signal processing approach, which grounds computer vision techniques in social theories. More specifically, leveraging social theories of expressive bodily conduct, we show how people's behaviors can be distinguished by automatically detecting their social activities. In particular, we propose a novel dataset, Spectators Hockey (S-Hock), comprising four hockey matches recorded during an international tournament. The spectators have been extensively annotated at different levels of detail. At the higher level, each person is labeled with the team they support and with whether they are acquainted with the spectators sitting close to them. At the lower levels, the annotations cover standard head and body pose information as well as fine-grained actions such as hands on hips and clapping. The game field has also been annotated, making it possible to relate what happens in the match to the crowd's behavior.
This resulted in more than 100 million annotations, useful both for standard low-level applications such as object counting, people detection, and head pose estimation, and for high-level tasks such as spectator categorization and event recognition. For all of these tasks we provide evaluation protocols and baseline results to encourage further research. Overall, this thesis aims to demonstrate that a strong sociological background is necessary for crowd analysis in general, and to underline the need to explore a novel, specific problem, the spectator crowd, which is new to computer vision, by developing approaches that adapt to the peculiarities of this scenario. We are confident that S-Hock and our studies can foster the design of novel and effective approaches for analyzing human behavior in crowded settings.
File: thesis_Conigliaro.pdf (Adobe PDF, 32.97 MB); access only from BNCF and BNCR.
Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise stated.
https://hdl.handle.net/20.500.14242/181665
URN:NBN:IT:UNIVR-181665