Using data science to uncover cognitive constraints in human behavior beyond social interactions.

OLLIVIER, Kilian Frédéric Fabien
2024

Abstract

Well-established cognitive models from anthropology have shown that, because cognitive constraints limit our "bandwidth" for social interactions, humans organize their social relationships according to a regular structure. In this thesis, we postulate that similar regularities can be found in other cognitive processes, such as those involved in language production. The thesis consists of three main parts. In the first part, we apply to language a methodology similar to the one used to uncover cognitive constraints on social relationships. More specifically, we investigate how individuals unconsciously structure their vocabulary. To this end, we analyze a dataset of tweets from a heterogeneous group of Twitter users (regular users and professional writers). We find that a concentric layered structure, which we call the ego network of words by analogy with the ego network of social relationships, captures well how individuals organize the words they use. In the second part, we carry out a semantic analysis of the model. Each ring of each ego network is described by a semantic profile, which captures the topics associated with the words in that ring. We find that the innermost ring, which contains the most frequently used words, can be seen as the semantic fingerprint of the whole model. In the third part, drawing inspiration from social ego networks, whose active part includes the relationships that individuals regularly nurture, we introduce the notion of an active ego network of words. We show that, without the concept of an active network, the ego network of words is sensitive to the amount of data considered: the layered structure disappears in larger datasets (an extended version of the Twitter/X dataset and MediaSum, a pre-existing dataset containing a large number of interview transcripts).
To address this, we define a methodology for extracting the active part of the ego network of words and for validating it. The resulting ego network structures align closely with the layered ego networks of words obtained in the earlier chapters, where only the active network was implicitly captured. This confirms the model's robustness across dataset sizes. Moreover, the validation on the MediaSum interview transcripts highlights both the generalizability of the model across diverse domains and the ingrained nature of the cognitive constraints on language use, including spoken communication.
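The abstract does not say how the rings are computed. The following Python sketch therefore only illustrates the general idea, and it rests on several assumptions.

First, words are ranked by how often an individual uses them. Second, the rings come from splitting that ranking into k frequency groups. Here the split uses an exact one-dimensional k-means partition computed by dynamic programming. This is a swapped-in technique and not necessarily the clustering used in the thesis. Third, the rings are cumulative, so each ring contains the rings inside it, as in social ego network models.

The `active_tokens` filter is entirely hypothetical. It keeps words that appear in at least a fraction `min_fraction` of time windows, as a stand-in for "regularly used" words. It does not reproduce the extraction methodology defined in the thesis. All function and parameter names are illustrative.

```python
from collections import Counter


def optimal_1d_partition(values, k):
    """Split a sorted list of numbers into k contiguous groups that minimize the
    total within-group sum of squared deviations (exact 1-D k-means via DP).
    Returns a list of (start, end) index pairs, with end exclusive."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums of values and squared values give O(1) segment costs.
    ps, ps2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(values):
        ps[i + 1] = ps[i] + v
        ps2[i + 1] = ps2[i] + v * v

    def sse(i, j):
        # Sum of squared deviations from the mean for values[i:j].
        s = ps[j] - ps[i]
        return (ps2[j] - ps2[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[g][j] = best cost of splitting values[:j] into g groups.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j], back[g][j] = c, i
    # Walk the back-pointers to recover the group boundaries.
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = back[g][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


def active_tokens(windows, min_fraction=0.5):
    """HYPOTHETICAL 'active' filter: keep only words that occur in at least
    min_fraction of the time windows (e.g. months of a user's timeline)."""
    presence = Counter()
    for window in windows:
        presence.update(set(window))  # count each word once per window
    threshold = min_fraction * len(windows)
    active = {w for w, n in presence.items() if n >= threshold}
    return [t for window in windows for t in window if t in active]


def ego_network_of_words(tokens, k=3):
    """Return k concentric rings (innermost first) from a token list, assuming
    rings are frequency groups and each ring includes all inner rings."""
    counts = Counter(tokens)
    # Rank by descending frequency; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda wc: (-wc[1], wc[0]))
    freqs = [c for _, c in ranked]
    groups = optimal_1d_partition(freqs, k)
    # Groups are contiguous in the ranking, so each ring is a prefix of it.
    return [[w for w, _ in ranked[:end]] for _, end in groups]


if __name__ == "__main__":
    windows = [
        ["the", "the", "the", "cat", "sat"],
        ["the", "the", "dog", "sat"],
        ["the", "cat", "the", "ran"],
    ]
    tokens = active_tokens(windows, min_fraction=0.5)
    print(ego_network_of_words(tokens, k=2))
    # → [['the'], ['the', 'cat', 'sat']]
```

The partition is computed exactly rather than with a heuristic because, for one-dimensional data, optimal k-means groups are contiguous in sorted order. The dynamic program runs in O(k * n^2) time, which is fine for a sketch but slow for very large vocabularies.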
Date: 26 January 2024
Language: English
Institution: Scuola Normale Superiore
Reviewers: anonymous experts
File: Tesi.pdf (Adobe PDF, 8.9 MB, open access)

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/167635
The NBN code of this thesis is URN:NBN:IT:SNS-167635