BIG-THICK DATA GENERATION VIA LIFE SEQUENCES

Li, Xiaoyue
2025

Abstract

Large-scale personal big data is collected at high speed by smart devices, mainly in the form of sensor data. While this kind of data supports analysis of and inference about the environment in which it was collected (e.g., locations) and certain aspects of human behavior (e.g., physical activities and transportation modes), its major limitation is that it is not thick: it does not carry information about the context within which it was generated. Yet contextual information, such as the personal, social and objective interpretations that reflect how and why people do what they do, must be explicitly represented for the data to be self-explanatory and meaningful. In this thesis, we introduce big-thick data as big data complemented with highly contextualized thick data. To generate big-thick data, we first represent it by defining life sequences as sequences of observation contexts over a certain period. Each observation context is in turn composed of two types of context, namely a personal context and a reference context. The personal context encodes, for each person, a subjective personal view of the world, while the reference context encodes the objective view of an all-observing third party. Finally, we model these two types of context and explore how they can be unified in many different ways (for specific purposes) into observation contexts. Big-thick data is then generated by populating the observation contexts in life sequences with various data, yielding sequences of knowledge graphs. The life sequence representation and the flexible context unification allow different types of enquiries to be posed to big-thick data, about people’s habits, daily lives and the surrounding world; they also allow multiple different (possibly contradicting) answers (provided by different people, or even by the same person at different times) to the same enquiry. This is the crucial property of big-thick data, which we consider key to the development of meaningful human-in-the-loop human-machine interactions.
Our work has been validated with a case study in which the SU2OSM big-thick dataset was generated by integrating the SmartUnitn Two (SU2) dataset with the Trentino region OpenStreetMap (OSM) dataset to construct students’ life sequences. The observation contexts in these life sequences unify the reference context, populated with the OSM data of the Trentino region, with the personal contexts, populated with the SU2 dataset (time diaries and sensor data from 158 students at the University of Trento over a period of four weeks). The results show that the generated SU2OSM big-thick dataset can answer a broader range of enquiries than the original datasets and exhibits superior prediction performance compared to them.
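The structure described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the class names, the triple-based encoding of knowledge graphs, the choice of set union as the unification, and the sample identifiers (`student_001`, `building_42`) are all assumptions made for illustration. It shows a life sequence as a time-ordered list of observation contexts, each pairing a personal with a reference context, and how one enquiry can return different answers at different times.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical encoding: a knowledge graph as a set of (subject, predicate, object) triples.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class ObservationContext:
    start: datetime
    end: datetime
    personal: frozenset[Triple]   # subjective view of one person
    reference: frozenset[Triple]  # objective, third-party view

    def unified(self) -> frozenset[Triple]:
        # Assumption: plain set union is only one of many possible unifications.
        return self.personal | self.reference


@dataclass
class LifeSequence:
    person: str
    contexts: list[ObservationContext] = field(default_factory=list)

    def ask(self, subject: str, predicate: str) -> list[tuple[datetime, str]]:
        """Return every answer to the enquiry, ordered by observation time."""
        answers = []
        for ctx in sorted(self.contexts, key=lambda c: c.start):
            for s, p, o in sorted(ctx.unified()):
                if s == subject and p == predicate:
                    answers.append((ctx.start, o))
        return answers


# Illustrative data only; none of these values come from the SU2 or OSM datasets.
reference = frozenset({("building_42", "is_a", "library")})
morning = ObservationContext(
    datetime(2025, 1, 13, 9), datetime(2025, 1, 13, 10),
    frozenset({("building_42", "used_for", "studying")}), reference,
)
evening = ObservationContext(
    datetime(2025, 1, 13, 18), datetime(2025, 1, 13, 19),
    frozenset({("building_42", "used_for", "meeting friends")}), reference,
)
seq = LifeSequence("student_001", [evening, morning])
answers = seq.ask("building_42", "used_for")
```

Here `answers` holds two different replies from the same person to the same enquiry, one per observation context, while an enquiry on the `is_a` predicate is answered from the shared reference context in both.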
30-apr-2025
English
Giunchiglia, Fausto
Università degli studi di Trento
TRENTO
140
Files in this item:
File: 2025_PhD_Thesis_Xiaoyue_Li.pdf
Access: open access
Size: 2.57 MB
Format: Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/209470
The NBN code of this thesis is URN:NBN:IT:UNITN-209470