Temporal Aggregation for Data Analytics

2020

Abstract

Data analytics is a key technology that enables modern enterprises across different sectors to analyze key performance indicators and to gain a competitive advantage. In many cases the data to be processed has a temporal extent. Analyzing the history of data has the potential to reveal deeper insights; trends, for example, can only be discovered by looking at historical data. In fact, time is ubiquitous, and it is difficult to think of real-world applications that do not associate data with time in one way or another. In this thesis we explore, analyze, and extend the computation of temporal aggregation operators for data analytics from different perspectives. We show how to model, store, and query data with period timestamps in relational data warehouses to support efficient aggregation over time. We describe how to efficiently summarize the result of temporal aggregation, which can be very large, such that only the most significant changes are emphasized. To guide users in the construction of these summaries we provide the VISOR tool, which allows users to interactively evaluate summaries based on different error metrics. Data often has two or more time dimensions, and the correlation between these dimensions can be of interest to data analysts. We show how to use bitemporal aggregation to perform such analyses and provide several improvements for its computation. In particular, we show that, in contrast to previous work, a combination of static and dynamic data structures significantly boosts performance. To visualize the result of bitemporal aggregation we develop the HotPeriods tool. For several real-world datasets, we show how HotPeriods can be used to visually analyze correlations of period-timestamped data, much as scatter plots are used for point data.
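
To make the notion of aggregation over period timestamps concrete, the following is a minimal sketch of instant temporal aggregation with SUM, written as a plain sweep over period endpoints. It is not the algorithm developed in the thesis. The function name `instant_temporal_sum`, the `(start, end, value)` tuple layout, and the half-open period convention `[start, end)` are assumptions made for illustration only.

```python
from collections import defaultdict


def instant_temporal_sum(rows):
    """Illustrative sketch: SUM over time for period-timestamped rows.

    rows: iterable of (start, end, value) with half-open periods [start, end).
    Returns a list of (start, end, total) covering every maximal period over
    which the set of valid rows is non-empty and the sum is constant.
    """
    # For each endpoint, record the change in the running sum and in the
    # number of valid rows.
    events = defaultdict(lambda: [0, 0])
    for start, end, value in rows:
        if start >= end:
            raise ValueError("period must satisfy start < end")
        events[start][0] += value
        events[start][1] += 1
        events[end][0] -= value
        events[end][1] -= 1

    result = []
    total = 0    # running sum of values valid at the current time
    active = 0   # number of rows valid at the current time
    prev = None  # previous endpoint visited by the sweep
    for t in sorted(events):
        if prev is not None and active > 0:
            # Simplification: adjacent periods with equal sums are merged,
            # regardless of which rows contributed to them.
            if result and result[-1][1] == prev and result[-1][2] == total:
                result[-1] = (result[-1][0], t, total)
            else:
                result.append((prev, t, total))
        total += events[t][0]
        active += events[t][1]
        prev = t
    return result


if __name__ == "__main__":
    # Hypothetical data: three rows, two of which overlap during [3, 5).
    print(instant_temporal_sum([(1, 5, 10), (3, 8, 20), (10, 12, 5)]))
```

Even in this small example three input rows produce four result periods; in general the result can contain up to roughly twice as many periods as there are input rows, which is why summarizing the result matters.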
English
Temporal data modeling
Temporal databases
Data analytics
Temporal aggregation
Gamper
Johann
Libera Università di Bolzano

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/128684
The NBN code of this thesis is URN:NBN:IT:UNIBZ-128684