Unsupervised learning under the presence of data drift

NUCCI, Vincenzo
2025

Abstract

The detection of anomalies in real-time data streams has become a critical task across many domains, including finance, healthcare, and marketing. Traditional anomaly detection methods often rely on static models that fail to adapt to changing patterns, which lowers their accuracy and effectiveness. Moreover, modern data streams are often high-dimensional, noisy, and subject to concept drift, which makes it difficult for traditional methods to detect anomalies accurately and robustly. As a result, there is a growing need for approaches that can adapt to change. This research presents a set of innovative approaches linked by a common thread: they address the challenge of change, whether it arises in industrial machine processes, where the goal is to monitor the condition of components, or in real-life patterns. The first proposed approach, COndition moNitoring Detection via cORrelation-based norms (CONDOR), is a system-agnostic framework that identifies anomalies in industrial environments by using a matrix norm to detect changes in the patterns that describe machine behavior. CONDOR is designed to detect five well-defined patterns, including sudden transitions, fluctuations, and gradual shifts between stable conditions. It uses the Frobenius norm to measure the discrepancy between the correlations of current and past data, which allows it to detect changes in machine behavior. CONDOR was later extended into StreamCM, a hybrid framework that incorporates an autoencoder model to make CONDOR's results explainable and interpretable; this extension is referred to as Hybrid Anomaly Detection. The third proposed algorithm, Furaki, addresses the same challenge of change by providing an unsupervised approach to data drift detection in evolving data streams. Furaki consists of a growing binary tree in which each branch is responsible for processing a specific portion of the data stream.
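The following sketch illustrates the correlation comparison at the core of CONDOR. It is not the thesis implementation. It rests on these assumptions:

- The stream is a NumPy array with one column per sensor channel.
- Correlations are Pearson coefficients from `numpy.corrcoef`, computed over consecutive non-overlapping windows.
- A change is flagged when the Frobenius norm of the difference between two correlation matrices (the square root of the sum of their squared entry-wise differences) exceeds a fixed threshold.

The helper names `correlation_discrepancy` and `flag_windows`, the window size, and the threshold are hypothetical. The sketch does not attempt CONDOR's classification into five patterns.

```python
from __future__ import annotations

import numpy as np


def correlation_discrepancy(past: np.ndarray, current: np.ndarray) -> float:
    """Frobenius norm of the difference between two correlation matrices.

    Both windows have shape (time_steps, channels); no channel may be constant,
    otherwise np.corrcoef returns NaN entries.
    """
    c_past = np.corrcoef(past, rowvar=False)
    c_current = np.corrcoef(current, rowvar=False)
    return float(np.linalg.norm(c_past - c_current, ord="fro"))


def flag_windows(stream: np.ndarray, window: int, threshold: float) -> list[int]:
    """Hypothetical helper: compare each window with the one immediately before it
    and return the start indices of windows whose discrepancy exceeds `threshold`."""
    flagged = []
    for start in range(window, len(stream) - window + 1, window):
        past = stream[start - window:start]
        current = stream[start:start + window]
        if correlation_discrepancy(past, current) > threshold:
            flagged.append(start)
    return flagged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(1000, 1))
    noise = rng.normal(scale=0.1, size=(1000, 2))
    x = np.hstack([base, base]) + noise        # two strongly correlated channels
    x[500:, 1] = -x[500:, 0] + noise[500:, 1]  # correlation flips sign at t = 500
    print(flag_windows(x, window=100, threshold=1.0))  # expected: [500]
```

In the demo, the two channels move together until t = 500 and in opposite directions afterwards. Only the window that starts at the flip shows a large discrepancy, roughly 2.8, because both off-diagonal correlation entries change by almost 2. Every other window pair stays close to zero.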
When data drift is detected, Furaki can split into further sub-trees, which allows it to focus on distinct patterns in the data. The proposed approaches offer several significant advantages over traditional data drift detection methods. Notably, they can adapt dynamically to changing data patterns by adjusting their structure in real time. Furthermore, they give users a detailed understanding of the underlying data distribution, enabling the identification of the clusters and behaviors that are essential for effective pattern detection. The proposed algorithms were evaluated on several synthetic and real-world datasets, and the results show that each method detects data drift with high accuracy and robustness. This research contributes to a deeper understanding of the complexities of anomaly detection in evolving data streams. It provides a novel approach to handling changing patterns and distributions in the data and demonstrates its effectiveness through rigorous evaluation and comparison with existing methods. The proposed methodologies have several key implications for future research and development. First, they highlight the need for adaptive, dynamic approaches to anomaly detection that can handle evolving data distributions. Second, they demonstrate the effectiveness of combining clustering and density-based methods to address changing patterns and distributions in the data. Third, they offer a novel approach to designing and implementing efficient, scalable algorithms for anomaly detection. In conclusion, this thesis presents innovative approaches to the challenges posed by anomaly detection in evolving data streams. The proposed framework combines clustering and density-based methods to adapt to changing patterns and distributions in the data, making it a highly effective tool for anomaly detection in modern data-driven applications.
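The abstract does not specify how Furaki routes data to its branches or how it detects drift. The following is therefore only a hypothetical sketch of the general idea: a binary tree that grows when drift is detected. It rests on these assumptions:

- The stream is one-dimensional.
- Each leaf keeps a reference window and tests each subsequent tumbling window with a z-test on its mean.
- When drift is detected, the leaf splits into two children at the midpoint between the reference mean and the recent mean.

The class `DriftNode`, its parameters `window` and `k`, and the splitting rule are illustrative assumptions, not Furaki's actual design.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass, field


@dataclass
class DriftNode:
    """Hypothetical node of a drift-driven binary tree (not Furaki itself)."""

    window: int = 50  # reference size and tumbling-window size; must be >= 2
    k: float = 4.0  # assumed z-score threshold for declaring drift
    reference: list[float] = field(default_factory=list)
    recent: list[float] = field(default_factory=list)
    split_point: float | None = None
    left: DriftNode | None = None
    right: DriftNode | None = None

    def insert(self, x: float) -> None:
        if self.split_point is not None:
            # Internal node: route the value to the child that owns its region.
            child = self.left if x <= self.split_point else self.right
            child.insert(x)
            return
        if len(self.reference) < self.window:
            self.reference.append(x)  # leaf is still building its reference
            return
        self.recent.append(x)
        if len(self.recent) == self.window:
            # Tumbling window: test it once, then either split or discard it.
            if self._drifted():
                self._split()
            else:
                self.recent.clear()

    def _drifted(self) -> bool:
        mu = statistics.fmean(self.reference)
        sd = statistics.stdev(self.reference) or 1e-9  # guard against zero spread
        # z-score of the recent mean under the reference distribution.
        z = abs(statistics.fmean(self.recent) - mu) / (sd / len(self.recent) ** 0.5)
        return z > self.k

    def _split(self) -> None:
        # Boundary halfway between the old (reference) and new (recent) means.
        old, new = statistics.fmean(self.reference), statistics.fmean(self.recent)
        self.split_point = (old + new) / 2
        seen = self.reference + self.recent
        # Each child is seeded with the already-seen values that fall in its region.
        self.left = DriftNode(self.window, self.k, [v for v in seen if v <= self.split_point])
        self.right = DriftNode(self.window, self.k, [v for v in seen if v > self.split_point])
        self.reference, self.recent = [], []

    def leaves(self) -> int:
        if self.split_point is None:
            return 1
        return self.left.leaves() + self.right.leaves()


if __name__ == "__main__":
    # Deterministic stream: values cycle around 0, then jump to cycle around 10.
    stream = [(-1, 0, 1)[i % 3] for i in range(200)] + [(9, 10, 11)[i % 3] for i in range(200)]
    root = DriftNode(window=50, k=4.0)
    for x in stream:
        root.insert(x)
    print(root.leaves(), round(root.split_point, 2))  # expected: 2 4.98
```

After the shift, new values are routed to a fresh right leaf, while the left leaf keeps the summary of the earlier regime. This mirrors, in a much simplified form, how a growing tree can keep distinct patterns in separate branches. Tumbling windows are used instead of a sliding window so that each window is tested only once, which keeps false alarms on a stationary stream rare.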
23-Jul-2025
English
POLINI, Andrea
RE, Barbara
PIANGERELLI, Marco
Università degli Studi di Camerino
Files in this item:
Nucci_PHD_Thesis.pdf (12.35 MB, Adobe PDF)

Embargoed until 23/07/2026

License: All rights reserved

Documents in UNITESI are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/362152
The NBN code of this thesis is URN:NBN:IT:UNICAM-362152