Given a dataset, traditional clustering algorithms often only provide a single partitioning or a single view of the dataset. On complex tasks, many different clusterings of a dataset exist, thus alternative clusterings which are of high quality and different from given trivial clusterings are asked to have complementary views. The task is therefore a clear multi-objective optimization problem. However, most approaches in the literature optimize these objectives sequentially (one after another one) or indirectly (by some heuristic combination). This can result in solutions which are not Pareto- optimal. The problem is even more difficult for high-dimensional datasets as clusters can be located in various subspaces of the original feature space. Besides, many practical applications require that subspace clusters can still overlap but the overlap must be below a predefined threshold. Nonetheless, most of the state-of-the-art subspace clustering algorithms can only generate a set of disjoint or significantly overlapping subspace clusters. To deal with the above issues, for full-space alternative clustering, we develop an algorithm which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. As for non-redundant subspace clustering, we propose a general framework for generating K overlapping subspace clusters where the maximum overlap between them is guaranteed to be below a predefined threshold. In both cases, our algorithms can be applied for several domains as different analyzing models can be used without modifying the main parts of the algorithms.

Non-Redundant Overlapping Clustering: Algorithms and Applications

Truong, Duy Tin
2013

Abstract

Given a dataset, traditional clustering algorithms often only provide a single partitioning or a single view of the dataset. On complex tasks, many different clusterings of a dataset exist, thus alternative clusterings which are of high quality and different from given trivial clusterings are asked to have complementary views. The task is therefore a clear multi-objective optimization problem. However, most approaches in the literature optimize these objectives sequentially (one after another one) or indirectly (by some heuristic combination). This can result in solutions which are not Pareto- optimal. The problem is even more difficult for high-dimensional datasets as clusters can be located in various subspaces of the original feature space. Besides, many practical applications require that subspace clusters can still overlap but the overlap must be below a predefined threshold. Nonetheless, most of the state-of-the-art subspace clustering algorithms can only generate a set of disjoint or significantly overlapping subspace clusters. To deal with the above issues, for full-space alternative clustering, we develop an algorithm which fully acknowledges the multiple objectives, optimizes them directly and simultaneously, and produces solutions approximating the Pareto front. As for non-redundant subspace clustering, we propose a general framework for generating K overlapping subspace clusters where the maximum overlap between them is guaranteed to be below a predefined threshold. In both cases, our algorithms can be applied for several domains as different analyzing models can be used without modifying the main parts of the algorithms.
2013
Inglese
Battiti, Roberto
Università degli studi di Trento
TRENTO
143
File in questo prodotto:
File Dimensione Formato  
phdThesis.pdf

accesso aperto

Dimensione 2.74 MB
Formato Adobe PDF
2.74 MB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/59539
Il codice NBN di questa tesi è URN:NBN:IT:UNITN-59539