This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.

EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING

LIN, JIANYI
2012

Abstract

This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.
9-mar-2012
Inglese
clustering ; size contraints ; NP-hardness ; p-norm ; polynomial-time ; k-set ; dynamic convex hull ; cylindrical algebraic decomposition
BERTONI, ALBERTO
Università degli Studi di Milano
File in questo prodotto:
File Dimensione Formato  
phd_unimi_R07628.pdf

Open Access dal 22/09/2013

Dimensione 573.78 kB
Formato Adobe PDF
573.78 kB Adobe PDF Visualizza/Apri

I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14242/103211
Il codice NBN di questa tesi è URN:NBN:IT:UNIMI-103211