This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.
EXACT ALGORITHMS FOR SIZE CONSTRAINED CLUSTERING
LIN, JIANYI
2012
Abstract
This thesis investigates the following general constrained clustering problem: given a dimension $d$, an $L^p$-norm, a set $X\subset\R^d$, a positive integer $k$ and a finite set $\mathcal M\subset\N$, find the optimal $k$-partition $\{A_1,...,A_k\}$ of $X$ w.r.t. the $L^p$-norm satisfying $|A_i|\in \mathcal M$, $i=1,...,k$. First of all, we prove that the problem is NP-hard even if $k=2$ (for all $p>1$), or $d=2$ and $|\mathcal M|=2$ (with Euclidean norm). Moreover, we put in evidence that the problem is computationally hard if $p$ is a non-integer rational. When $d=2$, $k=2$ and $\mathcal M=\{m,n-m\}$, we design an algorithm for solving the problem in time $O(n\sqrt[3]m \log^2 n)$ in the case of Euclidean norm; this result relies on combinatorial geometry techniques concerning $k$-sets and dynamic convex hulls. Finally, we study the problem in fixed dimension $d$ with $k=2$; by means of tools of real algebraic geometry and numerical techniques for localising algebraic roots we construct a polynomial-time method for solving the constrained clustering problem with integer $p$ given in unary notation.File | Dimensione | Formato | |
---|---|---|---|
phd_unimi_R07628.pdf
Open Access dal 22/09/2013
Dimensione
573.78 kB
Formato
Adobe PDF
|
573.78 kB | Adobe PDF | Visualizza/Apri |
I documenti in UNITESI sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/20.500.14242/103211
URN:NBN:IT:UNIMI-103211