This is the second of a series of posts on cluster-algorithms and ideas in data analysis (and related fields).
Ordering points to identify the clustering structure (OPTICS) is
a data clustering algorithm presented by Mihael Ankerst, Markus M. Breunig,
Hans-Peter Kriegel and Jörg Sander in 1999 [1]. It builds on the ideas of DBSCAN, which I described in a previous
post. But let's cut the intro and dive right into it...
Let $(X,d)$ be a finite discrete metric space, i.e.~let $X$ be a data set on
which we have a notion of similarity expressed in terms of a suitable distance
function $d$.
Given a positive integer $m \in \mathbb{N}$ we can
define a notion of
density on points of our data set, which in turn can
be used to define a
perturbed metric. Given a starting point $x_0 \in X$ the
OPTICS-algorithm iteratively defines a linear ordering on $X$ in terms of a filtration
$X_0 \subset X_1 \subset \ldots \subset X_n$ of $X$, where $X_{n+1}$ is obtained
from $X_n$ by appending the closest element (with respect to the perturbed metric) in its complement.
The original OPTICS-algorithm also takes an additional parameter $\varepsilon > 0$. However
this is only needed to reduce the time complexity and is ignored in our discussion for now
-- to be more precise, we implicitly set $\varepsilon = \infty$.