This is the second of a series of posts on cluster-algorithms and ideas in data analysis (and related fields).
Ordering points to identify the clustering structure (OPTICS) is
a data clustering algorithm presented by Mihael Ankerst, Markus M. Breunig,
Hans-Peter Kriegel and Jörg Sander in 1999 [1]. It builds on the ideas of DBSCAN, which I described in a previous
post. But let's cut the intro and dive right into it...
Let
(X,d) be a finite discrete metric space, i.e.~let
X be a data set on
which we have a notion of similarity expressed in terms of a suitable distance
function
d.
Given a positive integer
m \in \mathbb{N} we can
define a notion of
density on points of our data set, which in turn can
be used to define a
perturbed metric. Given a starting point
x_0 \in X the
OPTICS-algorithm iteratively defines a linear ordering on
X in terms of a filtration
X_0 \subset X_1 \subset \ldots \subset X_n of
X, where
X_{n+1} is obtained
from
X_n by appending the closest element (with respect to the perturbed metric) in its complement.
The original OPTICS-algorithm also takes an additional parameter
\varepsilon > 0. However
this is only needed to reduce the time complexity and is ignored in our discussion for now
-- to be more precise, we implicitly set
\varepsilon = \infty.