This is the third of a series of posts on cluster-algorithms and ideas in data analysis.
Mapper is a construction that uses a given cluster-algorithm to associate a simplicial complex to a reference map on a given data set. It was introduced by Carlsson--Mémoli--Singh in [1] and lies at the core of the of the topological data analysis startup Ayasdi. A good reference, I personally enjoyed reading, is [2]. Mapper is closely related to the Reeb graph of a real-valued function on a topological space. Just as the Reeb graph it is able to capture certain topological and geometrical features of the underlying space.
Given a family of open sets \mathcal{U}=\{ U_i \}_I we define another family \pi_*(\mathcal{U}) of open sets by applying \pi to each set U_i and collecting the resulting clusters, i.e. we define \pi_*(\mathcal{U}) := \bigcup_{i \in I} \pi(U_i). Finally we have everything in place to define the Mapper-construction.
Mapper is a construction that uses a given cluster-algorithm to associate a simplicial complex to a reference map on a given data set. It was introduced by Carlsson--Mémoli--Singh in [1] and lies at the core of the of the topological data analysis startup Ayasdi. A good reference, I personally enjoyed reading, is [2]. Mapper is closely related to the Reeb graph of a real-valued function on a topological space. Just as the Reeb graph it is able to capture certain topological and geometrical features of the underlying space.
The nerve of a cover
Let X be a topological space. We associate to each open cover \mathcal{U}=\{U_i\}_I a simplicial complex \check N(\mathcal{U}) called the nerve of \mathcal{U} defined as follows: there is a vertex i for each U_i, and a k-simplex spanned by i_0,...,i_k whenever the intersection U_{i_0} \cap \ldots \cap U_{i_k} of the corresponding sets is nonempty.Reference maps, filters or lenses
Let f:X \to Y be a continuous map between topological spaces X and Y. Any open cover \mathcal{V} of Y induces an open cover f^*\mathcal{V} of X obtained by the pullback under f, i.e. f^*\mathcal{V} := \big\{ f^{-1}(V) \ | \ {V \in \mathcal{V}} \big\}. For example set Y = \mathbb{R} and think of f as a height function on X. In the context of the Mapper construction these maps are called filters or lenses.Cluster functions
For a given space X let \mathcal{P}(X) denote its power set. Then we call an assignment X \leadsto \pi(X), that associates to a space X a family \pi(X) \subset \mathcal{P}(X) of subsets of X, a cluster function or cluster algorithm -- note that this is not really a function in the mathematical sense, but rather in the sense of computer science and programming. We refer to an element in \pi(X) as a cluster. We don't require \pi(X) to satisfy any properties at that point. From a programming perspective think of it as the signature of a function implementing a certain cluster algorithm.Given a family of open sets \mathcal{U}=\{ U_i \}_I we define another family \pi_*(\mathcal{U}) of open sets by applying \pi to each set U_i and collecting the resulting clusters, i.e. we define \pi_*(\mathcal{U}) := \bigcup_{i \in I} \pi(U_i). Finally we have everything in place to define the Mapper-construction.
Put it all together
Let X be a topological space (the data set), \pi be a cluster algorithm, f:X \to Y a reference map to a topological space Y, and \mathcal{V} an open cover of Y. Then the result of Mapper applied to this triple of data is the simplicial complex \mathcal{M}(\pi, f, \mathcal{V}) defined by \mathcal{M}(\pi, f, \mathcal{V}) := \check{N} \big( \pi_*(f^*\mathcal{V}) \big).References
[1] G. Carlsson, F. Mémoli and G. Singh, Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition, Eurographics Symposium on Point Based Graphics (2007), pp. 91--100.[2] G. Carlsson, Topology and data, Bull. Amer. Math. Soc. 46 (2009), pp. 255--308.
No comments :
Post a Comment