Open-source release of GEOMANCER

PiperOrigin-RevId: 318038391
2026-05-09 21:07:49 +08:00 · 2020-06-24 12:03:34 +01:00
parent dc96b58884
commit 4c1b8d1944
7 changed files with 760 additions and 0 deletions
@@ -24,6 +24,7 @@ https://deepmind.com/research/publications/

 ## Projects

+*   [Disentangling by Subspace Diffusion (GEOMANCER)](geomancer)
 *   [What can I do here? A theory of affordances in reinforcmenet learning](affordances_theory), ICML 2020
 *   [Scaling data-driven robotics with reward sketching and batch reinforcement learning](sketchy), RSS 2020
 *   [The Option Keyboard: Combining Skills in Reinforcement Learning](option_keyboard), NeurIPS 2019
@@ -0,0 +1,108 @@
+# Geometric Manifold Component Estimator (GEOMANCER)
+
+This package provides an implementation of the Geometric Manifold Component
+Estimator, or GEOMANCER, as published in [Disentangling by Subspace Diffusion
+(2020)](https://arxiv.org/abs/2006.12982). GEOMANCER is a nonparametric
+algorithm for disentangling, somewhat similar in spirit to Laplacian Eigenmaps
+or Vector Diffusion Maps, except instead of producing an embedding for the data,
+it produces a set of subspaces around each data point, one subspace for each
+disentangled factor of variation in the data. This differs from more common
+algorithms for disentangling that originated in the deep learning community,
+such as the beta-VAE, TCVAE or FactorVAE, which learn a nonlinear embedding and
+probabilistic generative model of the data. GEOMANCER is intended for data where
+the individual factors of variation might be more than one dimensional, for
+instance 3D rotations. At the moment, GEOMANCER works best when some ground
+truth information about the metric in the data space is available, for instance
+knowledge of the "true" nearest neighbors around each point, and we do not
+recommend running GEOMANCER directly on unstructured data from high-dimensional
+spaces. We are providing the code here to enable the interested researcher to
+get some hands-on experience with the ideas around differential geometry,
+holonomy and higher-order graph connection Laplacians we explore in the paper.
+
+
+## Installation
+
+To install the package locally in a new virtual environment run:
+```bash
+python3 -m venv geomancer
+source geomancer/bin/activate
+git clone https://github.com/deepmind/deepmind-research.git .
+cd deepmind-research/geomancer
+pip install -e .
+```
+
+## Example
+
+To run, simply load or generate an array of data, and call the `fit` function:
+
+```
+import numpy as np
+import geomancer
+
+# Generate data from a product of two spheres
+data = []
+for i in range(2):
+  foo = np.random.randn(1000, 3)
+  data.append(foo / np.linalg.norm(foo, axis=1, keepdims=True))
+data = np.concatenate(data, axis=1)
+
+# Run GEOMANCER. The underlying manifold is 4-dimensional.
+components, spectrum = geomancer.fit(data, 4)
+```
+
+If ground truth information about the tangent spaces is available in a space
+that is aligned with the data, then the performance can be evaluated using the
+`eval_aligned` function. If ground truth data is only available in an unaligned
+space, for instance if the embedding used to generate the data is not the same
+as the space in which the data is observed, then the `eval_unaligned` function
+can be used, which requires both the data and disentangled tangent vectors in
+the ground truth space. Examples of both evaluation metrics are given in the
+demo in `train.py`.
+
+
+## Demo on Synthetic Manifolds
+
+The file `train.py` runs GEOMANCER on a product of manifolds that can be
+specified by the user. The number of data points to train on is given by the
+`--npts` flag, while the specification of the manifold is given by the
+`--specification` flag. The `--rotate` flag specifies whether a random rotation
+should be applied to the data. If false, `eval_aligned` will be used to evaluate
+the result. If true, `eval_unaligned` will be used to evaluate the result.
+
+For instance, to run on the product of the sphere in 2 and 4 dimensions and the
+special orthogonal group in 3 dimensions, run:
+
+```
+python3 train.py --specification='S^2','S^4','SO(3)' --npts=100000
+```
+
+This passes a list of strings as the manifold specification flag. Note that a
+manifold this large will require a large amount of data to work and may require
+hours or days to run. The default example should run in just a few minutes.
+
+The demo plots 3 different outputs:
+1. The eigenvalue spectrum of the 2nd-order graph Laplacian. This should have
+a large gap in the spectrum at the eigenvalue equal to the number of
+submanifolds.
+2. The basis vectors for each disentangled subspace around one point.
+3. The ground truth basis vectors for the disentangled subspaces at the same
+point. If `--rotate=False`, and GEOMANCER has sufficient data, each basis matrix
+should span the same subspace as the results in the second plot.
+
+## Giving Credit
+
+If you use this code in your work, we ask you to cite this paper:
+
+```
+@article{pfau2020disentangling,
+  title={Disentangling by Subspace Diffusion},
+  author={Pfau, David and Higgins, Irina and Botev, Aleksandar and Racani\`ere,
+  S{\'e}bastian},
+  journal={arXiv preprint arXiv:2006.12982},
+  year={2020}
+}
+```
+
+## Disclaimer
+
+This is not an official Google product.
@@ -0,0 +1,318 @@
+# Copyright 2020 DeepMind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Code for the Geometric Manifold Component Estimator (GEOMANCER)."""
+
+import itertools
+
+from absl import logging
+
+import numpy as np
+import scipy
+import scipy.sparse
+import scipy.sparse.linalg
+
+from tqdm import tqdm
+
+
+def sym_op(x, zero_trace=False):
+  """Given X, makes L(A) = X @ A @ X' for symmetric matrices A.
+
+  If A is not symmetric, L(A) will return X @ (A_L + A_L') @ X' where A_L is
+  the lower triangular of A (with the diagonal divided by 2).
+
+  Args:
+    x: The matrix from which to construct the operator
+    zero_trace (optional): If true, restrict the operator to only act on
+      matrices with zero trace, effectively reducing the dimensionality by one.
+  Returns:
+    A matrix Y such that vec(L(A)) = Y @ vec(A).
+  """
+  n = x.shape[0]
+  # Remember to subtract off the diagonal once
+  xx = (np.einsum('ik,jl->ijkl', x, x) +
+        np.einsum('il,jk->ijkl', x, x) -
+        np.einsum('ik,jl,kl->ijkl', x, x, np.eye(n)))
+  xx = xx[np.tril_indices(n)]
+  xx = xx.transpose(1, 2, 0)
+  xx = xx[np.tril_indices(n)]
+  xx = xx.T
+  if zero_trace:
+    diag_idx = np.cumsum([0]+list(range(2, n)))
+    proj_op = np.eye(n*(n+1)//2)[:, :-1]
+    proj_op[-1, diag_idx] = -1
+    # multiply by operator that completes last element of diagonal
+    # for a zero-trace matrix
+    xx = xx @  proj_op
+    xx = xx[:-1]
+  return xx
+
+
+def vec_to_sym(x, n, zero_trace=False):
+  y = np.zeros((n, n))
+  if zero_trace:
+    x = np.append(x, 0.0)
+  y[np.tril_indices(n)] = x
+  y += y.T
+  y[np.diag_indices(n)] /= 2.0
+  if zero_trace:
+    y[-1, -1] = -np.trace(y)
+  return y
+
+
+def ffdiag(data, lr=1.0, tol=1e-10, verbose=False, eig_init=False):
+  """Orthogonal FFDiag algorithm of Ziehe et al 2004."""
+  n = data.shape[1]
+  k = data.shape[0]
+  c = data.copy()
+  if eig_init:
+    _, v = np.linalg.eig(data[0])
+    v = v.T
+    for i in range(k):
+      c[i] = v @ c[i] @ v.T
+  else:
+    v = np.eye(n)
+
+  err_ = np.inf
+  for t in range(10000):
+    w = np.zeros((n, n))
+    for i in range(n):
+      for j in range(i+1, n):
+        diag = c[:, i, i] - c[:, j, j]
+        w[i, j] = np.sum(c[:, i, j] * diag) / np.sum(diag ** 2)
+    w -= w.T
+    norm = np.linalg.svd(w, compute_uv=False).max()
+    if norm > lr:
+      w *= lr / norm
+    ew = scipy.linalg.expm(w)
+    v = ew @ v
+    for i in range(k):
+      c[i] = ew @ c[i] @ ew.T
+    cdiag = c.copy()
+    for i in range(n):
+      for j in range(k):
+        cdiag[j, i, i] = 0
+    err = np.linalg.norm(cdiag)
+    if verbose:
+      logging.info('Iter %d: %f', t, err)
+    if err_ - err < tol and err_ - err >= 0:
+      return v
+    err_ = err
+  return v
+
+
+def avg_angle_between_subspaces(xs, ys):
+  """Compute the error between two sets of subspaces."""
+  if len(xs) != len(ys):
+    return np.pi / 2  # largest possible angle
+  angles = []
+  for ys_perm in itertools.permutations(ys):
+    angles.append([])
+    for i in range(len(xs)):
+      if xs[i].shape[1] == ys_perm[i].shape[1]:
+        sigma = np.linalg.svd(xs[i].T @ ys_perm[i], compute_uv=False)
+        angles[-1].append(np.arccos(np.min(sigma)))
+      else:
+        angles[-1].append(np.pi / 2)
+  angles = np.array(angles)
+  return np.min(np.mean(angles, axis=1))
+
+
+def make_nearest_neighbors_graph(data, k, n=1000):
+  """Build exact k-nearest neighbors graph from numpy data.
+
+  Args:
+    data: Data to compute nearest neighbors of, each column is one point
+    k: number of nearest neighbors to compute
+    n (optional): number of neighbors to compute simultaneously
+
+  Returns:
+    A scipy sparse matrix in LIL format giving the symmetric nn graph.
+  """
+  shape = data.shape
+  assert shape[0] % n == 0
+  nbr_graph = scipy.sparse.lil_matrix((shape[0], shape[0]))
+  norm = np.sum(data**2, axis=1)
+  cols = np.meshgrid(np.arange(n), np.ones(k+1))[0]
+  for i in tqdm(range(0, shape[0], n)):
+    dot = data @ data[i:i+n].T
+    dists = np.sqrt(np.abs(norm[:, None] - 2*dot + norm[i:i+n][None, :]))
+    idx = np.argpartition(dists, k, axis=0)[:k+1]
+    nbrs = idx[np.argsort(dists[idx, cols], axis=0), cols][1:]
+    for j in range(n):
+      nbr_graph[i+j, nbrs[:, j]] = 1
+  # Symmetrize graph
+  for i in tqdm(range(shape[0])):
+    for j in nbr_graph.rows[i]:
+      if nbr_graph[j, i] == 0:
+        nbr_graph[j, i] = nbr_graph[i, j]
+  logging.info('Symmetrized neighbor graph')
+  return nbr_graph
+
+
+def make_tangents(data, neighbor_graph, k):
+  """Construct all tangent vectors for the dataset."""
+  tangents = np.zeros((data.shape[0], k, data.shape[1]), dtype=np.float32)
+  for i in tqdm(range(data.shape[0])):
+    diff = data[neighbor_graph.rows[i]] - data[i]
+    _, _, u = np.linalg.svd(diff, full_matrices=False)
+    tangents[i] = u[:k]
+  logging.info('Computed all tangents')
+  return tangents
+
+
+def make_connection(tangents, neighbor_graph):
+  """Make connection matrices for all edges of the neighbor graph."""
+  connection = {}
+  for i in tqdm(range(tangents.shape[0])):
+    for j in neighbor_graph.rows[i]:
+      if j > i:
+        uy, _, ux = np.linalg.svd(tangents[j] @ tangents[i].T,
+                                  full_matrices=False)
+        conn = uy @ ux
+        connection[(i, j)] = conn
+        connection[(j, i)] = conn.T
+  logging.info('Constructed all connection matrices')
+  return connection
+
+
+def make_laplacian(connection, neighbor_graph, sym=True, zero_trace=True):
+  """Make symmetric zero-trace second-order graph connection Laplacian."""
+  n = neighbor_graph.shape[0]
+  k = list(connection.values())[0].shape[0]
+  bsz = (k*(k+1)//2 - 1 if zero_trace else k*(k+1)//2) if sym else k**2
+  data = np.zeros((neighbor_graph.nnz + n, bsz, bsz), dtype=np.float32)
+  indptr = []
+  indices = np.zeros(neighbor_graph.nnz + n)
+  index = 0
+  for i in tqdm(range(n)):
+    indptr.append(index)
+    data[index] = len(neighbor_graph.rows[i]) * np.eye(bsz)
+    indices[index] = i
+    index += 1
+    for j in neighbor_graph.rows[i]:
+      if sym:
+        kron = sym_op(connection[(j, i)], zero_trace=zero_trace)
+      else:
+        kron = np.kron(connection[(j, i)], connection[(j, i)])
+      data[index] = -kron
+      indices[index] = j
+      index += 1
+  indptr.append(index)
+  indptr = np.array(indptr)
+
+  laplacian = scipy.sparse.bsr_matrix((data, indices, indptr),
+                                      shape=(n*bsz, n*bsz))
+  logging.info('Built 2nd-order graph connection Laplacian.')
+  return laplacian
+
+
+def cluster_subspaces(omega):
+  """Cluster different dimensions from the eigenvectors of the Laplacian."""
+  w = ffdiag(omega)  # simultaneous diagonalization
+  psi = np.zeros(omega.shape[:2])
+  for i in range(omega.shape[0]):
+    psi[i] = np.diag(w @ omega[i] @ w.T)  # compute diagonals
+  # Compute cosine similarity of diagonal vectors
+  psi_outer = psi.T @ psi
+  psi_diag = np.diag(psi_outer)
+  cos_similarity = psi_outer / np.sqrt(np.outer(psi_diag, psi_diag))
+  adj = cos_similarity > 0.5  # adjacency matrix for graph of clusters
+  # Use graph Laplacian to find cliques
+  # (though a greedy algorithm could work too)
+  lapl = np.diag(np.sum(adj, axis=0)) - adj  # graph Laplacian
+  d, v = np.linalg.eig(lapl)
+  # connected components of graph
+  cliques = np.abs(v[:, np.abs(d) < 1e-6]) > 1e-6
+  tangents = [w[cliques[:, i]] for i in range(sum(np.abs(d) < 1e-6))]
+  return tangents
+
+
+def fit(data, k, gamma=None, nnbrs=None, neig=10, shard_size=1000):
+  """The Geometric Manifold Component Estimator.
+
+  Args:
+    data: the dataset, a set of points sample from a product manifold.
+    k: the dimensionality of the manifold.
+    gamma (optional): the threshold in the spectrum at which to cut off the
+      number of submanifolds.
+    nnbrs (optional): number of neighbors to use for each point.
+    neig (optional): the total number of eigenvectors to compute.
+    shard_size (optional): the size of shard to use in knn computation.
+
+  Returns:
+    A list of lists of subspace bases, one list for each element of the dataset,
+    and the spectrum of the 2nd-order graph Laplacian.
+  """
+  if not nnbrs:
+    nnbrs = 2*k
+  neighbor_graph = make_nearest_neighbors_graph(data, nnbrs, n=shard_size)
+  tangents = make_tangents(data, neighbor_graph, k)
+  connection = make_connection(tangents, neighbor_graph)
+  laplacian = make_laplacian(connection, neighbor_graph)
+  eigvals, eigvecs = scipy.sparse.linalg.eigsh(laplacian, k=neig, which='SM')
+  logging.info('Computed bottom eigenvectors of 2nd-order Laplacian')
+  bsz = k*(k+1)//2 - 1  # Block size for the projected 2nd-order Laplacian
+  if gamma:
+    nm = np.argwhere(eigvals < gamma)[-1, 0] + 1
+  else:  # If no threshold is provided, just use the largest gap in the spectrum
+    nm = np.argmax(eigvals[1:] - eigvals[:-1]) + 1
+  eigvecs = eigvecs.reshape(data.shape[0], bsz, neig)
+  omega = np.zeros((nm, k, k), dtype=np.float32)
+  components = []
+  for i in tqdm(range(data.shape[0])):
+    for j in range(nm):
+      omega[j] = vec_to_sym(eigvecs[i, :, j], k, zero_trace=True)
+    components.append([tangents[i].T @ x.T for x in cluster_subspaces(omega)])
+  logging.info('GEOMANCER completed')
+  return components, eigvals
+
+
+def eval_aligned(tangents, true_tangents):
+  """Evaluation for aligned data."""
+  errors = np.zeros(len(tangents))
+  for i in tqdm(range(len(tangents))):
+    errors[i] = avg_angle_between_subspaces([gt[i] for gt in true_tangents],
+                                            tangents[i])
+  logging.info('Computed angles between ground truth and GEOMANCER results')
+  return errors
+
+
+def eval_unaligned(data, tangents, true_data, true_tangents, k=10, n=1000):
+  """Evaluation for unaligned data."""
+  logging.info('Evaluating unaligned data')
+  errors = np.zeros(data.shape[0])
+  nbrs = make_nearest_neighbors_graph(true_data, k=k, n=n)
+
+  for i in tqdm(range(data.shape[0])):
+    tangent = np.concatenate(tangents[i], axis=1)
+    true_tangent = np.concatenate([t[i] for t in true_tangents], axis=1)
+    dx_true = (true_data[nbrs.rows[i]] - true_data[i]) @ true_tangent
+    dx_result = (data[nbrs.rows[i]] - data[i]) @ tangent
+
+    # compute canonical correlations between the two dxs
+    xx = dx_true.T @ dx_true
+    yy = dx_result.T @ dx_result
+    xy = dx_true.T @ dx_result
+    xx_ = np.linalg.inv(xx)
+    yy_ = np.linalg.inv(yy)
+    foo = scipy.linalg.sqrtm(xx_) @ xy @ scipy.linalg.sqrtm(yy_)
+    u, _, v = np.linalg.svd(foo)
+
+    # project subspaces for results and ground truth into aligned space
+    proj = [v @ tangent.T @ s for s in tangents[i]]
+    true_proj = [u.T @ true_tangent.T @ s[i] for s in true_tangents]
+    errors[i] = avg_angle_between_subspaces(proj, true_proj)
+  return errors
+
@@ -0,0 +1,82 @@
+# Copyright 2020 DeepMind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Tests for the Geometric Manifold Component Estimator (GEOMANCER)."""
+
+from absl.testing import absltest
+from absl.testing import parameterized
+
+import numpy as np
+
+from geomancer import geomancer
+
+
+class GeomancerTest(parameterized.TestCase):
+
+  @parameterized.parameters(
+      {'zero_trace': False},
+      {'zero_trace': True})
+  def test_sym_op(self, zero_trace):
+    """sym_op on tril(X) gives same result as QXQ' for symmetric X?"""
+    n = 5
+    x = np.random.randn(n, n)
+    x += x.T
+    if zero_trace:
+      np.fill_diagonal(x, np.diag(x)-np.trace(x)/n)
+    q, _ = np.linalg.qr(np.random.randn(n, n))
+    sym_q = geomancer.sym_op(q, zero_trace=zero_trace)
+    tril_x = x[np.tril_indices(n)]
+    if zero_trace:
+      tril_x = tril_x[:-1]
+    vec_y = sym_q @ tril_x
+    y = q @ x @ q.T
+    y_ = geomancer.vec_to_sym(vec_y, n, zero_trace=zero_trace)
+    np.testing.assert_allclose(y_, y)
+
+  def test_ffdiag(self):
+    k = 2
+    n = 5
+    w, _ = np.linalg.qr(np.random.randn(n, n))
+    psi = np.random.randn(k, n)
+    a = np.zeros((k, n, n))
+    for i in range(k):
+      a[i] = w @ np.diag(psi[i]) @ w.T
+    w_ = geomancer.ffdiag(a)
+    for i in range(k):
+      x = w_ @ a[i] @ w_.T
+      diag = np.diag(x).copy()
+      np.fill_diagonal(x, 1.0)
+      # check that x is diagonal
+      np.testing.assert_allclose(x, np.eye(n), rtol=1e-10, atol=1e-10)
+      self.assertTrue(np.all(np.min(
+          np.abs(diag[None, :] - psi[i][:, None]), axis=0) < 1e-10))
+
+  def test_make_nearest_neighbor_graph(self):
+    n = 100
+    # make points on a circle
+    data = np.zeros((n, 2))
+    for i in range(n):
+      data[i, 0] = np.sin(i*2*np.pi/n)
+      data[i, 1] = np.cos(i*2*np.pi/n)
+    graph = geomancer.make_nearest_neighbors_graph(data, 4, n=10)
+    for i in range(n):
+      self.assertLen(graph.rows[i], 4)
+      self.assertIn((i+1) % n, graph.rows[i])
+      self.assertIn((i+2) % n, graph.rows[i])
+      self.assertIn((i-1) % n, graph.rows[i])
+      self.assertIn((i-2) % n, graph.rows[i])
+
+
+if __name__ == '__main__':
+  absltest.main()
@@ -0,0 +1,21 @@
+#!/bin/sh
+# Copyright 2020 DeepMind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+python3 -m venv geomancer-venv
+source geomancer-venv/bin/activate
+pip3 install .
+python3 geomancer_test.py
+python3 train.py
+deactivate
@@ -0,0 +1,35 @@
+# Copyright 2020 DeepMind Technologies Limited
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Setup for pip package."""
+
+from setuptools import find_packages
+from setuptools import setup
+
+
+REQUIRED_PACKAGES = ['numpy', 'scipy', 'matplotlib', 'absl-py', 'tqdm']
+
+setup(
+    name='geomancer',
+    version='0.1',
+    description='A library for the Geometric Manifold Component Estimator.',
+    url='https://github.com/deepmind/deepmind-research/geomancer',
+    author='DeepMind',
+    author_email='pfau@google.com',
+    # Contained modules and scripts.
+    packages=find_packages(),
+    install_requires=REQUIRED_PACKAGES,
+    platforms=['any'],
+    license='Apache 2.0',
+)
@@ -0,0 +1,195 @@
+# Copyright 2020 DeepMind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Run GEOMANCER on products of synthetic manifolds."""
+
+import re
+
+from absl import app
+from absl import flags
+from absl import logging
+
+import geomancer
+
+from matplotlib import gridspec
+import matplotlib.pyplot as plt
+import numpy as np
+from scipy.stats import special_ortho_group
+from tqdm import tqdm
+
+flags.DEFINE_list('specification', ['S^2', 'S^2'], 'List of submanifolds')
+flags.DEFINE_integer('npts', 1000, 'Number of data points')
+flags.DEFINE_boolean('rotate', False, 'Apply random rotation to the data')
+
+FLAGS = flags.FLAGS
+
+
+def make_so_tangent(q):
+  """Given an n x n orthonormal matrix, return a basis for its tangent space."""
+  n = q.shape[0]
+  assert np.allclose(q.T @ q, np.eye(n), atol=1e-4, rtol=1e-4)
+  a = np.zeros((n, n))
+  ii = 0
+  dq = np.zeros((n, n, n*(n-1)//2))
+  for i in range(n):
+    for j in range(i+1, n):
+      a[i, j] = 1
+      a[j, i] = -1
+      dq[Ellipsis, ii] = a @ q  # tangent vectors are skew-symmetric matrix times Q
+      a[i, j] = 0
+      a[j, i] = 0
+      ii += 1
+  # reshape and orthonormalize the result
+  return np.linalg.qr(np.reshape(dq, (n**2, n*(n-1)//2)))[0]
+
+
+def make_sphere_tangent(x):
+  _, _, v = np.linalg.svd(x[None, :])
+  return v[:, 1:]
+
+
+def make_true_tangents(spec, data):
+  """Return a set of orthonormal bases, one for each submanifold."""
+  for i in range(spec.shape[1]):
+    assert spec[0, i] == 0 or spec[1, i] == 0
+  so_dim = sum(dim ** 2 for dim in spec[0])
+  sphere_dim = sum(dim+1 if dim > 0 else 0 for dim in spec[1])
+  assert so_dim + sphere_dim == data.shape[0]
+
+  ii = 0
+  tangents = []
+  for i in range(spec.shape[1]):
+    if spec[0, i] != 0:
+      dim = spec[0, i]
+      tangents.append(make_so_tangent(np.reshape(data[ii:ii+dim**2],
+                                                 (dim, dim))))
+      ii += dim ** 2
+    else:
+      dim = spec[1, i]
+      tangents.append(make_sphere_tangent(data[ii:ii+dim+1]))
+      ii += dim + 1
+
+  tangents2 = []
+  for i in range(len(tangents)):
+    size1 = sum(x.shape[0] for x in tangents[:i])
+    size2 = sum(x.shape[0] for x in tangents[i+1:])
+    tangents2.append(np.concatenate(
+        (np.zeros((size1, tangents[i].shape[1])),
+         tangents[i],
+         np.zeros((size2, tangents[i].shape[1]))), axis=0))
+  return tangents2
+
+
+def make_product_manifold(specification, npts):
+  """Generate data from a product of manifolds with the given specification."""
+  data = []
+  tangents = []
+  latent_dim = 0
+  spec_array = np.zeros((2, len(specification)), dtype=np.int32)
+  for i, spec in enumerate(specification):
+    so_spec = re.search(r'SO\(([0-9]+)\)', spec)  # matches "SO(<numbers>)"
+    sphere_spec = re.search(r'S\^([0-9]+)', spec)  # matches "S^<numbers>"
+
+    if sphere_spec is not None:
+      dim = int(sphere_spec.group(1))
+      spec_array[1, i] = dim
+      latent_dim += dim
+      dat = np.random.randn(npts, dim+1)
+      dat /= np.tile(np.sqrt(np.sum(dat**2, axis=1)[Ellipsis, None]),
+                     [1, dim+1])
+    elif so_spec is not None:
+      dim = int(so_spec.group(1))
+      spec_array[0, i] = dim
+      latent_dim += dim * (dim - 1) // 2
+      dat = [np.ndarray.flatten(special_ortho_group.rvs(dim), order='C')
+             for _ in range(npts)]
+      dat = np.stack(dat)
+    else:
+      raise ValueError(f'Unrecognized manifold: {spec}')
+    data.append(dat)
+  data = np.concatenate(data, axis=1)
+
+  for i in range(spec_array.shape[1]):
+    if spec_array[0, i] != 0:
+      dim = spec_array[0, i]
+      tangents.append(np.zeros((npts, data.shape[1], dim * (dim - 1) // 2)))
+    elif spec_array[1, i] != 0:
+      dim = spec_array[1, i]
+      tangents.append(np.zeros((npts, data.shape[1], dim)))
+
+  for i in tqdm(range(npts)):
+    true_tangent = make_true_tangents(spec_array, data[i])
+    for j in range(len(specification)):
+      tangents[j][i] = true_tangent[j]
+  logging.info('Constructed data and true tangents for %s',
+               ' x '.join(specification))
+  return data, latent_dim, tangents
+
+
+def main(_):
+  # Generate data and run GEOMANCER
+  data, dim, tangents = make_product_manifold(FLAGS.specification, FLAGS.npts)
+  if FLAGS.rotate:
+    rot, _ = np.linalg.qr(np.random.randn(data.shape[1], data.shape[1]))
+    data_rot = data @ rot.T
+    components, spectrum = geomancer.fit(data_rot, dim)
+    errors = geomancer.eval_unaligned(data_rot, components, data, tangents)
+  else:
+    components, spectrum = geomancer.fit(data, dim)
+    errors = geomancer.eval_aligned(components, tangents)
+
+  # Plot spectrum
+  plt.figure(figsize=(8, 6))
+  plt.scatter(np.arange(len(spectrum)), spectrum, s=100)
+  largest_gap = np.argmax(spectrum[1:]-spectrum[:-1]) + 1
+  plt.axvline(largest_gap, linewidth=2, c='r')
+  plt.xticks([])
+  plt.yticks(fontsize=18)
+  plt.xlabel('Index', fontsize=24)
+  plt.ylabel('Eigenvalue', fontsize=24)
+  plt.title('GeoManCEr Eigenvalue Spectrum', fontsize=24)
+
+  # Plot subspace bases
+  fig = plt.figure(figsize=(8, 6))
+  bases = components[0]
+  gs = gridspec.GridSpec(1, len(bases),
+                         width_ratios=[b.shape[1] for b in bases])
+  for i in range(len(bases)):
+    ax = plt.subplot(gs[i])
+    ax.imshow(bases[i])
+    ax.set_xticks([])
+    ax.set_yticks([])
+    ax.set_title(r'$T_{\mathbf{x}_1}\mathcal{M}_%d$' % (i+1), fontsize=18)
+  fig.canvas.set_window_title('GeoManCEr Results')
+
+  # Plot ground truth
+  fig = plt.figure(figsize=(8, 6))
+  gs = gridspec.GridSpec(1, len(tangents),
+                         width_ratios=[b.shape[2] for b in tangents])
+  for i, spec in enumerate(FLAGS.specification):
+    ax = plt.subplot(gs[i])
+    ax.imshow(tangents[i][0])
+    ax.set_xticks([])
+    ax.set_yticks([])
+    ax.set_title(r'$T_{\mathbf{x}_1}%s$' % spec, fontsize=18)
+  fig.canvas.set_window_title('Ground Truth')
+
+  logging.info('Error between subspaces: %.2f +/- %.2f radians',
+               np.mean(errors),
+               np.std(errors))
+  plt.show()
+
+
+if __name__ == '__main__':
+  app.run(main)