diff --git a/geomancer/README.md b/geomancer/README.md
index 4700794..cdcdf72 100644
--- a/geomancer/README.md
+++ b/geomancer/README.md
@@ -2,22 +2,24 @@
 
 This package provides an implementation of the Geometric Manifold Component
 Estimator, or GEOMANCER, as described in [Disentangling by Subspace Diffusion
-(2020)](https://arxiv.org/abs/2006.12982). GEOMANCER is a nonparametric
-algorithm for disentangling, somewhat similar in spirit to Laplacian Eigenmaps
-or Vector Diffusion Maps, except instead of producing an embedding for the data,
-it produces a set of subspaces around each data point, one subspace for each
-disentangled factor of variation in the data. This differs from more common
-algorithms for disentangling that originated in the deep learning community,
-such as the beta-VAE, TCVAE or FactorVAE, which learn a nonlinear embedding and
-probabilistic generative model of the data. GEOMANCER is intended for data where
-the individual factors of variation might be more than one dimensional, for
-instance 3D rotations. At the moment, GEOMANCER works best when some ground
-truth information about the metric in the data space is available, for instance
-knowledge of the "true" nearest neighbors around each point, and we do not
-recommend running GEOMANCER directly on unstructured data from high-dimensional
-spaces. We are providing the code here to enable the interested researcher to
-get some hands-on experience with the ideas around differential geometry,
-holonomy and higher-order graph connection Laplacians we explore in the paper.
+(2020)](https://arxiv.org/abs/2006.12982), as well as information about the
+[Stanford 3D Objects for Disentangling (S3O4D) dataset](https://console.cloud.google.com/storage/browser/dm_s3o4d).
+GEOMANCER is a nonparametric algorithm for disentangling, somewhat similar in
+spirit to Laplacian Eigenmaps or Vector Diffusion Maps, except instead of
+producing an embedding for the data, it produces a set of subspaces around each
+data point, one subspace for each disentangled factor of variation in the data.
+This differs from more common algorithms for disentangling that originated in
+the deep learning community, such as the beta-VAE, TCVAE or FactorVAE, which
+learn a nonlinear embedding and probabilistic generative model of the data.
+GEOMANCER is intended for data where the individual factors of variation might
+be more than one-dimensional, for instance 3D rotations. At the moment,
+GEOMANCER works best when some ground truth information about the metric in the
+data space is available, for instance knowledge of the "true" nearest neighbors
+around each point, and we do not recommend running GEOMANCER directly on
+unstructured data from high-dimensional spaces. We are providing the code here
+to enable the interested researcher to get some hands-on experience with the
+ideas around differential geometry, holonomy and higher-order graph connection
+Laplacians we explore in the paper.
 
 ## Installation
 
@@ -89,9 +91,69 @@ submanifolds.
 point. If `--rotate=False`, and GEOMANCER has sufficient data, each basis
 matrix should span the same subspace as the results in the second plot.
 
+## Stanford 3D Objects for Disentangling (S3O4D)
+
+![Stanford 3D Objects for Disentangling](s3o4d.png)
+
+The data used in the "Stanford 3D Objects" section of the experimental results
+can be found [here](https://console.cloud.google.com/storage/browser/dm_s3o4d).
+The data consists of 100,000 renderings each of the Bunny and Dragon objects
+from the [Stanford 3D Scanning Repository](http://graphics.stanford.edu/data/3Dscanrep/).
+More objects may be added in the future, but only the Bunny and Dragon are used
+in the paper. Each object is rendered in a uniformly sampled 3D rotation, lit
+from a uniformly sampled point on the 2-sphere. The true latent states are
+provided as NumPy arrays along with the images. The lighting is given as a
+3-vector with unit norm, while the rotation is provided both as a quaternion
+and a 3x3 orthogonal matrix.
+
+### Why another dataset?
+
+There are many similarities between S3O4D and existing ML benchmark datasets
+like [NORB](https://cs.nyu.edu/~ylclab/data/norb-v1.0/),
+[3D Chairs](https://github.com/mathieuaubry/seeing3Dchairs),
+[3D Shapes](https://github.com/deepmind/3d-shapes) and many others, which also
+include renderings of a set of objects under different pose and illumination
+conditions. However, none of these existing datasets include the *full manifold*
+of rotations in 3D; most include only a subset of changes to elevation and
+azimuth. S3O4D images are sampled uniformly and independently from the full
+space of rotations and illuminations, meaning the dataset contains objects that
+are upside down and illuminated from behind or underneath. We believe that this
+makes S3O4D uniquely suited for research on generative models where the latent
+space has non-trivial topology, as well as for general manifold learning
+methods where the curvature of the manifold is important.
+
+### Usage
+
+To load the data for a given object, unzip `images.zip` into a folder called
+`images` in the same directory as `latents.npz`, and from inside that
+directory run:
+
+```
+import numpy as np
+from PIL import Image
+
+with open('latents.npz', 'rb') as f:
+  data = np.load(f)
+  illumination = data['illumination']  # lighting source position, a 3-vector
+  pose_quat = data['pose_quat']  # object pose (3D rotation as a quaternion)
+  pose_mat = data['pose_mat']  # object pose (3D rotation as a matrix)
+
+def get_data(i):
+  """Return the image and latent vector for index i, where 0 <= i < 100,000."""
+  img = np.array(Image.open(f'images/{i:05}.jpg'))
+  # Uses the matrix, not quaternion, representation,
+  # similarly to the experiments in the paper
+  latent = np.concatenate((illumination[i],
+                           pose_mat[i].reshape(-1)))
+  return img, latent
+
+img, latent = get_data(0)
+```
+
 ## Giving Credit
 
-If you use this code in your work, we ask you to cite this paper:
+If you use this code or the Stanford 3D Objects for Disentangling data in your
+work, we ask you to cite this paper:
 
 ```
 @article{pfau2020disentangling,
diff --git a/geomancer/s3o4d.png b/geomancer/s3o4d.png
new file mode 100644
index 0000000..d613fd7
Binary files /dev/null and b/geomancer/s3o4d.png differ
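The S3O4D section describes the latents as a lighting direction drawn uniformly from the 2-sphere and a pose drawn uniformly from the 3D rotations, with the pose stored both as a quaternion and as a matrix. The following minimal NumPy sketch shows one standard way to draw such latents and relate the two pose representations. It is an illustration under stated assumptions, not the rendering code used to build the dataset. `sample_latents` and `quat_to_mat` are hypothetical helper names. The scalar-first (w, x, y, z) quaternion ordering is an assumption, since the ordering of `pose_quat` is not documented here.

```python
import numpy as np


def sample_latents(n, rng=None):
  """Draw n lighting directions and n unit quaternions uniformly (a sketch).

  A normalized 3D Gaussian vector is uniform on the 2-sphere; a normalized
  4D Gaussian vector is a uniform unit quaternion, which corresponds to a
  uniformly (Haar) distributed 3D rotation.
  """
  rng = np.random.default_rng() if rng is None else rng
  light = rng.standard_normal((n, 3))
  light /= np.linalg.norm(light, axis=1, keepdims=True)
  quat = rng.standard_normal((n, 4))
  quat /= np.linalg.norm(quat, axis=1, keepdims=True)
  return light, quat


def quat_to_mat(q):
  """Convert unit quaternions of shape (..., 4) to rotation matrices (..., 3, 3).

  ASSUMPTION: scalar-first (w, x, y, z) ordering.
  """
  w, x, y, z = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
  rows = [
      np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)], -1),
      np.stack([2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)], -1),
      np.stack([2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)], -1),
  ]
  # Stack the three rows along the second-to-last axis -> (..., 3, 3).
  return np.stack(rows, -2)
```

Every entry of the matrix is quadratic in the quaternion, so `q` and `-q` map to the same rotation. The quaternion representation is therefore two-to-one, while the matrix representation is unique for each pose.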
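The `get_data` helper in the usage snippet builds a 12-dimensional latent one index at a time: the 3-vector illumination followed by the 9 entries of the flattened pose matrix. To work with every latent at once, for example to compare against a learned representation, a vectorized sketch follows. `all_latents` and `check_latents` are hypothetical helpers, not part of the released code. The determinant test (a proper rotation, det = +1) is an added assumption, since the README only states that the matrices are orthogonal.

```python
import numpy as np


def all_latents(illumination, pose_mat):
  """Return an (N, 12) array; row i has the same values as get_data(i)[1]."""
  illumination = np.asarray(illumination)
  pose_mat = np.asarray(pose_mat)
  # Flatten each 3x3 matrix row-major, as pose_mat[i].reshape(-1) does.
  return np.concatenate(
      (illumination, pose_mat.reshape(len(pose_mat), -1)), axis=1)


def check_latents(illumination, pose_mat, atol=1e-5):
  """Check unit-norm lighting and orthogonal pose matrices.

  ASSUMPTION: also requires det == +1, i.e. proper rotations.
  """
  illumination = np.asarray(illumination, dtype=float)
  pose_mat = np.asarray(pose_mat, dtype=float)
  unit_light = np.allclose(np.linalg.norm(illumination, axis=1), 1.0, atol=atol)
  orthogonal = np.allclose(
      pose_mat @ np.swapaxes(pose_mat, -1, -2), np.eye(3), atol=atol)
  proper = np.allclose(np.linalg.det(pose_mat), 1.0, atol=atol)
  return bool(unit_light and orthogonal and proper)
```

With the arrays loaded by the usage snippet, `all_latents(illumination, pose_mat)` should give one row per image. `check_latents(illumination, pose_mat)` tests the properties the README describes, using a tolerance that allows for single-precision storage.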