Add information about Stanford 3D Objects for Disentangling (S3O4D) to README for GEOMANCER

PiperOrigin-RevId: 342946882
David Pfau
2020-11-17 22:28:37 +00:00
committed by Saran Tunyasuvunakool
parent 5b261680f3
commit 403f9a976a
2 changed files with 79 additions and 17 deletions
@@ -2,22 +2,24 @@
This package provides an implementation of the Geometric Manifold Component
Estimator, or GEOMANCER, as described in [Disentangling by Subspace Diffusion
(2020)](https://arxiv.org/abs/2006.12982), as well as information about the
[Stanford 3D Objects for Disentangling (S3O4D) dataset](https://console.cloud.google.com/storage/browser/dm_s3o4d).
GEOMANCER is a nonparametric algorithm for disentangling, somewhat similar in
spirit to Laplacian Eigenmaps or Vector Diffusion Maps, except that instead of
producing an embedding for the data, it produces a set of subspaces around each
data point, one subspace for each disentangled factor of variation in the data.
This differs from more common algorithms for disentangling that originated in
the deep learning community, such as the beta-VAE, TCVAE or FactorVAE, which
learn a nonlinear embedding and probabilistic generative model of the data.
GEOMANCER is intended for data where the individual factors of variation might
be more than one-dimensional, for instance 3D rotations. At the moment,
GEOMANCER works best when some ground truth information about the metric in the
data space is available, for instance knowledge of the "true" nearest neighbors
around each point, and we do not recommend running GEOMANCER directly on
unstructured data from high-dimensional spaces. We are providing the code here
to enable the interested researcher to get some hands-on experience with the
ideas around differential geometry, holonomy and higher-order graph connection
Laplacians we explore in the paper.
## Installation
@@ -89,9 +91,69 @@ submanifolds.
point. If `--rotate=False` and GEOMANCER has sufficient data, each basis matrix
should span the same subspace as the results in the second plot.
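To check numerically whether two basis matrices span the same subspace, one option is to compare their orthogonal projectors. The sketch below assumes each basis is a NumPy array of shape `(D, k)` with linearly independent columns. `projector` and `subspace_distance` are hypothetical helpers and are not part of the GEOMANCER package. For two spans of equal dimension, the spectral norm of the projector difference equals the sine of the largest principal angle between them.

```python
import numpy as np


def projector(basis):
  """Orthogonal projector onto the column span of a (D, k) basis matrix."""
  q, _ = np.linalg.qr(basis)  # orthonormalize the columns (reduced QR)
  return q @ q.T


def subspace_distance(a, b):
  """Sine of the largest principal angle between the column spans of a and b.

  Assumes both spans have the same dimension. Returns 0.0 when the spans
  coincide and 1.0 when some direction in one span is orthogonal to the other.
  """
  return float(np.linalg.norm(projector(a) - projector(b), ord=2))
```

Subspaces estimated from finite data will not give exactly zero, so reporting the distance itself is usually more informative than thresholding it at a tight tolerance.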
## Stanford 3D Objects for Disentangling (S3O4D)
![Stanford 3D Objects for Disentangling](s3o4d.png)
The data used in the "Stanford 3D Objects" section of the experimental results
can be found [here](https://console.cloud.google.com/storage/browser/dm_s3o4d).
The data consists of 100,000 renderings each of the Bunny and Dragon objects
from the [Stanford 3D Scanning Repository](http://graphics.stanford.edu/data/3Dscanrep/).
More objects may be added in the future, but only the Bunny and Dragon are used
in the paper. Each image is rendered with illumination from a uniformly sampled
point on the 2-sphere and a uniformly sampled 3D rotation of the object. The true latent
states are provided as NumPy arrays along with the images. The lighting is given
as a 3-vector with unit norm, while the rotation is provided both as a
quaternion and a 3x3 orthogonal matrix.
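To make "uniformly sampled" concrete, the sketch below draws latents of the same form from the distributions described above. It is illustrative only and is not the script used to render S3O4D. The scalar-first `(w, x, y, z)` quaternion ordering in the hypothetical `quat_to_mat` helper is an assumption, so check it against the `pose_quat` and `pose_mat` arrays before relying on it. Normalizing an isotropic Gaussian vector is a standard way to sample uniformly on a sphere. Uniform unit quaternions give the uniform (Haar) distribution on 3D rotations.

```python
import numpy as np


def sample_light(rng, n):
  """Light directions drawn uniformly from the unit 2-sphere, shape (n, 3)."""
  v = rng.standard_normal((n, 3))  # isotropic Gaussian, then project to sphere
  return v / np.linalg.norm(v, axis=1, keepdims=True)


def sample_quat(rng, n):
  """Unit quaternions uniform on S^3, i.e. rotations uniform on SO(3)."""
  q = rng.standard_normal((n, 4))
  return q / np.linalg.norm(q, axis=1, keepdims=True)


def quat_to_mat(q):
  """Convert (n, 4) unit quaternions, ASSUMED (w, x, y, z), to (n, 3, 3)."""
  w, x, y, z = q.T
  rows = [
      [1 - 2 * (y**2 + z**2), 2 * (x * y - w * z), 2 * (x * z + w * y)],
      [2 * (x * y + w * z), 1 - 2 * (x**2 + z**2), 2 * (y * z - w * x)],
      [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x**2 + y**2)],
  ]
  # Stack entries into rows, then rows into matrices indexed [n, row, col].
  return np.stack([np.stack(r, axis=-1) for r in rows], axis=-2)


rng = np.random.default_rng(0)
illumination = sample_light(rng, 8)
pose_mat = quat_to_mat(sample_quat(rng, 8))
```

Note that `q` and `-q` map to the same rotation, so two quaternion arrays can disagree in sign and still describe identical poses.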
### Why another dataset?
There are many similarities between S3O4D and existing ML benchmark datasets
like [NORB](https://cs.nyu.edu/~ylclab/data/norb-v1.0/),
[3D Chairs](https://github.com/mathieuaubry/seeing3Dchairs),
[3D Shapes](https://github.com/deepmind/3d-shapes) and many others, which also
include renderings of a set of objects under different pose and illumination
conditions. However, none of these existing datasets includes the *full manifold*
of rotations in 3D; most include only a subset of changes to elevation and
azimuth. S3O4D images are sampled uniformly and independently from the full
space of rotations and illuminations, meaning the dataset contains objects that
are upside down and illuminated from behind or underneath. We believe that this
makes S3O4D uniquely suited for research on generative models where the latent
space has non-trivial topology, as well as for general manifold learning
methods where the curvature of the manifold is important.
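One way to quantify the difference from elevation/azimuth datasets is to measure how often an object ends up upside down. Under a uniform rotation, the image of any fixed unit vector is uniform on the sphere, so its vertical component is uniform on [-1, 1]. We would therefore expect roughly half of the poses to point the object's "up" axis below the horizontal. The sketch below assumes `pose_mat` has shape `(N, 3, 3)` and that the z axis is "up" in both the object and world frames. That frame convention is not stated here, and `upside_down_fraction` is a hypothetical helper.

```python
import numpy as np


def upside_down_fraction(pose_mat, up_axis=2):
  """Fraction of poses whose rotated up axis points below the horizontal.

  pose_mat: array of shape (n, 3, 3) of rotation matrices.
  up_axis: index of the axis ASSUMED to be "up" in both object and world frames.
  """
  # Vertical component of the rotated up axis. The diagonal entry R[up, up] is
  # unchanged by transposing R, so this does not depend on whether R maps
  # object coordinates to world coordinates or the reverse.
  vertical = pose_mat[:, up_axis, up_axis]
  return float(np.mean(vertical < 0))
```

With `pose_mat` loaded as in the Usage section, a value near 0.5 would be consistent with the uniform sampling described above.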
### Usage
To load the data for a given object, unzip `images.zip` into a folder called
`images` in the same directory as `latents.npz`, and from inside that
directory run:
```python
import numpy as np
from PIL import Image

# np.load returns a lazily loaded NpzFile; each data[...] read copies an array
# into memory, so the arrays stay valid after the file is closed.
with np.load('latents.npz') as data:
  illumination = data['illumination']  # lighting source position, a unit 3-vector
  pose_quat = data['pose_quat']  # object pose (3D rotation as a quaternion)
  pose_mat = data['pose_mat']  # object pose (3D rotation as a 3x3 matrix)


def get_data(i):
  """Return the image and latent vector for an index in [0, 100000)."""
  img = np.array(Image.open(f'images/{i:05}.jpg'))
  # Uses the matrix, not quaternion, representation,
  # similarly to the experiments in the paper
  latent = np.concatenate((illumination[i],
                           pose_mat[i].reshape(-1)))
  return img, latent


img, latent = get_data(0)
```
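After loading, a quick consistency check can catch a wrong file or a misread array. The sketch below verifies the properties stated above: unit-norm lighting vectors, and orthogonal pose matrices with determinant +1. It also assumes, without confirmation from this README, that the quaternions are stored normalized. `check_latents` is a hypothetical helper, and the default tolerance is an arbitrary choice. It accepts `pose_mat` as either `(N, 3, 3)` or flattened `(N, 9)`.

```python
import numpy as np


def check_latents(illumination, pose_quat, pose_mat, atol=1e-4):
  """Raise ValueError if the latent arrays violate their expected constraints."""
  pose_mat = np.asarray(pose_mat).reshape(-1, 3, 3)
  n = pose_mat.shape[0]
  if illumination.shape != (n, 3) or pose_quat.shape != (n, 4):
    raise ValueError('latent arrays have inconsistent shapes')
  # Lighting directions should lie on the unit 2-sphere.
  if not np.allclose(np.linalg.norm(illumination, axis=1), 1.0, atol=atol):
    raise ValueError('illumination vectors are not unit norm')
  # Quaternions are ASSUMED to be normalized.
  if not np.allclose(np.linalg.norm(pose_quat, axis=1), 1.0, atol=atol):
    raise ValueError('pose quaternions are not unit norm')
  # Rotation matrices satisfy R R^T = I and det(R) = +1.
  rrt = pose_mat @ np.transpose(pose_mat, (0, 2, 1))
  if not np.allclose(rrt, np.eye(3), atol=atol):
    raise ValueError('pose matrices are not orthogonal')
  if not np.allclose(np.linalg.det(pose_mat), 1.0, atol=atol):
    raise ValueError('pose matrices do not have determinant +1')
```

For example, `check_latents(illumination, pose_quat, pose_mat)` can be run right after the loading code. It returns `None` if every check passes.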
## Giving Credit
If you use this code or the Stanford 3D Objects for Disentangling data in your
work, we ask you to cite this paper:
```
@article{pfau2020disentangling,