mirror of https://github.com/google-deepmind/deepmind-research.git
synced 2026-05-20 20:41:48 +08:00

Add information about Stanford 3D Objects for Disentangling (S3O4D) to README for GEOMANCER

PiperOrigin-RevId: 342946882

committed by Saran Tunyasuvunakool
parent 5b261680f3
commit 403f9a976a
+79 -17
@@ -2,22 +2,24 @@

This package provides an implementation of the Geometric Manifold Component
Estimator, or GEOMANCER, as described in [Disentangling by Subspace Diffusion
(2020)](https://arxiv.org/abs/2006.12982). GEOMANCER is a nonparametric
algorithm for disentangling, somewhat similar in spirit to Laplacian Eigenmaps
or Vector Diffusion Maps, except instead of producing an embedding for the data,
it produces a set of subspaces around each data point, one subspace for each
disentangled factor of variation in the data. This differs from more common
algorithms for disentangling that originated in the deep learning community,
such as the beta-VAE, TCVAE or FactorVAE, which learn a nonlinear embedding and
probabilistic generative model of the data. GEOMANCER is intended for data where
the individual factors of variation might be more than one dimensional, for
instance 3D rotations. At the moment, GEOMANCER works best when some ground
truth information about the metric in the data space is available, for instance
knowledge of the "true" nearest neighbors around each point, and we do not
recommend running GEOMANCER directly on unstructured data from high-dimensional
spaces. We are providing the code here to enable the interested researcher to
get some hands-on experience with the ideas around differential geometry,
holonomy and higher-order graph connection Laplacians we explore in the paper.
(2020)](https://arxiv.org/abs/2006.12982), as well as information about the
[Stanford 3D Objects for Disentangling (S3O4D) dataset](https://console.cloud.google.com/storage/browser/dm_s3o4d).
GEOMANCER is a nonparametric algorithm for disentangling, somewhat similar in
spirit to Laplacian Eigenmaps or Vector Diffusion Maps, except instead of
producing an embedding for the data, it produces a set of subspaces around each
data point, one subspace for each disentangled factor of variation in the data.
This differs from more common algorithms for disentangling that originated in
the deep learning community, such as the beta-VAE, TCVAE or FactorVAE, which
learn a nonlinear embedding and probabilistic generative model of the data.
GEOMANCER is intended for data where the individual factors of variation might
be more than one dimensional, for instance 3D rotations. At the moment,
GEOMANCER works best when some ground truth information about the metric in the
data space is available, for instance knowledge of the "true" nearest neighbors
around each point, and we do not recommend running GEOMANCER directly on
unstructured data from high-dimensional spaces. We are providing the code here
to enable the interested researcher to get some hands-on experience with the
ideas around differential geometry, holonomy and higher-order graph connection
Laplacians we explore in the paper.
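
As an illustration of what "true" nearest neighbors could look like in practice, the sketch below computes them from ground-truth latent coordinates using brute-force Euclidean distance. This is an assumption-laden example, not part of the package: `true_nearest_neighbors` is a hypothetical helper, Euclidean distance between latent vectors is only one possible choice of metric, and the dense distance matrix is practical only for small datasets.

```python
import numpy as np


def true_nearest_neighbors(latents, k):
  """Return an (n, k) array of neighbor indices, excluding each point itself."""
  latents = np.asarray(latents, dtype=np.float64)
  # Pairwise squared Euclidean distances, shape (n, n).
  sq_norms = np.sum(latents ** 2, axis=1)
  dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * latents @ latents.T
  np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
  # Indices of the k smallest distances per row, nearest first.
  return np.argsort(dists, axis=1)[:, :k]


# Synthetic latents for illustration only (not S3O4D data).
rng = np.random.default_rng(0)
example_latents = rng.standard_normal((50, 3))
neighbors = true_nearest_neighbors(example_latents, k=5)
```

How such a neighbor list is handed to GEOMANCER is not shown here; consult the package source for the interface it actually accepts.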

## Installation
@@ -89,9 +91,69 @@ submanifolds.
point. If `--rotate=False`, and GEOMANCER has sufficient data, each basis matrix
should span the same subspace as the results in the second plot.
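
One way to check whether two basis matrices span the same subspace is to look at the principal angles between their column spaces. The sketch below is illustrative only: `largest_principal_angle` is a hypothetical helper, not a function from this package, and it assumes both matrices have the same number of linearly independent columns.

```python
import numpy as np


def largest_principal_angle(basis_a, basis_b):
  """Largest principal angle (radians) between the column spans of two matrices."""
  # Orthonormalize each basis so the singular values below are cosines of angles.
  q_a, _ = np.linalg.qr(basis_a)
  q_b, _ = np.linalg.qr(basis_b)
  cosines = np.linalg.svd(q_a.T @ q_b, compute_uv=False)
  # Clipping guards against round-off pushing a cosine slightly above 1.
  return float(np.arccos(np.clip(cosines.min(), -1.0, 1.0)))


# Two bases for the same plane in R^3: the second is an invertible remix of the first.
plane = np.array([[1., 0.], [0., 1.], [0., 0.]])
remixed = plane @ np.array([[1., 1.], [0., 2.]])
angle = largest_principal_angle(plane, remixed)
```

An angle close to zero indicates that the two spans agree; an angle close to pi/2 means at least one direction in one subspace is orthogonal to the other.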

## Stanford 3D Objects for Disentangling (S3O4D)



The data used in the "Stanford 3D Objects" section of the experimental results
can be found [here](https://console.cloud.google.com/storage/browser/dm_s3o4d).
The data consists of 100,000 renderings each of the Bunny and Dragon objects
from the [Stanford 3D Scanning Repository](http://graphics.stanford.edu/data/3Dscanrep/).
More objects may be added in the future, but only the Bunny and Dragon are used
in the paper. Each object is rendered with a uniformly sampled illumination from
a point on the 2-sphere, and a uniformly sampled 3D rotation. The true latent
states are provided as NumPy arrays along with the images. The lighting is given
as a 3-vector with unit norm, while the rotation is provided both as a
quaternion and a 3x3 orthogonal matrix.
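
The stated properties of the latents (unit-norm lighting vectors, orthogonal rotation matrices) can be sanity-checked numerically. The sketch below is a hypothetical helper, `check_latents`, written for illustration; the tolerance and the extra determinant test (which assumes the matrices are proper rotations with determinant +1) are assumptions, not something the dataset documentation specifies.

```python
import numpy as np


def check_latents(illumination, pose_mat, atol=1e-5):
  """Return True if lighting vectors are unit norm and poses are rotations."""
  illumination = np.asarray(illumination)
  pose_mat = np.asarray(pose_mat)
  unit_norm = np.allclose(np.linalg.norm(illumination, axis=-1), 1.0, atol=atol)
  # R @ R^T should equal the identity for every matrix in the batch.
  gram = pose_mat @ np.swapaxes(pose_mat, -1, -2)
  orthogonal = np.allclose(gram, np.eye(3), atol=atol)
  # A proper rotation (no reflection) has determinant +1.
  proper = np.allclose(np.linalg.det(pose_mat), 1.0, atol=atol)
  return bool(unit_norm and orthogonal and proper)


# Synthetic stand-ins for the 'illumination' and 'pose_mat' arrays.
rot_z = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
ok = check_latents([[0., 0., 1.], [1., 0., 0.]], np.stack([np.eye(3), rot_z]))
```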

### Why another dataset?

There are many similarities between S3O4D and existing ML benchmark datasets
like [NORB](https://cs.nyu.edu/~ylclab/data/norb-v1.0/),
[3D Chairs](https://github.com/mathieuaubry/seeing3Dchairs),
[3D Shapes](https://github.com/deepmind/3d-shapes) and many others, which also
include renderings of a set of objects under different pose and illumination
conditions. However, none of these existing datasets include the *full manifold*
of rotations in 3D - most include only a subset of changes to elevation and
azimuth. S3O4D images are sampled uniformly and independently from the full
space of rotations and illuminations, meaning the dataset contains objects that
are upside down and illuminated from behind or underneath. We believe that this
makes S3O4D uniquely suited for research on generative models where the latent
space has non-trivial topology, as well as for general manifold learning
methods where the curvature of the manifold is important.
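
To make "sampled uniformly from the full space of rotations and illuminations" concrete, here is a minimal sketch of one standard way to draw such samples: normalizing a Gaussian 3-vector gives a uniform point on the 2-sphere, and normalizing a Gaussian 4-vector gives a uniformly distributed unit quaternion, which is then converted to a rotation matrix. This is not necessarily how S3O4D was generated, the function names are hypothetical, and the (w, x, y, z) quaternion ordering is an assumption that may not match the dataset's `pose_quat` array.

```python
import numpy as np


def sample_sphere(rng, n):
  """Draw n points uniformly from the unit 2-sphere, shape (n, 3)."""
  v = rng.standard_normal((n, 3))
  return v / np.linalg.norm(v, axis=1, keepdims=True)


def sample_rotations(rng, n):
  """Draw n rotation matrices uniformly from SO(3), shape (n, 3, 3)."""
  # A normalized 4D Gaussian is a uniformly distributed unit quaternion.
  q = rng.standard_normal((n, 4))
  q /= np.linalg.norm(q, axis=1, keepdims=True)
  w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]  # (w, x, y, z) order assumed
  # Standard unit-quaternion to rotation-matrix formula, listed row by row.
  rot = np.stack([
      1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
      2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
      2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
  ], axis=1)
  return rot.reshape(n, 3, 3)


rng = np.random.default_rng(0)
lights = sample_sphere(rng, 4)    # illumination directions
poses = sample_rotations(rng, 4)  # object rotations
```

Because the samples cover all of SO(3) rather than a range of elevation and azimuth, some of the drawn poses correspond to upside-down objects.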

### Usage

To load the data for a given object, unzip `images.zip` into a folder called
`images` in the same directory as `latents.npz`, and from inside that
directory run:

```
import numpy as np
from PIL import Image

# latents.npz is a binary archive, so open it in binary mode.
with open('latents.npz', 'rb') as f:
  data = np.load(f)
  illumination = data['illumination']  # lighting source position, a 3-vector
  pose_quat = data['pose_quat']  # object pose (3D rotation as a quaternion)
  pose_mat = data['pose_mat']  # object pose (3D rotation as a matrix)

def get_data(i):
  """Return image and latent for a zero-based index i below 100,000."""
  img = np.array(Image.open(f'images/{i:05}.jpg'))
  # Uses the matrix, not quaternion, representation,
  # similarly to the experiments in the paper
  latent = np.concatenate((illumination[i],
                           pose_mat[i].reshape(-1)))
  return img, latent

img, latent = get_data(0)
```
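
The latent returned by `get_data` is a flat 12-vector: three lighting components followed by the nine entries of the rotation matrix in row-major order. A minimal sketch of undoing that concatenation is shown below; `split_latent` is a hypothetical helper introduced here for illustration and is not part of the package.

```python
import numpy as np


def split_latent(latent):
  """Split a 12-vector into (illumination, pose_mat) as built by get_data."""
  latent = np.asarray(latent)
  if latent.shape != (12,):
    raise ValueError(f'expected shape (12,), got {latent.shape}')
  illumination = latent[:3]            # lighting direction
  pose_mat = latent[3:].reshape(3, 3)  # row-major, inverse of reshape(-1)
  return illumination, pose_mat


# Synthetic latent: light along +z, identity rotation.
example = np.concatenate(([0., 0., 1.], np.eye(3).reshape(-1)))
light, pose = split_latent(example)
```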

## Giving Credit

If you use this code in your work, we ask you to cite this paper:
If you use this code or the Stanford 3D Objects for Disentangling data in your
work, we ask you to cite this paper:

```
@article{pfau2020disentangling,
Binary file not shown.
|
After Width: | Height: | Size: 137 KiB |