Add information about Stanford 3D Objects for Disentangling (S3O4D) to README for GEOMANCER

PiperOrigin-RevId: 342946882
2026-05-20 21:41:05 +08:00 · 2020-11-17 22:28:37 +00:00
parent 5b261680f3
commit 403f9a976a
2 changed files with 79 additions and 17 deletions
@@ -2,22 +2,24 @@

 This package provides an implementation of the Geometric Manifold Component
 Estimator, or GEOMANCER, as described in [Disentangling by Subspace Diffusion
-(2020)](https://arxiv.org/abs/2006.12982). GEOMANCER is a nonparametric
-algorithm for disentangling, somewhat similar in spirit to Laplacian Eigenmaps
-or Vector Diffusion Maps, except instead of producing an embedding for the data,
-it produces a set of subspaces around each data point, one subspace for each
-disentangled factor of variation in the data. This differs from more common
-algorithms for disentangling that originated in the deep learning community,
-such as the beta-VAE, TCVAE or FactorVAE, which learn a nonlinear embedding and
-probabilistic generative model of the data. GEOMANCER is intended for data where
-the individual factors of variation might be more than one dimensional, for
-instance 3D rotations. At the moment, GEOMANCER works best when some ground
-truth information about the metric in the data space is available, for instance
-knowledge of the "true" nearest neighbors around each point, and we do not
-recommend running GEOMANCER directly on unstructured data from high-dimensional
-spaces. We are providing the code here to enable the interested researcher to
-get some hands-on experience with the ideas around differential geometry,
-holonomy and higher-order graph connection Laplacians we explore in the paper.
+(2020)](https://arxiv.org/abs/2006.12982), as well as information about the
+[Stanford 3D Objects for Disentangling (S3O4D) dataset](https://console.cloud.google.com/storage/browser/dm_s3o4d).
+GEOMANCER is a nonparametric algorithm for disentangling, somewhat similar in
+spirit to Laplacian Eigenmaps or Vector Diffusion Maps, except instead of
+producing an embedding for the data, it produces a set of subspaces around each
+data point, one subspace for each disentangled factor of variation in the data.
+This differs from more common algorithms for disentangling that originated in
+the deep learning community, such as the beta-VAE, TCVAE or FactorVAE, which
+learn a nonlinear embedding and probabilistic generative model of the data.
+GEOMANCER is intended for data where the individual factors of variation might
+be more than one dimensional, for instance 3D rotations. At the moment,
+GEOMANCER works best when some ground truth information about the metric in the
+data space is available, for instance knowledge of the "true" nearest neighbors
+around each point, and we do not recommend running GEOMANCER directly on
+unstructured data from high-dimensional spaces. We are providing the code here
+to enable the interested researcher to get some hands-on experience with the
+ideas around differential geometry, holonomy and higher-order graph connection
+Laplacians we explore in the paper.


 ## Installation
@@ -89,9 +91,69 @@ submanifolds.
 point. If `--rotate=False`, and GEOMANCER has sufficient data, each basis matrix
 should span the same subspace as the results in the second plot.

+## Stanford 3D Objects for Disentangling (S3O4D)
+
+![Stanford 3D Objects for Disentangling](s3o4d.png)
+
+The data used in the "Stanford 3D Objects" section of the experimental results
+can be found [here](https://console.cloud.google.com/storage/browser/dm_s3o4d).
+The data consists of 100,000 renderings each of the Bunny and Dragon objects
+from the [Stanford 3D Scanning Repository](http://graphics.stanford.edu/data/3Dscanrep/).
+More objects may be added in the future, but only the Bunny and Dragon are used
+in the paper. Each object is rendered with illumination from a point sampled
+uniformly on the 2-sphere, and with a uniformly sampled 3D rotation. The true latent
+states are provided as NumPy arrays along with the images. The lighting is given
+as a 3-vector with unit norm, while the rotation is provided both as a
+quaternion and a 3x3 orthogonal matrix.
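The invariants described above (unit-norm lighting, orthogonal rotation matrices, and a quaternion form of the same rotation) can be checked with a short sketch like the one below. It runs on synthetic stand-ins rather than the real arrays. The helper names `sample_unit_vectors` and `quat_to_matrix` are hypothetical, the `(w, x, y, z)` quaternion ordering is an assumption that should be verified against `pose_mat` on the real data, and normalizing Gaussian samples is only one standard way to draw uniform points on a sphere, not necessarily how the dataset was generated.

```python
import numpy as np

def sample_unit_vectors(rng, n, dim):
    """Draw n points uniformly on the unit sphere in R^dim by normalizing Gaussians."""
    v = rng.standard_normal((n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def quat_to_matrix(q):
    """Convert unit quaternions of shape (n, 4) to rotation matrices of shape (n, 3, 3).

    ASSUMPTION: components are ordered (w, x, y, z). The README does not state
    the ordering, so compare against pose_mat before using this on real data.
    """
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    entries = [
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y),
        2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y),
    ]
    # Stack the nine entries row-major, then fold each row of 9 into a 3x3 matrix.
    return np.stack(entries, axis=1).reshape(-1, 3, 3)

rng = np.random.default_rng(0)
illumination = sample_unit_vectors(rng, 5, 3)  # stand-in for data['illumination']
pose_quat = sample_unit_vectors(rng, 5, 4)     # stand-in for data['pose_quat']
pose_mat = quat_to_matrix(pose_quat)           # stand-in for data['pose_mat']

# Lighting vectors have unit norm; rotations are orthogonal with determinant +1.
unit_norm = np.allclose(np.linalg.norm(illumination, axis=1), 1.0)
orthogonal = np.allclose(pose_mat @ pose_mat.transpose(0, 2, 1), np.eye(3))
proper = np.allclose(np.linalg.det(pose_mat), 1.0)
```

On the real data, the same three checks can be applied directly to the loaded `illumination` and `pose_mat` arrays.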
+
+### Why another dataset?
+
+There are many similarities between S3O4D and existing ML benchmark datasets
+like [NORB](https://cs.nyu.edu/~ylclab/data/norb-v1.0/),
+[3D Chairs](https://github.com/mathieuaubry/seeing3Dchairs),
+[3D Shapes](https://github.com/deepmind/3d-shapes) and many others, which also
+include renderings of a set of objects under different pose and illumination
+conditions. However, none of these existing datasets include the *full manifold*
+of rotations in 3D; most include only a subset of changes to elevation and
+azimuth. S3O4D images are sampled uniformly and independently from the full
+space of rotations and illuminations, meaning the dataset contains objects that
+are upside down and illuminated from behind or underneath. We believe that this
+makes S3O4D uniquely suited for research on generative models where the latent
+space has non-trivial topology, as well as for general manifold learning
+methods where the curvature of the manifold is important.
+
+### Usage
+
+To load the data for a given object, unzip `images.zip` into a folder called
+`images` in the same directory as `latents.npz`, and from inside that
+directory run:
+
+```
+import numpy as np
+from PIL import Image
+
+with open('latents.npz', 'rb') as f:  # binary mode: np.load cannot parse a text-mode file
+  data = np.load(f)
+  illumination = data['illumination']  # lighting source position, a 3-vector
+  pose_quat = data['pose_quat']  # object pose (3D rotation as a quaternion)
+  pose_mat = data['pose_mat']  # object pose (3D rotation as a matrix)
+
+def get_data(i):
+  """Return the image and latent vector for index i, where 0 <= i < 100,000."""
+  img = np.array(Image.open(f'images/{i:05}.jpg'))
+  # Uses the matrix, not quaternion, representation,
+  # similarly to the experiments in the paper
+  latent = np.concatenate((illumination[i],
+                           pose_mat[i].reshape(-1)))
+  return img, latent
+
+img, latent = get_data(0)
+```
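When only the latents are needed, the per-index concatenation in `get_data` can be done for every sample at once. The sketch below is a minimal illustration on synthetic arrays. The function name `stack_latents` is hypothetical, and the shapes `(n, 3)` for `illumination` and `(n, 3, 3)` for `pose_mat` are assumptions inferred from how `get_data` indexes those arrays.

```python
import numpy as np

def stack_latents(illumination, pose_mat):
    """Concatenate lighting (n, 3) with flattened rotations (n, 3, 3) into (n, 12)."""
    n = illumination.shape[0]
    return np.concatenate((illumination, pose_mat.reshape(n, -1)), axis=1)

# Synthetic stand-ins with the assumed shapes; these are not real S3O4D values.
n = 4
illumination = np.tile(np.array([0.0, 0.0, 1.0]), (n, 1))  # light along +z
pose_mat = np.tile(np.eye(3), (n, 1, 1))                   # identity rotations

latents = stack_latents(illumination, pose_mat)
```

Each row of `latents` matches what `get_data` builds for a single index: the 3 lighting components followed by the 9 rotation-matrix entries in row-major order.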
+
 ## Giving Credit

-If you use this code in your work, we ask you to cite this paper:
+If you use this code or the Stanford 3D Objects for Disentangling data in your
+work, we ask you to cite this paper:

 ```
@article{pfau2020disentangling,