Internal change.

PiperOrigin-RevId: 372158522
Ravichandra Addanki
2021-05-05 17:42:43 +00:00
committed by Louise Deason
parent 2e866f1937
commit 59f5fb1268
6 changed files with 915 additions and 0 deletions

File diff suppressed because one or more lines are too long

cmtouch/README.md (new file, 209 lines added)

@@ -0,0 +1,209 @@
--------------------------------------------------------------------------------
# CMTouch Dataset
This repository contains datasets for cross-modal representation learning, used
in developing rich touch representations in "Learning rich touch representations
through cross-modal self-supervision" [1].
The datasets we provide are:
1. CMTouch-Props
2. CMTouch-YCB
The datasets consist of episodes collected by running a reinforcement learning
agent on a simulated Shadow Dexterous Hand [2] interacting with different
objects. From these interactions, observations from different sensory modalities
are collected at each time step, including vision, proprioception (joint
positions and velocities), touch, actions, and object IDs. We used these data to
learn rich touch representations using cross-modal self-supervision.
## Bibtex
If you use one of these datasets in your work, please cite the reference paper
as follows:
```
@InProceedings{zambelli20learning,
  author    = "Zambelli, Martina and Aytar, Yusuf and Visin, Francesco and Zhou, Yuxiang and Hadsell, Raia",
  title     = "Learning rich touch representations through cross-modal self-supervision",
  booktitle = "Conference on Robot Learning (CoRL)",
  year      = "2020",
}
```
## Descriptions
### Experimental setup
We run experiments in simulation with MuJoCo [3] and we use the simulated Shadow
Dexterous Hand [2], with five fingers and 24 degrees of freedom, actuated by 20
motors. In simulation, each fingertip has a spatial touch sensor attached with a
spatial resolution of 4×4 and three channels: one for normal force and two for
tangential forces. We simplify this by summing across the spatial dimensions,
to obtain a single force vector for each fingertip representing one normal force
and two tangential forces. The state consists of proprioception (joint positions
and joint velocities) and touch.
Visual inputs are collected at a 64×64 resolution and are used only for
representation learning; they are not provided as observations to control the
robot's actions. The action space is 20-dimensional. We use velocity control and
a control rate of 30 Hz. Each episode has 200 time steps, which correspond to
about 6 seconds. The environment consists of the Shadow Hand, facing down, and
interacting with different objects. These objects have different shapes, sizes
and physical properties (e.g. rigid or soft). We develop two versions of the
task, the first using simple props and the second using YCB objects. In both
cases, objects are fixed to their frame of reference, while their position and
orientation are randomized.
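To make the touch preprocessing concrete, here is a minimal sketch (using NumPy; the array name and the per-step layout are assumptions based on the description above) of collapsing the 4×4 spatial sensor readings into one force vector per fingertip:
```
import numpy as np

# Hypothetical raw reading for one time step: 5 fingertips, a 4x4 spatial grid,
# and 3 channels (1 normal force + 2 tangential forces).
spatial_touch = np.random.rand(5, 4, 4, 3)

# Sum over the two spatial dimensions to get one 3-D force vector per fingertip.
fingertip_forces = spatial_touch.sum(axis=(1, 2))  # shape: (5, 3)
```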
### CMTouch-Props
This is a dataset based on simple geometric 3D shapes (referred to as "props").
Props are simple 3D-shaped objects that include cubes, spheres, cylinders and
ellipsoids of different sizes. We also generated a soft version of each prop,
which can deform under the pressure of the touching fingers.
Soft deformable objects are complex entities to simulate: they are defined
through a composition of multiple bodies (capsules) that are tied together to
form a shape, such as a cube or a sphere. The main characteristic of these
objects is their elastic behaviour: they change shape when touched. The most
difficult aspect to simulate in this context is contacts, whose number grows
quickly as the number of colliding bodies increases.
Forty-eight different objects are generated by combining 6 different sizes,
4 different shapes (sphere, cylinder, cube, ellipsoid), and two material types
(rigid or soft).
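As a sanity check on the combinatorics, the 48 props are the cross product of sizes, shapes and materials; a minimal sketch (the size indices are hypothetical labels):
```
import itertools

sizes = range(6)  # hypothetical size labels
shapes = ["sphere", "cylinder", "cube", "ellipsoid"]
materials = ["rigid", "soft"]

props = list(itertools.product(sizes, shapes, materials))
assert len(props) == 48  # 6 x 4 x 2
```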
![](https://i.imgur.com/Hps38z5.jpg)
### CMTouch-YCB
This is a dataset based on YCB objects. The YCB objects dataset [4] consists of
everyday objects with different shapes, sizes, textures, weight and rigidity.
We chose a set of ten objects: cracker box, sugar box, mustard bottle, potted
meat can, banana, pitcher base, bleach cleanser, mug, power drill, scissors.
These are generated in simulation at their standard size, which is
proportionate to the default dimensions of the simulated Shadow Hand.
The pose of each object is randomly selected from a set of 60 different poses,
in which we vary the orientation of the object. These variations make the
identification of each object more complex than in CMTouch-Props and require
greater generalization capability from the learning method.
![](https://i.imgur.com/Mf3KYbn.jpg)
## Download
The datasets can be downloaded from
[Google Cloud Storage](https://console.cloud.google.com/storage/browser/dm_cmtouch).
Each dataset is a single
[TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) file.
To download a particular dataset, use the web interface or, on Linux, run
`wget` with the appropriate filename as follows:
```
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_test.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_train.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_props_all_val.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_test.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_train.tfrecords
wget https://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_ycb_all_val.tfrecords
```
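If you prefer to fetch files from Python rather than the shell, a minimal sketch using only the standard library (the local filename is an assumption):
```
import urllib.request

url = ('https://storage.googleapis.com/dm_cmtouch/datasets/'
       'cmtouch_props_all_test.tfrecords')
urllib.request.urlretrieve(url, 'cmtouch_props_all_test.tfrecords')
```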
## Usage
After downloading the dataset files, you can read them as `tf.data.Dataset`
instances with the readers provided. The example below shows how to read the
CMTouch-Props dataset:
```
import tensorflow as tf

record_file = 'cmtouch_props_all_test.tfrecords'
dataset = tf.data.TFRecordDataset(record_file)
parsed_dataset = dataset.map(_parse_tf_example)
```
(a complete example is provided in the Colab).
All dataset readers return the following set of observations (a minimal parsing
sketch is given after the list below):
```
'camera': tf.io.FixedLenFeature([], tf.string),
'camera/height': tf.io.FixedLenFeature([], tf.int64),
'camera/width': tf.io.FixedLenFeature([], tf.int64),
'camera/channel': tf.io.FixedLenFeature([], tf.int64),
'object_id': tf.io.FixedLenFeature([], tf.string),  # for both
'object_id/dim': tf.io.FixedLenFeature([], tf.int64),
'orientation_id': tf.io.FixedLenFeature([], tf.string),  # only for ycb
'orientation_id/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/joints_vel': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/joints_vel/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/joints_pos': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/joints_pos/dim': tf.io.FixedLenFeature([], tf.int64),
'shadowhand_motor/spatial_touch': tf.io.FixedLenFeature([], tf.string),
'shadowhand_motor/spatial_touch/dim': tf.io.FixedLenFeature([], tf.int64),
'actions'
```
* 'camera': `Tensor` of shape [sequence_length, height, width, channels] and type
uint8
* 'shadowhand_motor/spatial_touch': `Tensor` of shape [sequence_length, num_fingers x 3] and type float32
* 'shadowhand_motor/joints_pos': `Tensor` of shape [sequence_length, num_joint_positions] and type float32
* 'shadowhand_motor/joints_vel': `Tensor` of shape [sequence_length,
num_joint_velocities] and type float32
* 'actions': `Tensor` of shape [sequence_length, num_actuated_joints] and type
float32
* 'object_id': `Scalar` indicating an object identification number
* 'orientation_id': `Scalar` indicating a YCB object pose identification number
(CMTouch-YCB only)
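The feature entries listed above can be collected into a Python dictionary and used with `tf.io.parse_single_example`. The sketch below makes some assumptions: the dictionary name `feature_spec` is hypothetical, and we assume the string features hold serialized tensors decodable with `tf.io.parse_tensor`; the Colab contains the authoritative `_parse_tf_example`.
```
import tensorflow as tf

feature_spec = {
    'shadowhand_motor/spatial_touch': tf.io.FixedLenFeature([], tf.string),
    'shadowhand_motor/spatial_touch/dim': tf.io.FixedLenFeature([], tf.int64),
    # ... remaining entries from the list above ...
}

def _parse_tf_example(example_proto):
  parsed = tf.io.parse_single_example(example_proto, feature_spec)
  # Assumption: string features store serialized tensors.
  touch = tf.io.parse_tensor(
      parsed['shadowhand_motor/spatial_touch'], out_type=tf.float32)
  return parsed, touch
```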
Few-shot evaluations can be performed by creating subsets of the data to train
and evaluate the models.
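For instance, a minimal sketch (the subset size is hypothetical, and we assume each record is one episode) that builds a small training subset from a parsed dataset:
```
few_shot_train = parsed_dataset.take(100)  # keep only the first 100 episodes
```
The download script in this directory also lists pre-generated `train<N>` subsets of various sizes.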
## References
[1] M. Zambelli, Y. Aytar, F. Visin, Y. Zhou, R. Hadsell. Learning rich touch
representations through cross-modal self-supervision. Conference on Robot
Learning (CoRL), 2020.
[2] ShadowRobot, Shadow Dexterous Hand.
https://www.shadowrobot.com/products/dexterous-hand/.
[3] E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based
control. In Proceedings of the International Conference on Intelligent Robots
and Systems (IROS), 2012.
[4] B. Calli, A. Singh, J. Bruce, A. Walsman, K. Konolige, S. Srinivasa, P.
Abbeel, and A. M. Dollar. Yale-CMU-Berkeley dataset for robotic manipulation
research. The International Journal of Robotics Research, 36(3):261–268, 2017.
## Disclaimers
This is not an official Google product.

cmtouch/download_datasets.sh (new executable file, 123 lines added)

@@ -0,0 +1,123 @@
#!/bin/bash
# Copyright 2020 Deepmind Technologies Limited.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train10.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train100.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train1000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train250.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train30.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train50.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_train500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_all_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj0_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj1_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj2_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj3_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj4_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj5_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj6_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj7_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj8_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train1500.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train180.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train300.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train3000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train60.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train600.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_train6000.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_objects_ycb_obj9_im64_val.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train1200.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train144.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train240.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train2400.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train48.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train480.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_train4800.tfrecords
wget http://storage.googleapis.com/dm_cmtouch/datasets/cmtouch_touch_props_all_im64_val.tfrecords

synthetic_returns/README.md (new file, 125 lines added)

@@ -0,0 +1,125 @@
# Code for Synthetic Returns
This repository contains code for the arXiv preprint
["Synthetic Returns for Long-Term Credit Assignment"](https://arxiv.org/abs/2102.12425)
by David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt
Botvinick, Hado van Hasselt, and Francis Song.
To cite this work:
```
@article{raposo2021synthetic,
  title={Synthetic Returns for Long-Term Credit Assignment},
  author={Raposo, David and Ritter, Sam and Santoro, Adam and Wayne, Greg and
  Weber, Theophane and Botvinick, Matt and van Hasselt, Hado and Song, Francis},
  journal={arXiv preprint arXiv:2102.12425},
  year={2021}
}
```
### Agent core wrapper
We implemented the Synthetic Returns (SR) module as a wrapper around a
recurrent neural network (RNN), so it should be compatible with any deep RL
agent that uses an arbitrary RNN core whose inputs consist of batches of
vectors. This could be an LSTM, as in the example below, or a more
sophisticated core, as long as it implements `hk.RNNCore`.
```python
agent_core = hk.LSTM(128)
```
To build the SR wrapper, simply pass the existing agent core to the constructor,
along with the SR configuration:
```python
sr_config = {
    "memory_size": 128,
    "capacity": 300,
    "hidden_layers": (128, 128),
    "alpha": 0.3,
    "beta": 1.0,
}
sr_agent_core = hk.ResetCore(
    SyntheticReturnsCoreWrapper(core=agent_core, **sr_config))
```
Typically, the SR wrapper should itself be wrapped in a `hk.ResetCore` in order
to reset the core state at the beginning of each new episode. This resets not
only the episodic memory but also the original agent core that was passed to the
SR wrapper constructor.
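For completeness, a minimal sketch of how the wrapped core is used (the batch size is hypothetical; as with all Haiku modules, this must run inside an `hk.transform`):
```python
batch_size = 16  # hypothetical
# The initial state covers both the episodic memory and the inner agent core.
initial_state = sr_agent_core.initial_state(batch_size)
# Because of hk.ResetCore, each step's input is a (core_inputs, should_reset)
# pair; see the learner example below.
```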
### Learner
Consider the distributed setting, wherein a learner receives mini-batches of
trajectories of length `T` produced by the actors.
`trajectory` is a nested structure of tensors of size `[T,B,...]` (where `B` is
the batch size) containing observations, agent states, rewards and step type
indicators.
We start by producing inputs to the SR core, which consist of tuples of current
state embeddings and return targets. The current state embeddings can be
produced by a ConvNet, for example. In our experiments we used the current-step
reward as the target. Note that the current-step reward corresponds to the
rewards in the trajectory shifted by one relative to the observations:
```python
observations = jax.tree_map(lambda x: x[:-1], trajectory.observation)
vision_output = hk.BatchApply(vision_net)(observations)
return_targets = trajectory.reward[1:]
sr_core_inputs = (vision_output, return_targets)
```
For purposes of core resetting at the beginning of a new episode, we also need
to pass an indicator of which steps correspond to the first step of an episode.
```python
should_reset = jnp.equal(
    trajectory.step_type[:-1], int(dm_env.StepType.FIRST))
core_inputs = (sr_core_inputs, should_reset)
```
We can now produce an unroll using `hk.dynamic_unroll` and passing it the SR
core, the core inputs we produced, and the initial state of the unroll, which
corresponds to the agent state in the first step of the trajectory:
```python
state = jax.tree_map(lambda t: t[0], trajectory.agent_state)
core_output, state = hk.dynamic_unroll(
    sr_agent_core, core_inputs, state)
```
The SR wrapper produces 4 output tensors: the output of the agent core, the
synthetic returns, the SR-augmented return, and the SR loss.
The synthetic returns are already taken into account when computing the
augmented return and the SR loss, so they do not need to be consumed separately
and can be discarded or kept for logging purposes.
The agent core outputs should be used, as usual, for producing a policy. In an
actor-critic, policy gradient set-up, like IMPALA, we would produce policy
logits and values:
```python
policy_logits = hk.BatchApply(policy_net)(core_output.output)
value = hk.BatchApply(baseline_net)(core_output.output)
```
Similarly, in a Q-learning setting we would use the agent core outputs to
produce q-values.
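A minimal sketch (with `q_net` standing in for a hypothetical action-value head, e.g. an MLP with one output per action):
```python
q_values = hk.BatchApply(q_net)(core_output.output)
```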
The SR-augmented returns should be used in place of the environment rewards for
the policy updates (e.g. when computing the policy gradient and baseline
losses):
```python
rewards = core_output.augmented_return
```
Finally, the SR loss, summed over batch and time dimensions, should be added to
the total learner loss to be minimized:
```python
total_loss += jnp.sum(core_output.sr_loss)
```


@@ -0,0 +1,2 @@
dm-haiku>=0.0.3
jax>=0.2.8


@@ -0,0 +1,187 @@
# Copyright 2021 DeepMind Technologies Limited. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Episodic Memory and Synthetic Returns Core Wrapper modules."""
import collections
import haiku as hk
import jax
import jax.numpy as jnp

SRCoreWrapperOutput = collections.namedtuple(
    "SRCoreWrapperOutput", ["output", "synthetic_return", "augmented_return",
                            "sr_loss"])


class EpisodicMemory(hk.RNNCore):
  """Episodic Memory module."""

  def __init__(self, memory_size, capacity, name="episodic_memory"):
    """Constructor.

    Args:
      memory_size: Integer. The size of the vectors to be stored.
      capacity: Integer. The maximum number of memories to store before it
        becomes necessary to overwrite old memories.
      name: String. A name for this Haiku module instance.
    """
    super().__init__(name=name)
    self._memory_size = memory_size
    self._capacity = capacity

  def __call__(self, inputs, prev_state):
    """Writes a new memory into the episodic memory.

    Args:
      inputs: A Tensor of shape ``[batch_size, memory_size]``.
      prev_state: The previous state of the episodic memory, which is a tuple
        with a (i) counter of shape ``[batch_size, 1]`` indicating how many
        memories have been written so far, and (ii) a tensor of shape
        ``[batch_size, capacity, memory_size]`` with the full content of the
        episodic memory.

    Returns:
      A tuple with (i) a tensor of shape ``[batch_size, capacity, memory_size]``
      with the full content of the episodic memory, including the newly
      written memory, and (ii) the new state of the episodic memory.
    """
    inputs = jax.lax.stop_gradient(inputs)
    counter, memories = prev_state
    counter_mod = jnp.mod(counter, self._capacity)
    slot_selector = jnp.expand_dims(
        jax.nn.one_hot(counter_mod, self._capacity), axis=2)
    memories = memories * (1 - slot_selector) + (
        slot_selector * jnp.expand_dims(inputs, 1))
    counter = counter + 1
    return memories, (counter, memories)

  def initial_state(self, batch_size):
    """Creates the initial state of the episodic memory.

    Args:
      batch_size: Integer. The batch size of the episodic memory.

    Returns:
      A tuple with (i) a counter of shape ``[batch_size, 1]`` and (ii) a tensor
      of shape ``[batch_size, capacity, memory_size]`` with the full content
      of the episodic memory.
    """
    if batch_size is None:
      shape = []
    else:
      shape = [batch_size]
    counter = jnp.zeros(shape)
    memories = jnp.zeros(shape + [self._capacity, self._memory_size])
    return (counter, memories)


class SyntheticReturnsCoreWrapper(hk.RNNCore):
  """Synthetic Returns core wrapper."""

  def __init__(self, core, memory_size, capacity, hidden_layers, alpha, beta,
               loss_func=(lambda x, y: 0.5 * jnp.square(x - y)),
               apply_core_to_input=False, name="synthetic_returns_wrapper"):
    """Constructor.

    Args:
      core: hk.RNNCore. The recurrent core of the agent. E.g. an LSTM.
      memory_size: Integer. The size of the vectors to be stored in the
        episodic memory.
      capacity: Integer. The maximum number of memories to store before it
        becomes necessary to overwrite old memories.
      hidden_layers: Tuple or list of integers, indicating the size of the
        hidden layers of the MLPs used to produce synthetic returns, current
        state bias, and gate.
      alpha: The multiplier of the synthetic returns term in the augmented
        return.
      beta: The multiplier of the environment returns term in the augmented
        return.
      loss_func: A function of two arguments (predictions and targets) to
        compute the SR loss.
      apply_core_to_input: Boolean. Whether to apply the core on the inputs. If
        true, the synthetic returns will be computed from the outputs of the
        RNN core passed to the constructor. If false, the RNN core will be
        applied only at the output of this wrapper, and the synthetic returns
        will be computed from the inputs.
      name: String. A name for this Haiku module instance.
    """
    super().__init__(name=name)
    self._em = EpisodicMemory(memory_size, capacity)
    self._capacity = capacity
    hidden_layers = list(hidden_layers)
    self._synthetic_return = hk.nets.MLP(hidden_layers + [1])
    self._bias = hk.nets.MLP(hidden_layers + [1])
    self._gate = hk.Sequential([
        hk.nets.MLP(hidden_layers + [1]),
        jax.nn.sigmoid,
    ])
    self._apply_core_to_input = apply_core_to_input
    self._core = core
    self._alpha = alpha
    self._beta = beta
    self._loss = loss_func

  def initial_state(self, batch_size):
    return (
        self._em.initial_state(batch_size),
        self._core.initial_state(batch_size)
    )

  def __call__(self, inputs, prev_state):
    current_input, return_target = inputs
    em_state, core_state = prev_state
    (counter, memories) = em_state
    if self._apply_core_to_input:
      current_input, core_state = self._core(current_input, core_state)
    # Synthetic return for the current state
    synth_return = jnp.squeeze(self._synthetic_return(current_input), -1)
    # Current state bias term
    bias = self._bias(current_input)
    # Gate computed from current state
    gate = self._gate(current_input)
    # When counter > capacity, mask will be all ones
    mask = 1 - jnp.cumsum(jax.nn.one_hot(counter, self._capacity), axis=1)
    mask = jnp.expand_dims(mask, axis=2)
    # Synthetic returns for each state in memory
    past_synth_returns = hk.BatchApply(self._synthetic_return)(memories)
    # Sum of synthetic returns from previous states
    sr_sum = jnp.sum(past_synth_returns * mask, axis=1)
    prediction = jnp.squeeze(sr_sum * gate + bias, -1)
    sr_loss = self._loss(prediction, return_target)
    augmented_return = jax.lax.stop_gradient(
        self._alpha * synth_return + self._beta * return_target)
    # Write current state to memory
    _, em_state = self._em(current_input, em_state)
    if not self._apply_core_to_input:
      output, core_state = self._core(current_input, core_state)
    else:
      output = current_input
    output = SRCoreWrapperOutput(
        output=output,
        synthetic_return=synth_return,
        augmented_return=augmented_return,
        sr_loss=sr_loss,
    )
    return output, (em_state, core_state)