Add image and citation links to RL Unplugged README

PiperOrigin-RevId: 321197757
2026-05-09 21:07:49 +08:00 · 2020-07-14 19:24:28 +01:00
parent 8188882a82
commit bd29e1b710
2 changed files with 22 additions and 13 deletions
@@ -1,4 +1,4 @@
-<img src="./docs/images/rl_unplugged_tasks_v0.png" width="50%">
+<img src="./images/tasks.png" width="50%">

 # RL Unplugged: Benchmarks for Offline Reinforcement Learning

@@ -31,10 +31,10 @@ Data loading code and examples will be available soon.
 ## Atari Dataset

 We are releasing a large and diverse dataset of gameplay following the protocol
-described by Agarwal et al. (2020), which can be used to evaluate several
+described by [Agarwal et al., 2020], which can be used to evaluate several
 discrete offline RL algorithms. The dataset is generated by running an online
 DQN agent and recording transitions from its replay during training with sticky
-actions (Machado et al., 2018). As stated in (Agarwal et al.,2020), for each
+actions [Machado et al., 2018]. As stated in [Agarwal et al., 2020], for each
 game we use data from five runs with 50 million transitions each. States in each
 transition include stacks of four frames so that our baselines can use
 frame-stacking. We release datasets for 46 Atari games. For details on how the
@@ -43,23 +43,23 @@ dataset was generated, please refer to the paper.
 ## DeepMind Locomotion Dataset

 These tasks are made up of the corridor locomotion tasks involving the CMU
-Humanoid, for which prior efforts have either used motion capture data (Merel et
-al., 2019a,b) or training from scratch (Song et al., 2020). In addition, the DM
-Locomotion repository contains a set of tasks adapted to be suited to a virtual
-rodent (see Merel et al., 2020). We emphasize that the DM Locomotion tasks
-feature the combination of challenging high-DoF continuous control along with
-perception from rich egocentric observations. For details on how the dataset was
-generated, please refer to the paper.
+Humanoid, for which prior efforts have either used motion capture data [Merel et
+al., 2019a], [Merel et al., 2019b] or training from scratch [Song et al., 2020].
+In addition, the DM Locomotion repository contains a set of tasks adapted to be
+suited to a virtual rodent [Merel et al., 2020]. We emphasize that the DM
+Locomotion tasks feature the combination of challenging high-DoF continuous
+control along with perception from rich egocentric observations. For details on
+how the dataset was generated, please refer to the paper.

 ## DeepMind Control Suite Dataset

-DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks
-implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks
+DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
+implemented in MuJoCo [Todorov et al., 2012]. We consider a subset of the tasks
 provided in the suite that cover a wide range of difficulties.

 Most of the datasets in this domain are generated using D4PG. For the
 environments Manipulator insert ball and Manipulator insert peg we use V-MPO
-(Song et al., 2020) to generate the data as D4PG is unable to solve these tasks.
+[Song et al., 2020] to generate the data as D4PG is unable to solve these tasks.
 We release datasets for 9 control suite tasks. For details on how the dataset
 was generated, please refer to the paper.

@@ -148,3 +148,12 @@ engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
  </tr>
 </table>
 </div>
+
+[Agarwal et al., 2020]: https://arxiv.org/abs/1907.04543
+[Machado et al., 2018]: https://arxiv.org/abs/1709.06009
+[Merel et al., 2019a]: https://arxiv.org/abs/1811.09656
+[Merel et al., 2019b]: https://arxiv.org/abs/1811.11711
+[Merel et al., 2020]: https://arxiv.org/abs/1911.09451
+[Song et al., 2020]: https://arxiv.org/abs/1909.12238
+[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
+[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf