Add image and citation links to RL Unplugged README

PiperOrigin-RevId: 321197757
Sergio Gomez
2020-07-14 19:24:28 +01:00
committed by Saran Tunyasuvunakool
parent 8188882a82
commit bd29e1b710
2 changed files with 22 additions and 13 deletions
+22 -13
@@ -1,4 +1,4 @@
<img src="./docs/images/rl_unplugged_tasks_v0.png" width="50%">
<img src="./images/tasks.png" width="50%">
# RL Unplugged: Benchmarks for Offline Reinforcement Learning
@@ -31,10 +31,10 @@ Data loading code and examples will be available soon.
## Atari Dataset
We are releasing a large and diverse dataset of gameplay following the protocol
described by Agarwal et al. (2020), which can be used to evaluate several
described by [Agarwal et al., 2020], which can be used to evaluate several
discrete offline RL algorithms. The dataset is generated by running an online
DQN agent and recording transitions from its replay during training with sticky
actions (Machado et al., 2018). As stated in (Agarwal et al.,2020), for each
actions [Machado et al., 2018]. As stated in [Agarwal et al., 2020], for each
game we use data from five runs with 50 million transitions each. States in each
transition include stacks of four frames to be able to do frame-stacking with
our baselines. We release datasets for 46 Atari games. For details on how the
@@ -43,23 +43,23 @@ dataset was generated, please refer to the paper.
## Deepmind Locomotion Dataset
These tasks are made up of the corridor locomotion tasks involving the CMU
Humanoid, for which prior efforts have either used motion capture data (Merel et
al., 2019a,b) or training from scratch (Song et al., 2020). In addition, the DM
Locomotion repository contains a set of tasks adapted to be suited to a virtual
rodent (see Merel et al., 2020). We emphasize that the DM Locomotion tasks
feature the combination of challenging high-DoF continuous control along with
perception from rich egocentric observations. For details on how the dataset was
generated, please refer to the paper.
Humanoid, for which prior efforts have either used motion capture data [Merel et
al., 2019a], [Merel et al., 2019b] or training from scratch [Song et al., 2020].
In addition, the DM Locomotion repository contains a set of tasks adapted to be
suited to a virtual rodent [Merel et al., 2020]. We emphasize that the DM
Locomotion tasks feature the combination of challenging high-DoF continuous
control along with perception from rich egocentric observations. For details on
how the dataset was generated, please refer to the paper.
## Deepmind Control Suite Dataset
DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks
implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks
DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
implemented in MuJoCo [Todorov et al., 2012]. We consider a subset of the tasks
provided in the suite that cover a wide range of difficulties.
Most of the datasets in this domain are generated using D4PG. For the
environments Manipulator insert ball and Manipulator insert peg we use V-MPO
(Song et al., 2020) to generate the data as D4PG is unable to solve these tasks.
[Song et al., 2020] to generate the data as D4PG is unable to solve these tasks.
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to the paper.
@@ -148,3 +148,12 @@ engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
</tr>
</table>
</div>
[Agarwal et al., 2020]: https://arxiv.org/abs/1907.04543
[Machado et al., 2018]: https://arxiv.org/abs/1709.06009
[Merel et al., 2019a]: https://arxiv.org/abs/1811.09656
[Merel et al., 2019b]: https://arxiv.org/abs/1811.11711
[Merel et al., 2020]: https://arxiv.org/abs/1911.09451
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
Binary file not shown.
