mirror of https://github.com/google-deepmind/deepmind-research.git
synced 2026-05-09 21:07:49 +08:00
Add image and citation links to RL Unplugged README
PiperOrigin-RevId: 321197757
committed by Saran Tunyasuvunakool
parent 8188882a82
commit bd29e1b710
+22 -13
@@ -1,4 +1,4 @@
-<img src="./docs/images/rl_unplugged_tasks_v0.png" width="50%">
+<img src="./images/tasks.png" width="50%">
 
 # RL Unplugged: Benchmarks for Offline Reinforcement Learning
 
@@ -31,10 +31,10 @@ Data loading code and examples will be available soon.
 ## Atari Dataset
 
 We are releasing a large and diverse dataset of gameplay following the protocol
-described by Agarwal et al. (2020), which can be used to evaluate several
+described by [Agarwal et al., 2020], which can be used to evaluate several
 discrete offline RL algorithms. The dataset is generated by running an online
 DQN agent and recording transitions from its replay during training with sticky
-actions (Machado et al., 2018). As stated in (Agarwal et al.,2020), for each
+actions [Machado et al., 2018]. As stated in [Agarwal et al., 2020], for each
 game we use data from five runs with 50 million transitions each. States in each
 transition include stacks of four frames to be able to do frame-stacking with
 our baselines. We release datasets for 46 Atari games. For details on how the
@@ -43,23 +43,23 @@ dataset was generated, please refer to the paper.
 ## Deepmind Locomotion Dataset
 
 These tasks are made up of the corridor locomotion tasks involving the CMU
-Humanoid, for which prior efforts have either used motion capture data (Merel et
-al., 2019a,b) or training from scratch (Song et al., 2020). In addition, the DM
-Locomotion repository contains a set of tasks adapted to be suited to a virtual
-rodent (see Merel et al., 2020). We emphasize that the DM Locomotion tasks
-feature the combination of challenging high-DoF continuous control along with
-perception from rich egocentric observations. For details on how the dataset was
-generated, please refer to the paper.
+Humanoid, for which prior efforts have either used motion capture data [Merel et
+al., 2019a], [Merel et al., 2019b] or training from scratch [Song et al., 2020].
+In addition, the DM Locomotion repository contains a set of tasks adapted to be
+suited to a virtual rodent [Merel et al., 2020]. We emphasize that the DM
+Locomotion tasks feature the combination of challenging high-DoF continuous
+control along with perception from rich egocentric observations. For details on
+how the dataset was generated, please refer to the paper.
 
 ## Deepmind Control Suite Dataset
 
-DeepMind Control Suite (Tassa et al., 2018) is a set of control tasks
-implemented in MuJoCo (Todorov et al., 2012). We consider a subset of the tasks
+DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
+implemented in MuJoCo [Todorov et al., 2012]. We consider a subset of the tasks
 provided in the suite that cover a wide range of difficulties.
 
 Most of the datasets in this domain are generated using D4PG. For the
 environments Manipulator insert ball and Manipulator insert peg we use V-MPO
-(Song et al., 2020) to generate the data as D4PG is unable to solve these tasks.
+[Song et al., 2020] to generate the data as D4PG is unable to solve these tasks.
 We release datasets for 9 control suite tasks. For details on how the dataset
 was generated, please refer to the paper.
 
@@ -148,3 +148,12 @@ engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
 </tr>
 </table>
 </div>
+
+[Agarwal et al., 2020]: https://arxiv.org/abs/1907.04543
+[Machado et al., 2018]: https://arxiv.org/abs/1709.06009
+[Merel et al., 2019a]: https://arxiv.org/abs/1811.09656
+[Merel et al., 2019b]: https://arxiv.org/abs/1811.11711
+[Merel et al., 2020]: https://arxiv.org/abs/1911.09451
+[Song et al., 2020]: https://arxiv.org/abs/1909.12238
+[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
+[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
Binary file not shown.
After: Size: 1.1 MiB