Adding DeepMind Lab and bsuite dataset descriptions to README.

PiperOrigin-RevId: 363880165
Caglar Gulcehre
2021-03-19 13:45:03 +00:00
committed by Louise Deason
parent ad49bf36f7
commit 0a46c8eec0
+53
@@ -38,6 +38,11 @@ transition include stacks of four frames to be able to do frame-stacking with
our baselines. We release datasets for 46 Atari games. For details on how the
dataset was generated, please refer to the paper.
Atari is a standard RL benchmark. We recommend trying offline RL methods on
Atari if you want to compare your approach against other state-of-the-art
offline RL methods with discrete actions.
## DeepMind Locomotion Dataset
These tasks are made up of the corridor locomotion tasks involving the CMU
@@ -49,6 +54,10 @@ Locomotion tasks feature the combination of challenging high-DoF continuous
control along with perception from rich egocentric observations. For details on
how the dataset was generated, please refer to the paper.
We recommend trying offline RL methods on the DeepMind Locomotion dataset if
you are interested in a very challenging offline RL dataset with a continuous
action space.
## DeepMind Control Suite Dataset
DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
@@ -61,6 +70,11 @@ environments Manipulator insert ball and Manipulator insert peg we use V-MPO
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to the paper.
DeepMind Control Suite is a traditional continuous-action RL benchmark. In
particular, we recommend testing your approach on DeepMind Control Suite if
you want to compare against other state-of-the-art offline RL methods.
## Realworld RL Dataset
Examples in the dataset represent SARS transitions stored when running a
@@ -71,6 +85,43 @@ We release 8 datasets in total -- with no combined challenge and easy combined
challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on
how the dataset was generated, please refer to the paper.
## DeepMind Lab Dataset
The DeepMind Lab dataset covers several levels from the challenging, partially
observable [DeepMind Lab suite](https://github.com/deepmind/lab). The data was
collected by training distributed R2D2 agents [Kapturowski et al., 2018] from
scratch on individual tasks. For every task, we recorded the experience of all
actors over entire training runs, repeated a few times. The details of the
dataset generation process are described in [Gulcehre et al., 2021].
We release datasets for five different DeepMind Lab levels: `seekavoid_arena_01`,
`explore_rewards_few`, `explore_rewards_many`, `rooms_watermaze`, and
`rooms_select_nonmatching_object`. We also release snapshot datasets for the
`seekavoid_arena_01` level, generated by evaluating a trained R2D2 snapshot in
the environment with different values of epsilon for the epsilon-greedy
algorithm.
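
As a rough illustration of the epsilon-greedy evaluation mentioned above, the
sketch below picks a uniformly random action with probability `epsilon` and the
greedy action otherwise. It is not the code used to generate the snapshot
datasets; the function name `epsilon_greedy` and the example Q-values are
hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Returns a random action with probability epsilon, else the greedy one.

    Hypothetical helper for illustration only, not part of this repository.
    """
    if rng.random() < epsilon:
        # Explore: sample an action index uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: take the index of the largest Q-value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.9, 0.3]  # Made-up Q-values for three actions.
# With epsilon = 0 the greedy action (index 1 here) is always chosen.
greedy_action = epsilon_greedy(q_values, 0.0, rng)
# Larger epsilon values mix in more uniformly random actions.
noisy_actions = [epsilon_greedy(q_values, 0.5, rng) for _ in range(10)]
```

Varying `epsilon` in this way yields behaviour that ranges from fully greedy to
increasingly random.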
The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you
are interested in large-scale offline RL models with memory.
## bsuite Dataset
[bsuite](https://github.com/deepmind/bsuite) data was collected by training DQN
agents with the default settings in [Acme](https://github.com/deepmind/acme)
from scratch on each of the following three tasks: cartpole, catch, and
mountain_car.
We converted the originally deterministic environments into stochastic ones by
randomly replacing the agent's action with a uniformly sampled action with a
probability in {0, 0.1, 0.2, 0.3, 0.4, 0.5}. A probability of 0 corresponds to
the original environment. The details of the dataset generation process are
described in [Gulcehre et al., 2021].
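
The action-replacement scheme described above can be sketched as follows. This
is a minimal illustration under our own assumptions, not the code used to
generate the datasets; the function name `maybe_replace_action` and its
parameters are hypothetical.

```python
import random


def maybe_replace_action(action, num_actions, noise_prob, rng):
    """Replaces `action` with a uniformly sampled one with prob. `noise_prob`.

    Hypothetical helper for illustration only, not part of this repository.
    """
    if rng.random() < noise_prob:
        # The sampled action is uniform over all actions, so it may
        # coincide with the agent's original choice.
        return rng.randrange(num_actions)
    return action


rng = random.Random(0)
agent_action = 2
# noise_prob = 0 recovers the original deterministic behaviour.
unchanged = [maybe_replace_action(agent_action, 3, 0.0, rng) for _ in range(5)]
# noise_prob = 0.5 is the highest noise level listed above.
perturbed = [maybe_replace_action(agent_action, 3, 0.5, rng) for _ in range(5)]
```

The replacement is applied to the action before it reaches the environment, so
a single `noise_prob` value sets how stochastic the resulting transitions are.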
bsuite datasets are fairly lightweight, and running experiments on them doesn't
require much compute. We recommend trying bsuite if you are interested in
small-scale, easy-to-run offline RL datasets generated by stochastic
environments where the stochasticity of the environment is easy to control.
## Running the code
### Installation
@@ -178,3 +229,5 @@ engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
[Kapturowski et al., 2018]: https://openreview.net/forum?id=r1lyTjAqYX
[Gulcehre et al., 2021]: https://arxiv.org/abs/2103.09575