Mirror of https://github.com/google-deepmind/deepmind-research.git (synced 2026-05-09 21:07:49 +08:00)

Adding DeepMind Lab and bsuite dataset descriptions to README.

PiperOrigin-RevId: 363880165

Commit 0a46c8eec0 (parent ad49bf36f7), committed by Louise Deason.
@@ -38,6 +38,11 @@ transition include stacks of four frames to be able to do frame-stacking with
our baselines. We release datasets for 46 Atari games. For details on how the
dataset was generated, please refer to the paper.

Atari is a standard RL benchmark. We recommend trying offline RL methods on
Atari if you want to compare your approach with other state-of-the-art offline
RL methods with discrete actions.

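
To illustrate what frame stacking means for the Atari transitions, here is a minimal sketch of a rolling buffer that keeps the four most recent frames. It is not the code used to generate the dataset: the class name `FrameStacker` is hypothetical, and padding the first stack of an episode by repeating the initial frame is a common convention we assume here.

```python
from collections import deque
from typing import Any, Deque, Tuple


class FrameStacker:
    """Hypothetical helper that keeps the `num_frames` most recent frames."""

    def __init__(self, num_frames: int = 4) -> None:
        self.num_frames = num_frames
        # A deque with maxlen discards its oldest element when it is full.
        self._frames: Deque[Any] = deque(maxlen=num_frames)

    def reset(self, first_frame: Any) -> Tuple[Any, ...]:
        # Assumed convention: fill the stack with copies of the first frame.
        self._frames.clear()
        for _ in range(self.num_frames):
            self._frames.append(first_frame)
        return tuple(self._frames)

    def step(self, frame: Any) -> Tuple[Any, ...]:
        # Push the newest frame; the oldest one falls off the left end.
        self._frames.append(frame)
        return tuple(self._frames)


stacker = FrameStacker(num_frames=4)
print(stacker.reset("f0"))  # ('f0', 'f0', 'f0', 'f0')
print(stacker.step("f1"))   # ('f0', 'f0', 'f0', 'f1')
```

Frames are ordered oldest to newest, so the last element of each stack is the current observation.
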
## DeepMind Locomotion Dataset
These tasks are made up of the corridor locomotion tasks involving the CMU
@@ -49,6 +54,10 @@ Locomotion tasks feature the combination of challenging high-DoF continuous
control along with perception from rich egocentric observations. For details on
how the dataset was generated, please refer to the paper.

We recommend trying offline RL methods on the DeepMind Locomotion dataset if you
are interested in a very challenging offline RL dataset with a continuous action
space.
## DeepMind Control Suite Dataset
DeepMind Control Suite [Tassa et al., 2018] is a set of control tasks
@@ -61,6 +70,11 @@ environments Manipulator insert ball and Manipulator insert peg we use V-MPO
We release datasets for 9 control suite tasks. For details on how the dataset
was generated, please refer to the paper.

DeepMind Control Suite is a traditional continuous-action RL benchmark. We
recommend testing your approach on DeepMind Control Suite if you want to compare
against other state-of-the-art offline RL methods.
## Realworld RL Dataset
Examples in the dataset represent SARS transitions stored when running a
@@ -71,6 +85,43 @@ We release 8 datasets in total -- with no combined challenge and easy combined
challenge on the cartpole, walker, quadruped, and humanoid tasks. For details on
how the dataset was generated, please refer to the paper.

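
Each example in the Realworld RL dataset is a SARS transition, that is, a (state, action, reward, next state) tuple. As a minimal sketch of that data structure, assuming a trajectory stored as parallel sequences, the following splits it into transitions. The `Transition` field names and the `to_sars` helper are hypothetical and do not correspond to the feature keys of the released files.

```python
from typing import Any, List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One SARS transition; field names are illustrative, not the dataset's keys."""
    state: Any
    action: Any
    reward: float
    next_state: Any


def to_sars(states: Sequence[Any], actions: Sequence[Any],
            rewards: Sequence[float]) -> List[Transition]:
    """Split s_0, a_0, r_0, s_1, ... into (s_t, a_t, r_t, s_{t+1}) tuples.

    Expects exactly one more state than actions and rewards.
    """
    if not (len(states) == len(actions) + 1 == len(rewards) + 1):
        raise ValueError("need one more state than actions and rewards")
    return [
        Transition(states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]


transitions = to_sars(["s0", "s1", "s2"], [0, 1], [1.0, 0.5])
```

This layout stores each intermediate state twice (once as `next_state`, once as `state`) in exchange for letting every example be sampled on its own during offline training.
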
## DeepMind Lab Dataset

The DeepMind Lab dataset has several levels from the challenging, partially
observable [DeepMind Lab suite](https://github.com/deepmind/lab). The DeepMind
Lab dataset was collected by training distributed R2D2 agents
[Kapturowski et al., 2018] from scratch on individual tasks. For every task, we
recorded the experience across all actors over a few entire training runs. The
details of the dataset generation process are described in
[Gulcehre et al., 2021].

We release datasets for five different DeepMind Lab levels: `seekavoid_arena_01`,
`explore_rewards_few`, `explore_rewards_many`, `rooms_watermaze`, and
`rooms_select_nonmatching_object`. We also release snapshot datasets for the
`seekavoid_arena_01` level, which we generated by evaluating a trained R2D2
snapshot in the environment with different values of epsilon for epsilon-greedy
action selection.

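
The snapshot datasets differ in the epsilon used for epsilon-greedy action selection during evaluation. A minimal sketch of that rule, assuming a flat list of per-action Q-values: with probability `epsilon` the agent takes a uniformly random action, otherwise the action with the highest Q-value. The function name `epsilon_greedy` is hypothetical and this is not R2D2's actor code.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """Return a uniformly random action with probability `epsilon`, else the greedy one."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        # Explore: any action, including the greedy one, can be drawn.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (the first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))  # 1
```

With `epsilon=0` the policy is fully greedy, and larger values make the recorded behavior increasingly random.
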

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you
are interested in large-scale offline RL models with memory.

## bsuite Dataset

[bsuite](https://github.com/deepmind/bsuite) data was collected by training DQN
agents with the default settings in [Acme](https://github.com/deepmind/acme)
from scratch on each of the following three tasks: cartpole, catch, and
mountain_car.

We converted the originally deterministic environments into stochastic ones by
replacing the agent's action with a uniformly sampled action with a probability
in {0, 0.1, 0.2, 0.3, 0.4, 0.5}; a probability of 0 corresponds to the original
environment. The details of the dataset generation process are described in
[Gulcehre et al., 2021].

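
The stochastic bsuite variants can be sketched as a small action-perturbation step applied before the environment executes the agent's action. The function `perturb_action` and its arguments are hypothetical, not the code used to build the datasets, and we assume the uniform draw ranges over all actions, so it may return the agent's original action.

```python
import random
from typing import Optional

# Replacement probabilities listed above for the released bsuite datasets.
REPLACEMENT_PROBS = (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)


def perturb_action(action: int, num_actions: int, prob: float,
                   rng: Optional[random.Random] = None) -> int:
    """With probability `prob`, replace `action` by a uniformly sampled action."""
    if rng is None:
        rng = random.Random()
    if rng.random() < prob:
        # Assumed: uniform over all actions, so the original action can be redrawn.
        return rng.randrange(num_actions)
    return action


rng = random.Random(0)
noisy = [perturb_action(0, num_actions=3, prob=0.3, rng=rng) for _ in range(10)]
```

Under this assumption, with three actions and a replacement probability of 0.3, the executed action differs from the agent's choice about 0.3 × 2/3 = 20% of the time.
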

The bsuite datasets are lightweight, and running experiments on them does not
require much compute. We recommend trying bsuite if you are interested in
small-scale, easy-to-run offline RL datasets generated by stochastic
environments whose stochasticity is easy to control.

## Running the code

### Installation
@@ -178,3 +229,5 @@ engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
[Song et al., 2020]: https://arxiv.org/abs/1909.12238
[Tassa et al., 2018]: https://arxiv.org/abs/1801.00690
[Todorov et al., 2012]: https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf
[Kapturowski et al., 2018]: https://openreview.net/forum?id=r1lyTjAqYX
[Gulcehre et al., 2021]: https://arxiv.org/abs/2103.09575