@misc{,
    title={Physically Embedded Planning Problems: New Challenges for
           Reinforcement Learning},
    author={Mehdi Mirza and Andrew Jaegle and Jonathan J. Hunt and
            Arthur Guez and Saran Tunyasuvunakool and Alistair Muldal and
            Théophane Weber and Peter Karkus and Sébastien Racanière and
            Lars Buesing and Timothy Lillicrap and Nicolas Heess},
    year={2020},
    eprint={},
    archivePrefix={arXiv},
    primaryClass={cs.RO}
}

Requirements and Installation

This repository is divided into the 'mujoban' and 'board_games' folders. Both are built on top of dm_control, which requires MuJoCo. Please follow these instructions to install MuJoCo. Install the remaining dependencies with:

pip3 install -r requirements.txt

Board games

The game logic is based on open_spiel. Please install it as instructed here. gnugo is required to play Go against a non-random opponent. On Ubuntu, gnugo can be installed with:

apt install gnugo

The board game scripts expect the gnugo binary at /usr/games/gnugo. To use a different location, change this path in board_games/go_logic.py.
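Before playing Go against a non-random opponent, it can be useful to confirm that the binary is where the scripts will look for it. The following is a minimal sketch using only the Python standard library. The helper name `find_gnugo` is hypothetical and not part of this repository, and the fallback to a `PATH` lookup is an assumption about what is convenient, not something the board game scripts do themselves.

```python
import os
import shutil

# Location the board game scripts expect, as stated above.
EXPECTED_GNUGO_PATH = '/usr/games/gnugo'


def find_gnugo(expected_path=EXPECTED_GNUGO_PATH):
  """Returns a path to an executable gnugo binary, or None if none is found.

  Hypothetical helper: checks the expected location first, then falls back
  to searching PATH.
  """
  if os.path.isfile(expected_path) and os.access(expected_path, os.X_OK):
    return expected_path
  return shutil.which('gnugo')


path = find_gnugo()
if path is None:
  print('gnugo not found; install it or update board_games/go_logic.py')
elif path != EXPECTED_GNUGO_PATH:
  print('gnugo found at', path, '- update the path in board_games/go_logic.py')
else:
  print('gnugo found at the expected location')
```

If the binary is found somewhere other than the expected location, the path in board_games/go_logic.py still has to be edited by hand.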

This library has only been tested on Ubuntu.

Example usage

The code snippets below show examples of instantiating each of the environments.

Mujoban

from dm_control import composer
from dm_control.locomotion import walkers
from physics_planning_games.mujoban.mujoban import Mujoban
from physics_planning_games.mujoban.mujoban_level import MujobanLevel
from physics_planning_games.mujoban.boxoban import boxoban_level_generator

walker = walkers.JumpingBallWithHead(add_ears=True, camera_height=0.25)
maze = MujobanLevel(boxoban_level_generator)
task = Mujoban(walker=walker,
               maze=maze,
               control_timestep=0.1,
               top_camera_height=96,
               top_camera_width=96)
env = composer.Environment(time_limit=1000, task=task)

Board games

from physics_planning_games import board_games

environment_name = 'go_7x7'
env = board_games.load(environment_name=environment_name)
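Either environment exposes the standard dm_env methods `action_spec()` and `observation_spec()`, so its inputs and outputs can be inspected before any stepping. The sketch below is one way to summarize them. The helper `describe_spec` is hypothetical, and it assumes only that each leaf spec has `shape` and `dtype` attributes and that nested specs are dictionaries, which is how dm_control typically returns observations.

```python
def describe_spec(spec):
  """Summarizes a spec, or a (possibly nested) dict of specs, as text.

  Hypothetical helper: relies only on the `shape` and `dtype` attributes
  of each leaf spec.
  """
  if isinstance(spec, dict):  # OrderedDict is a dict subclass.
    return {name: describe_spec(sub_spec) for name, sub_spec in spec.items()}
  return f'shape={tuple(spec.shape)}, dtype={spec.dtype}'
```

With an environment created as above, `print(describe_spec(env.action_spec()))` and `print(describe_spec(env.observation_spec()))` would list what the agent must produce and what it receives.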

Stepping through the environment

The returned environments are of type dm_env.Environment and can be stepped through with random actions as shown here:

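import numpy as np

timestep = env.reset()
action_spec = env.action_spec()
# Step with random actions until the episode ends.
while not timestep.last():
  # Sample each action dimension uniformly within its bounds.
  action = np.stack([
      np.random.uniform(low=minimum, high=maximum)
      for minimum, maximum in zip(action_spec.minimum, action_spec.maximum)
  ])
  timestep = env.step(action)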
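The loop above can be wrapped in a small reusable function that also tracks the episode return. This is only a sketch: `run_episode` and its `policy` and `max_steps` parameters are hypothetical names, not part of this library. It assumes the dm_env interface (`reset()`, `step()`, and timesteps with `last()` and `reward`) and a scalar reward.

```python
def run_episode(env, policy, max_steps=1000):
  """Steps `env` until the episode ends or `max_steps` is reached.

  Hypothetical helper. `policy` is any callable mapping a timestep to an
  action. Returns the summed reward and the number of steps taken.
  """
  timestep = env.reset()
  total_reward = 0.0
  steps = 0
  while not timestep.last() and steps < max_steps:
    timestep = env.step(policy(timestep))
    # Guard against a missing reward; assumes the reward is a scalar.
    if timestep.reward is not None:
      total_reward += float(timestep.reward)
    steps += 1
  return total_reward, steps
```

A random policy equivalent to the loop above could be passed as a function that ignores the timestep and samples uniformly between `action_spec.minimum` and `action_spec.maximum`.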

Visualization

To visualize the environments, use explore.py, which loads them in the viewer from dm_control.

More details

For more details, please refer to the tech report, dm_control, and dm_env.