Update citation and add demo results for no-TVT with gamma<1.

PiperOrigin-RevId: 281522361
This commit is contained in:
Josh Abramson
2019-11-20 16:07:58 +00:00
committed by Diego de Las Casas
parent 5c9f992652
commit 94505a89e6
37 changed files with 5447 additions and 0 deletions
@@ -0,0 +1,66 @@
# DM Lab Tasks
## General Structure
There are 7 [DM Lab](https://github.com/deepmind/lab) tasks presented here.
Each level is composed of 3 distinct phases (except `Key To Door To Match`,
which has 5 phases). The first phase is the 'explore' phase, where the agent
must learn a piece of information or perform an action. In every task, the
second phase is the 'distractor' phase, where the agent collects apples for
reward. The third phase is the 'exploit' phase, where the agent is rewarded
based on the knowledge acquired or the actions performed in phase 1.
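
The three-phase structure can be sketched as a simple timeline. The snippet below is a minimal illustration in plain Lua, not the level code itself: `phaseAt` and the `PHASES` table are hypothetical names, and the durations are taken from the Passive Visual Match configuration in this commit (5 s explore, 30 s distractor, 40 s episode). Treating the remaining time as the exploit budget is an assumption, since some levels end the exploit phase on a pickup rather than on a timer.

```lua
-- Hypothetical sketch: which phase is active at a given episode time.
-- Durations follow the Passive Visual Match config (assumed layout).
local EPISODE_LENGTH_SECONDS = 40
local PHASES = {
  {name = 'explore', seconds = 5},
  {name = 'distractor', seconds = 30},
}

-- Returns the phase name for `t` seconds since the episode started.
local function phaseAt(t)
  local phaseEnd = 0
  for _, phase in ipairs(PHASES) do
    phaseEnd = phaseEnd + phase.seconds
    if t < phaseEnd then
      return phase.name
    end
  end
  -- Whatever remains of the episode is treated as the exploit phase.
  if t < EPISODE_LENGTH_SECONDS then
    return 'exploit'
  end
  return 'finished'
end

print(phaseAt(2), phaseAt(20), phaseAt(37))
```

With these numbers the exploit phase gets at most the last 5 seconds of the episode.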
## Specific Tasks
### Passive Visual Match
* Phase 1: A coloured square is shown directly in front of the agent.
* Phase 2: Apple collection.
* Phase 3: Choose the coloured square that matches the one from Phase 1 among
  4 options.
### Active Visual Match
* Phase 1: A coloured square is randomly placed in one of two connected rooms.
* Phase 2: Apple collection.
* Phase 3: Choose the coloured square that matches the one from Phase 1 among
  4 options.
### Key To Door
* Phase 1: A key is randomly placed in one of two connected rooms.
* Phase 2: Apple collection.
* Phase 3: A small room with a door. If the agent has the key, it can open the
  door and reach the goal behind it to get a reward.
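
The key/door dependency can be sketched in a few lines of plain Lua. This is an illustration only: `agent`, `pickUpKey` and `tryOpenDoor` are hypothetical names, not part of the DM Lab API. It mirrors the behaviour of the factory in this commit, where the door can be triggered only while the agent holds the key and the key is released once the door opens.

```lua
-- Hypothetical sketch of the key/door gating; not the level's actual API.
local agent = {holdingKey = false}

-- Phase 1: picking up the key sets the flag.
local function pickUpKey()
  agent.holdingKey = true
end

-- Phase 3: the door opens only while the agent holds the key.
-- Opening the door consumes the key.
local function tryOpenDoor()
  if not agent.holdingKey then
    return false
  end
  agent.holdingKey = false
  return true
end

print(tryOpenDoor())  -- attempted without a key
pickUpKey()
print(tryOpenDoor())  -- attempted after picking up the key
```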
### Key To Door Bluekey
Identical to Key To Door above, except that the key is blue instead of black.
### Two Negative Keys
* Phase 1: A blue key and a red key are placed in a small room. The agent can
  pick up only one of the keys.
* Phase 2: Apple collection.
* Phase 3: A small room with a door. If the agent has either key, it can open
  the door to get a reward. The reward depends on which key it picked up in
  Phase 1. All rewards are negative in this level.
### Latent Information Acquisition
* Phase 1: Three randomly sampled objects are randomly placed in a small room.
  When the agent touches an object, a red or green cue appears,
  indicating the reward associated with that object in this episode. No
  rewards are given in this phase.
* Phase 2: Apple collection.
* Phase 3: The same three objects from Phase 1 are randomly placed in the room
  again. The agent gets a positive reward for picking up an object that had a
  green cue in Phase 1, and a negative reward for an object that had a red cue.
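
The per-episode reward assignment can be sketched as follows. This is a minimal illustration under stated assumptions: `sampleObjects` and `bestExploitReturn` are hypothetical helpers, and the constants are copied from the level configuration in this commit that requires `latent_information_acquisition_factory` (3 objects, probability 0.5 of a good object, rewards of 20 and -10). The real factory also supports guaranteed good or bad objects and shuffles object locations, which this sketch leaves out.

```lua
-- Hypothetical sketch: assign each object a reward and a cue colour.
local NUM_OBJECTS = 3
local PROB_GOOD_OBJECT = 0.5
local CORRECT_REWARD = 20
local INCORRECT_REWARD = -10

-- `rng` is any function returning a number in [0, 1).
local function sampleObjects(rng)
  local objects = {}
  for i = 1, NUM_OBJECTS do
    if rng() < PROB_GOOD_OBJECT then
      objects[i] = {reward = CORRECT_REWARD, cue = 'green'}
    else
      objects[i] = {reward = INCORRECT_REWARD, cue = 'red'}
    end
  end
  return objects
end

-- Phase 3 return if the agent picks up only the green-cued objects.
local function bestExploitReturn(objects)
  local total = 0
  for _, object in ipairs(objects) do
    if object.reward > 0 then
      total = total + object.reward
    end
  end
  return total
end

local objects = sampleObjects(math.random)
print(bestExploitReturn(objects))
```

Passing the random source as an argument keeps the sketch easy to check with a fixed function in place of `math.random`.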
### Key To Door To Match
* Phase 1: A key is randomly placed in a room. The agent can pick it up.
* Phase 2: Apple collection.
* Phase 3: A coloured square behind a door. If the agent has the key from
  Phase 1, it can open the door to see the colour.
* Phase 4: Apple collection.
* Phase 5: Choose the coloured square that matches the one from Phase 3 among
  4 options.
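
As a worked example of the five-phase time budget, the sketch below sums the timed phases and reports what is left for Phase 5. The key names and values come from the level configuration in this commit that sets `exploreMapMode = 'KEY_TO_COLOR'`. Which phase each key controls is an assumption, since the `visual_match_factory` source is not shown here, and `finalPhaseBudget` is a hypothetical helper.

```lua
-- Hypothetical sketch: time budget for the five-phase level.
-- The mapping from config key to phase is an assumption.
local config = {
  episodeLengthSeconds = 45,
  secondOrderExploreLengthSeconds = 5,    -- assumed: Phase 1, find the key
  preExploreDistractorLengthSeconds = 15, -- assumed: Phase 2, apples
  exploreLengthSeconds = 5,               -- assumed: Phase 3, see the colour
  distractorLengthSeconds = 15,           -- assumed: Phase 4, apples
}

local TIMED_PHASE_KEYS = {
  'secondOrderExploreLengthSeconds',
  'preExploreDistractorLengthSeconds',
  'exploreLengthSeconds',
  'distractorLengthSeconds',
}

-- Seconds left for Phase 5 once the four timed phases have elapsed.
local function finalPhaseBudget(cfg)
  local used = 0
  for _, key in ipairs(TIMED_PHASE_KEYS) do
    used = used + cfg[key]
  end
  return cfg.episodeLengthSeconds - used
end

print(finalPhaseBudget(config))  -- 45 - (5 + 15 + 5 + 15) = 5
```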
@@ -0,0 +1,24 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'visual_match_factory'
return factory.createLevelApi{
exploreMapMode = 'TWO_ROOMS',
episodeLengthSeconds = 40,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
differentDistractRoomTexture = true,
differentRewardRoomTexture = true,
correctReward = 10,
incorrectReward = 1,
}
@@ -0,0 +1,42 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local tensor = require 'dmlab.system.tensor'
local utils = {}
utils.COLORS = {
{0, 0, 0},
{0, 0, 170},
{0, 170, 0},
{0, 170, 170},
{170, 0, 0},
{170, 0, 170},
{170, 85, 0},
{170, 170, 170},
{85, 85, 85},
{85, 85, 255},
{85, 255, 85},
{85, 255, 255},
{255, 85, 85},
{255, 85, 255},
{255, 255, 85},
{255, 255, 255},
}
function utils:createByteImage(h, w, rgb)
return tensor.ByteTensor(h, w, 4):fill{rgb[1], rgb[2], rgb[3], 255}
end
function utils:createTransparentImage(h, w)
return tensor.ByteTensor(h, w, 4):fill{127, 127, 127, 0}
end
return utils
@@ -0,0 +1,20 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'key_to_door_factory'
return factory.createLevelApi{
episodeLengthSeconds = 37,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
differentDistractRoomTexture = true,
differentRewardRoomTexture = true,
}
@@ -0,0 +1,22 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'key_to_door_factory'
return factory.createLevelApi{
keyColor = {0, 0, 255},
episodeLengthSeconds = 37,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
differentDistractRoomTexture = true,
differentRewardRoomTexture = true,
}
@@ -0,0 +1,459 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local make_map = require 'common.make_map'
local custom_observations = require 'decorators.custom_observations'
local debug_observations = require 'decorators.debug_observations'
local game = require 'dmlab.system.game'
local map_maker = require 'dmlab.system.map_maker'
local maze_generation = require 'dmlab.system.maze_generation'
local pickup_decorator = require 'decorators.human_recognisable_pickups'
local random = require 'common.random'
local setting_overrides = require 'decorators.setting_overrides'
local texture_sets = require 'themes.texture_sets'
local themes = require 'themes.themes'
local hrp = require 'common.human_recognisable_pickups'
local DEFAULTS = {
EPISODE_LENGTH_SECONDS = 15,
EXPLORE_LENGTH_SECONDS = 5,
DISTRACTOR_LENGTH_SECONDS = 5,
REWARD_LENGTH_SECONDS = nil,
SHOW_KEY_COLOR_SQUARE_SECONDS = 1,
PROB_APPLE_IN_DISTRACTOR_MAP = 0.3,
APPLE_REWARD = 5,
APPLE_REWARD_PROB = 1.0,
APPLE_EXTRA_REWARD_RANGE = 0,
GOAL_REWARD = 10,
DISTRACTOR_ROOM_SIZE = {11, 11},
DIFFERENT_DISTRACT_ROOM_TEXTURE = false,
DIFFERENT_REWARD_ROOM_TEXTURE = false,
KEY_COLOR = {0, 0, 0},
}
local APPLE_ID = 998
local GOAL_ID = 999
local KEY_SPAWN_ID = 1000
local DOOR_ID = 1001
local KEY_CUE_RECTANGLE_WIDTH = 600
local KEY_CUE_RECTANGLE_HEIGHT = 200
-- Table that maps from full decal name to decal index number.
local decalIndices = {}
local EXPLORE_MAP = "exploreMap"
local DISTRACTOR_MAP = "distractorMap"
local REWARD_MAP = "rewardMap"
-- Set texture set for all maps.
local textureSet = texture_sets.PACMAN
local secondTextureSet = texture_sets.TETRIS
local thirdTextureSet = texture_sets.TRON
local REWARD_ROOM = [[
***
*P*
*H*
*G*
***
]]
local OPEN_TWO_ROOM = [[
*********
*********
*PKK*KKK*
*KKKKKKK*
*KKK*KKK*
*********
]]
local N_KEY_POS_IN_TWO_ROOM = 18 -- # of K in OPEN_TWO_ROOM
local function createDistractorMaze(opts)
-- Example room with height = 2, width = 3
-- A are possible apple locations (everywhere)
-- *****
-- *APA*
-- *AAA*
-- *****
local roomHeight = opts.roomSize[1]
local roomWidth = opts.roomSize[2]
-- Declared local so that the value does not leak into the global namespace.
local centerWidth = 1 + math.ceil(roomWidth / 2)
local maze = maze_generation:mazeGeneration{
height = roomHeight + 2, -- +2 for the walls on both sides
width = roomWidth + 2
}
-- Fill the room with 'A' for apples. updateSpawnVars decides which to keep.
for i = 2, roomHeight + 1 do
for j = 2, roomWidth + 1 do
maze:setEntityCell(i, j, 'A')
end
end
-- Override one cell with 'P' for spawn point.
maze:setEntityCell(2, centerWidth, 'P')
return maze
end
local function numPossibleAppleLocations(distractorRoomSize)
return distractorRoomSize[1] * distractorRoomSize[2] - 1
end
local factory = {}
game:console('cg_drawScriptRectanglesAlways 1')
function factory.createLevelApi(kwargs)
kwargs.episodeLengthSeconds = kwargs.episodeLengthSeconds or
DEFAULTS.EPISODE_LENGTH_SECONDS
kwargs.exploreLengthSeconds = kwargs.exploreLengthSeconds or
DEFAULTS.EXPLORE_LENGTH_SECONDS
kwargs.rewardLengthSeconds = kwargs.rewardLengthSeconds or
DEFAULTS.REWARD_LENGTH_SECONDS
kwargs.distractorLengthSeconds = kwargs.distractorLengthSeconds or
DEFAULTS.DISTRACTOR_LENGTH_SECONDS
kwargs.distractorRoomSize = kwargs.distractorRoomSize or
DEFAULTS.DISTRACTOR_ROOM_SIZE
kwargs.appleReward = kwargs.appleReward or DEFAULTS.APPLE_REWARD
kwargs.appleRewardProb = kwargs.appleRewardProb or DEFAULTS.APPLE_REWARD_PROB
kwargs.probAppleInDistractorMap = kwargs.probAppleInDistractorMap or
DEFAULTS.PROB_APPLE_IN_DISTRACTOR_MAP
kwargs.appleExtraRewardRange =
kwargs.appleExtraRewardRange or DEFAULTS.APPLE_EXTRA_REWARD_RANGE
kwargs.differentDistractRoomTexture = kwargs.differentDistractRoomTexture or
DEFAULTS.DIFFERENT_DISTRACT_ROOM_TEXTURE
kwargs.differentRewardRoomTexture = kwargs.differentRewardRoomTexture or
DEFAULTS.DIFFERENT_REWARD_ROOM_TEXTURE
kwargs.showKeyColorSquareSeconds = kwargs.showKeyColorSquareSeconds or
DEFAULTS.SHOW_KEY_COLOR_SQUARE_SECONDS
kwargs.goalReward = kwargs.goalReward or DEFAULTS.GOAL_REWARD
kwargs.keyColor = kwargs.keyColor or DEFAULTS.KEY_COLOR
local api = {}
function api:init(params)
self:_createExploreMap()
self:_createDistractorMap()
self:_createRewardMap()
local keyInfo = {
shape='key',
pattern='solid',
color1 = kwargs.keyColor,
color2 = kwargs.keyColor
}
self._keyObject = hrp.create(keyInfo)
self._keyCueRgba = {
kwargs.keyColor[1]/255,
kwargs.keyColor[2]/255,
kwargs.keyColor[3]/255,
1
}
end
function api:_createRewardMap()
self._rewardMap = map_maker:mapFromTextLevel{
mapName = REWARD_MAP,
entityLayer = REWARD_ROOM,
}
-- Create map theme and override default wall decal placement.
local texture = textureSet
if kwargs.differentRewardRoomTexture then
texture = thirdTextureSet
end
local rewardMapTheme = themes.fromTextureSet{
textureSet = texture,
decalFrequency = 0.0,
floorModelFrequency = 0.0,
}
self._rewardMap = map_maker:mapFromTextLevel{
mapName = REWARD_MAP,
entityLayer = REWARD_ROOM,
theme = rewardMapTheme,
callback = function (i, j, c, maker)
local pickup = self:_makePickup(c)
if pickup then
return maker:makeEntity{i = i, j = j, classname = pickup}
end
end
}
end
function api:_createExploreMap()
local exploreMapInfo = {map = OPEN_TWO_ROOM}
-- Create map theme and override default wall decal placement.
local exploreMapTheme = themes.fromTextureSet{
textureSet = textureSet,
decalFrequency = 0.0,
floorModelFrequency = 0.0,
}
self._exploreMap = map_maker:mapFromTextLevel{
mapName = EXPLORE_MAP,
entityLayer = exploreMapInfo.map,
theme = exploreMapTheme,
callback = function (i, j, c, maker)
local pickup = self:_makePickup(c)
if pickup then
return maker:makeEntity{i = i, j = j, classname = pickup}
end
end
}
end
function api:_createDistractorMap()
-- Create maze to be converted into map.
local maze = createDistractorMaze{roomSize = kwargs.distractorRoomSize}
-- Create map theme with no wall decals.
local texture = textureSet
if kwargs.differentDistractRoomTexture then
texture = secondTextureSet
end
local distractorMapTheme = themes.fromTextureSet{
textureSet = texture,
decalFrequency = 0.0,
floorModelFrequency = 0.0,
}
self._distractorMap = map_maker:mapFromTextLevel{
mapName = DISTRACTOR_MAP,
entityLayer = maze:entityLayer(),
theme = distractorMapTheme,
callback = function (i, j, c, maker)
local pickup = self:_makePickup(c)
if pickup then
return maker:makeEntity{i = i, j = j, classname = pickup}
end
end
}
end
function api:start(episode, seed)
random:seed(seed)
self._map = nil
self._time = 0
self._holdingKey = false
self._keyPosCount = 0
self._collectedGoal = false
if kwargs.distractorLengthSecondsRange then
self._distractorLen = random:uniformReal(
kwargs.distractorLengthSecondsRange[1],
kwargs.distractorLengthSecondsRange[2])
else
self._distractorLen = kwargs.distractorLengthSeconds
end
-- Sample the key position in phase 1.
self._keyPosition = random:uniformInt(1, N_KEY_POS_IN_TWO_ROOM)
-- Default instruction channel to 0 (indicating the rewards in final phase.)
self.setInstruction(tostring(0))
end
function api:filledRectangles(args)
if self._showKeyCue then
return {{
x = 12,
y = 12,
width = KEY_CUE_RECTANGLE_WIDTH,
height = KEY_CUE_RECTANGLE_HEIGHT,
rgba = self._keyCueRgba
}}
end
return {}
end
function api:nextMap()
-- 1. Decide what is the next map.
if self._map == nil then
self._map = EXPLORE_MAP
elseif self._map == DISTRACTOR_MAP then
self._map = REWARD_MAP
elseif self._map == EXPLORE_MAP then
if self._distractorLen > 0.0 then
self._map = DISTRACTOR_MAP
else
self._map = REWARD_MAP
end
elseif self._map == REWARD_MAP then
-- Stay in distractor map till end of episode.
self._map = DISTRACTOR_MAP
self._collectedGoal = true
end
-- 2. Set up the timeout for the upcoming map.
if self._map == DISTRACTOR_MAP and self._collectedGoal then
if not self._timeOut then -- don't override any existing timeout
self._timeOut = self._time + 0.1
end
elseif self._map == EXPLORE_MAP then
self._timeOut = self._time + kwargs.exploreLengthSeconds
elseif self._map == DISTRACTOR_MAP then
self._timeOut = self._time + self._distractorLen
elseif self._map == REWARD_MAP then
if kwargs.rewardLengthSeconds then
self._timeOut = self._time + kwargs.rewardLengthSeconds
else
self._timeOut = nil
end
end
return self._map
end
-- PICKUP functions ----------------------------------------------------------
function api:_makePickup(c)
if c == 'K' then
return 'key'
end
if c == 'G' then
return 'goal'
end
if c == 'A' then
return 'apple_reward'
end
end
function api:pickup(spawnId)
if spawnId == GOAL_ID then
local goalReward = kwargs.goalReward
game:addScore(goalReward - 10) -- Offset the default +10 for goal.
self.setInstruction(tostring(goalReward))
game:finishMap()
end
if spawnId == KEY_SPAWN_ID then
self._holdingKey = true
self._holdingKeyTime = self._time -- When the avatar got the key.
self._showKeyCue = true
end
if spawnId == APPLE_ID then
if kwargs.appleRewardProb >= 1 or
random:uniformReal(0, 1) < kwargs.appleRewardProb then
-- The -1 is to offset the default 1 point for apple in dmlab
local appleReward = kwargs.appleReward +
random:uniformInt(0, kwargs.appleExtraRewardRange) - 1
game:addScore(appleReward)
else
-- The -1 is to offset the default 1 point for apple in dmlab
game:addScore(-1)
end
end
end
-- TRIGGER functions ---------------------------------------------------------
function api:canTrigger(teleportId, targetName)
if string.sub(targetName, 1, 4) == 'door' then
if self._holdingKey then
return true
else
return false
end
end
return true
end
function api:trigger(teleportId, targetName)
if string.sub(targetName, 1, 4) == 'door' then
-- When the door is opened, stop showing the key cue and release the key.
self._showKeyCue = false
self._holdingKey = false
return
end
end
function api:hasEpisodeFinished(timeSeconds)
self._time = timeSeconds
if self._map == REWARD_MAP or self._collectedGoal then
return self._timeOut and timeSeconds > self._timeOut
end
-- Control the timing of showing key cue.
if self._holdingKey then
local showTime = self._time - self._holdingKeyTime
if showTime > kwargs.showKeyColorSquareSeconds then
self._showKeyCue = false
end
end
if self._map == EXPLORE_MAP or self._map == DISTRACTOR_MAP then
if timeSeconds > self._timeOut then
game:finishMap()
end
return false
end
end
-- END TRIGGER functions -----------------------------------------------------
function api:updateSpawnVars(spawnVars)
local classname = spawnVars.classname
if classname == "info_player_start" then
-- Spawn facing South.
spawnVars.angle = "-90"
spawnVars.randomAngleRange = "0"
elseif classname == "func_door" then
spawnVars.id = tostring(DOOR_ID)
spawnVars.wait = "1000000" -- Open the door for long time.
elseif classname == "goal" then
spawnVars.id = tostring(GOAL_ID)
elseif classname == "apple_reward" then
-- The avatar is respawned in the distractor room after reaching the goal;
-- no apples are spawned in that case.
if self._collectedGoal == true then
return nil
end
local useApple = false
if kwargs.probAppleInDistractorMap > 0 then
useApple = random:uniformReal(0, 1) < kwargs.probAppleInDistractorMap
end
if useApple then
spawnVars.id = tostring(APPLE_ID)
else
return nil
end
elseif classname == "key" then
self._keyPosCount = self._keyPosCount + 1
if self._keyPosition == self._keyPosCount then
spawnVars.id = tostring(KEY_SPAWN_ID)
spawnVars.classname = self._keyObject
else
return nil
end
end
return spawnVars
end
custom_observations.decorate(api)
pickup_decorator.decorate(api)
setting_overrides.decorate{
api = api,
apiParams = kwargs,
decorateWithTimeout = true
}
return api
end
return factory
@@ -0,0 +1,28 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'visual_match_factory'
return factory.createLevelApi{
exploreMapMode = 'KEY_TO_COLOR',
episodeLengthSeconds = 45,
secondOrderExploreLengthSeconds = 5,
preExploreDistractorLengthSeconds = 15,
exploreLengthSeconds = 5,
distractorLengthSeconds = 15,
differentDistractRoomTexture = true,
differentRewardRoomTexture = true,
differentSecondOrderRoomTexture = true,
secondOrderExploreRoomSize = {4, 4},
correctReward = 10,
incorrectReward = 1,
}
@@ -0,0 +1,23 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'latent_information_acquisition_factory'
return factory.createLevelApi{
episodeLengthSeconds = 40,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
numObjects = 3,
probGoodObject = 0.5,
correctReward = 20,
incorrectReward = -10,
differentDistractRoomTexture = true,
}
@@ -0,0 +1,418 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local make_map = require 'common.make_map'
local custom_decals = require 'decorators.custom_decals_decoration'
local custom_entities = require 'common.custom_entities'
local custom_observations = require 'decorators.custom_observations'
local datasets_selector = require 'datasets.selector'
local game = require 'dmlab.system.game'
local maze_generation = require 'dmlab.system.maze_generation'
local pickup_decorator = require 'decorators.human_recognisable_pickups'
local random = require 'common.random'
local setting_overrides = require 'decorators.setting_overrides'
local texture_sets = require 'themes.texture_sets'
local themes = require 'themes.themes'
local hrp = require 'common.human_recognisable_pickups'
local SHOW_COLOR_CUE_SECOND = 0.25
local EPISODE_LENGTH_SECONDS = 30
local EXPLORE_LENGTH_SECONDS = 10
local DISTRACTOR_LENGTH_SECONDS = 10
local NUM_OBJECTS = 3
local PROB_GOOD_OBJECT = 0.5
local GAURANTEE_GOOD_OBJECTS = 0
local GAURANTEE_BAD_OBJECTS = 0
local PROB_APPLE_IN_DISTRACTOR_MAP = 0.3
local APPLE_REWARD = 5
local APPLE_EXTRA_REWARD_RANGE = 0
local DISTRACTOR_ROOM_SIZE = {11, 11}
local APPLE_ID = 1000
local CORRECT_REWARD = 2
local INCORRECT_REWARD = -1
local ROOM_SIZE = {3, 5}
local OBJECT_SCALE = 1.62
local EXPLORE_MAP = "exploreMap"
local DISTRACTOR_MAP = "distractorMap"
local EXPLOIT_MAP = "exploitMap"
local DIFFERENT_DISTRACT_ROOM_TEXTURE = false
-- Set texture set for all maps.
local textureSet = texture_sets.TRON
local secondTextureSet = texture_sets.TETRIS
-- Takes goal/location:i -> i
local function nameToLocationId(name)
return tonumber(name:match('^.+:(%d+)$'))
end
-- Takes goal/location:i -> goal/pickup
local function nameToLocationClass(name)
return name:match('^(.+):%d+$')
end
local factory = {}
game:console('cg_drawScriptRectanglesAlways 1')
function factory.createLevelApi(kwargs)
kwargs.episodeLengthSeconds = kwargs.episodeLengthSeconds or
EPISODE_LENGTH_SECONDS
kwargs.exploreLengthSeconds = kwargs.exploreLengthSeconds or
EXPLORE_LENGTH_SECONDS
if kwargs.distractorLengthSeconds == 0 then
kwargs.skipDistractor = true
else
kwargs.distractorLengthSeconds = kwargs.distractorLengthSeconds or
DISTRACTOR_LENGTH_SECONDS
end
kwargs.numObjects = kwargs.numObjects or NUM_OBJECTS
kwargs.probGoodObject = kwargs.probGoodObject or PROB_GOOD_OBJECT
kwargs.guaranteeGoodObjects = kwargs.guaranteeGoodObjects or
GAURANTEE_GOOD_OBJECTS
kwargs.guaranteeBadObjects = kwargs.guaranteeBadObjects or
GAURANTEE_BAD_OBJECTS
kwargs.correctReward = kwargs.correctReward or CORRECT_REWARD
kwargs.incorrectReward = kwargs.incorrectReward or INCORRECT_REWARD
kwargs.roomSize = kwargs.roomSize or ROOM_SIZE
kwargs.distractorRoomSize = kwargs.distractorRoomSize or DISTRACTOR_ROOM_SIZE
kwargs.probAppleInDistractorMap = kwargs.probAppleInDistractorMap or
PROB_APPLE_IN_DISTRACTOR_MAP
kwargs.differentDistractRoomTexture = kwargs.differentDistractRoomTexture or
DIFFERENT_DISTRACT_ROOM_TEXTURE
kwargs.appleReward = kwargs.appleReward or APPLE_REWARD
kwargs.appleExtraRewardRange = kwargs.appleExtraRewardRange or
APPLE_EXTRA_REWARD_RANGE
kwargs.objectScale = kwargs.objectScale or OBJECT_SCALE
local api = {}
function api:init(params)
self:_createExploreMap()
self:_createDistractorMap()
self:_createExploitMap()
end
function api:pickup(spawnId)
if self._map == EXPLORE_MAP then
-- Setup to show color cue.
self._showObjectCue = true
self._cueColor = self._objects[spawnId].cueColor
self._cueStartTime = self._time
elseif self._map == EXPLOIT_MAP then
-- Give the corresponding reward for the object that was picked up.
game:addScore(self._objects[spawnId].reward)
-- Update the instruction channel (to record final phase rewards.)
self._finalRewardMainTask = (
self._finalRewardMainTask + self._objects[spawnId].reward)
self.setInstruction(tostring(self._finalRewardMainTask))
end
if spawnId == APPLE_ID then
-- The -1 offsets the default 1 point for an apple in dmlab.
local appleReward = kwargs.appleReward +
random:uniformInt(0, kwargs.appleExtraRewardRange) - 1
game:addScore(appleReward)
end
end
function api:_createRoomCommon()
local roomHeight = kwargs.roomSize[1]
local roomWidth = kwargs.roomSize[2]
local maze = maze_generation:mazeGeneration{
height = roomHeight + 2,
width = roomWidth + 2
}
-- Set (2,2) as 'P' for the avatar location.
-- Set (i,j) as 'O' for possible object location if i%2 == 0 && j%2 == 0.
-- Otherwise, fill with '.' for empty location.
self._numLocations = 0
for i = 2, roomHeight + 1 do
for j = 2, roomWidth + 1 do
if i == 2 and j == 2 then
maze:setEntityCell(i, j, 'P')
elseif i % 2 == 0 and j % 2 == 0 then
maze:setEntityCell(i, j, 'O')
self._numLocations = self._numLocations + 1
else
maze:setEntityCell(i, j, '.')
end
end
end
return maze
end
function api:_createExploreMap()
local maze = self:_createRoomCommon()
print('Generated explore maze with entity layer:')
print(maze:entityLayer())
io.flush()
local mapTheme = themes.fromTextureSet{
textureSet = textureSet,
decalFrequency = 0.0,
}
local counter = 1
self._exploreMap = make_map.makeMap{
mapName = EXPLORE_MAP,
mapEntityLayer = maze:entityLayer(),
theme = mapTheme,
callback = function (i, j, c, maker)
if c == 'O' then
local pickup = 'location:' .. counter
counter = counter + 1
return maker:makeEntity{i = i, j = j, classname = pickup}
end
end
}
end
function api:_createDistractorMap()
-- Create map theme with no wall decals.
local distractorMapTheme = themes.fromTextureSet{
textureSet = textureSet,
decalFrequency = 0.0,
}
-- Example room with height = 2, width = 3
-- *****
-- *APA*
-- *AAA*
-- *****
local roomHeight = kwargs.distractorRoomSize[1]
local roomWidth = kwargs.distractorRoomSize[2]
local centerWidth = 1 + math.ceil(roomWidth / 2)
local maze = maze_generation:mazeGeneration{
height = roomHeight + 2,
width = roomWidth + 2
}
-- Fill the room with 'A' for apples. updateSpawnVars decides which to use.
for i = 2, roomHeight + 1 do
for j = 2, roomWidth + 1 do
maze:setEntityCell(i, j, 'A')
end
end
-- Override one cell with 'P' for spawn point.
maze:setEntityCell(2, centerWidth, 'P')
print('Generated distractor maze with entity layer:')
print(maze:entityLayer())
io.flush()
local texture = textureSet
if kwargs.differentDistractRoomTexture then
texture = secondTextureSet
end
local mapTheme = themes.fromTextureSet{
textureSet = texture,
decalFrequency = 0.0,
}
self._distractMap = make_map.makeMap{
mapName = DISTRACTOR_MAP,
mapEntityLayer = maze:entityLayer(),
theme = mapTheme,
}
end
function api:_createExploitMap()
local maze = self:_createRoomCommon()
print('Generated exploit maze with entity layer:')
print(maze:entityLayer())
io.flush()
local mapTheme = themes.fromTextureSet{
textureSet = textureSet,
decalFrequency = 0.0,
}
local counter = 1
self.exploitMap = make_map.makeMap{
mapName = EXPLOIT_MAP,
mapEntityLayer = maze:entityLayer(),
theme = mapTheme,
useSkybox = false,
callback = function (i, j, c, maker)
if c == 'O' then
local pickup = 'location:' .. counter
counter = counter + 1
return maker:makeEntity{i = i, j = j, classname = pickup}
end
end
}
end
function api:_generateRandomObjects()
-- 1. Generate a random list of positive/negative reward, `objectValence`
-- as function(numObjects, guaranteeGood, guaranteeBad, probGoodObject)
local objectValence = {}
for i = 1, kwargs.numObjects do
if i <= kwargs.guaranteeGoodObjects then
objectValence[i] = 1
elseif i <= kwargs.guaranteeGoodObjects + kwargs.guaranteeBadObjects then
objectValence[i] = -1
else
if random:uniformReal(0, 1) < kwargs.probGoodObject then
objectValence[i] = 1
else
objectValence[i] = -1
end
end
end
random:shuffleInPlace(objectValence)
-- 2. Generate random objects and link to the object valence above.
local objects = hrp.uniquelyShapedPickups(kwargs.numObjects)
for i = 1, kwargs.numObjects do
objects[i].scale = kwargs.objectScale
end
self._objects = {}
for i, object in ipairs(objects) do
self._objects[i] = {}
self._objects[i].data = hrp.create(object)
if objectValence[i] == 1 then
self._objects[i].isGoodObject = true
self._objects[i].reward = kwargs.correctReward
self._objects[i].cueColor = {0, 1, 0, 1} -- green means good
else
self._objects[i].isGoodObject = false
self._objects[i].reward = kwargs.incorrectReward
self._objects[i].cueColor = {1, 0, 0, 1} -- red means bad
end
end
end
function api:start(episode, seed)
random:seed(seed)
-- Set up a random mapping from locationId to pickupId.
-- There should be at least as many locationIds as pickupIds.
-- A location whose pickupId is 0 has no object placed there.
self._mapLocationIdToPickupId = {}
for i = 1, self._numLocations do
if i <= kwargs.numObjects then
self._mapLocationIdToPickupId[i] = i
else
self._mapLocationIdToPickupId[i] = 0
end
end
random:shuffleInPlace(self._mapLocationIdToPickupId)
self:_generateRandomObjects()
self._map = nil
self._numTrials = 0
self._timeOut = kwargs.exploreLengthSeconds
-- Set the instruction channel to record the rewards in the final phase.
self._finalRewardMainTask = 0
self.setInstruction("0")
end
function api:nextMap()
if self._map == nil then -- Start of episode.
self._map = EXPLORE_MAP
elseif not kwargs.skipDistractor and self._map == EXPLORE_MAP then
-- Move from explore to distractor.
self._map = DISTRACTOR_MAP
self._timeOut = self._time + kwargs.distractorLengthSeconds
elseif (kwargs.skipDistractor and self._map == EXPLORE_MAP)
or self._map == DISTRACTOR_MAP then
-- Move from distractor or explore map to exploit map.
self._map = EXPLOIT_MAP
random:shuffleInPlace(self._mapLocationIdToPickupId)
self._timeOut = nil
end
return self._map
end
function api:hasEpisodeFinished(timeSeconds)
self._time = timeSeconds
if self._showObjectCue then
if self._time - self._cueStartTime > SHOW_COLOR_CUE_SECOND then
self._showObjectCue = false
end
end
if self._map == EXPLORE_MAP or self._map == DISTRACTOR_MAP then
if timeSeconds > self._timeOut then
game:finishMap()
end
return false
end
end
-- END TRIGGER functions -----------------------------------------------------
function api:filledRectangles(args)
if self._map == EXPLORE_MAP and self._showObjectCue then
return {{
x = 12,
y = 12,
width = 600,
height = 300,
rgba = self._cueColor,
}}
end
return {}
end
function api:updateSpawnVars(spawnVars)
local classname = spawnVars.classname
if classname == "info_player_start" then
-- Spawn facing South.
spawnVars.angle = "-90"
spawnVars.randomAngleRange = "0"
elseif classname == "apple_reward" then
local useApple = false
if kwargs.probAppleInDistractorMap > 0 then
useApple = random:uniformReal(0, 1) < kwargs.probAppleInDistractorMap
spawnVars.id = tostring(APPLE_ID)
end
if not useApple then
return nil
end
else
-- Allocate objects onto the map by mapLocationIdToPickupId.
local locationClass = nameToLocationClass(classname)
if locationClass then
local locationId = nameToLocationId(classname)
local id = self._mapLocationIdToPickupId[locationId]
if id == 0 then
return nil
else
spawnVars.classname = self._objects[id].data
spawnVars.id = tostring(id)
end
end
end
return spawnVars
end
custom_observations.decorate(api)
pickup_decorator.decorate(api)
setting_overrides.decorate{
api = api,
apiParams = kwargs,
decorateWithTimeout = true
}
return api
end
return factory
@@ -0,0 +1,24 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'visual_match_factory'
return factory.createLevelApi{
exploreMapMode = 'PASSIVE',
episodeLengthSeconds = 40,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
differentDistractRoomTexture = true,
differentRewardRoomTexture = true,
correctReward = 10,
incorrectReward = 1,
}
File diff suppressed because it is too large.
@@ -0,0 +1,19 @@
-- Copyright 2019 DeepMind Technologies Limited. All Rights Reserved.
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
-- http://www.apache.org/licenses/LICENSE-2.0
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
-- ============================================================================
local factory = require 'two_keys_to_choose_factory'
return factory.createLevelApi{
episodeLengthSeconds = 37,
exploreLengthSeconds = 5,
distractorLengthSeconds = 30,
differentDistractRoomTexture = true,
}
File diff suppressed because it is too large.