Release of IODINE

PiperOrigin-RevId: 299101887
2026-05-26 17:35:21 +08:00 · 2020-03-05 15:52:20 +00:00
parent a5efafff3a
commit afcdc77239
23 changed files with 7600 additions and 0 deletions
@@ -10,6 +10,7 @@ env:
  matrix:
    - PROJECT="tvt"
    - PROJECT="cs_gan"
+    - PROJECT="iodine"
    - PROJECT="transporter"
 before_script:
  - sudo apt-get update -qq
@@ -24,6 +24,7 @@ https://deepmind.com/research/publications/

 ## Projects

+*   [Multi-Object Representation Learning with Iterative Variational Inference (IODINE)](iodine)
 *   [AlphaFold CASP13](alphafold_casp13), Nature 2020
 *   [Unrestricted Adversarial Challenge](unrestricted_advx)
 *   [Hierarchical Probabilistic U-Net (HPU-Net)](hierarchical_probabilistic_unet)
@@ -0,0 +1,142 @@
+# IODINE
+Reference implementation for the paper ["Multi-Object Representation Learning with Iterative Variational Inference"](https://arxiv.org/abs/1903.00450).
+This repository contains:
+
+* An IODINE implementation in Tensorflow v1.
+* Configurations used in the paper (checkpoints available in Cloud Storage) for:
+  * CLEVR
+  * Multi-dSprites
+  * Tetrominoes
+* A notebook for running and inspecting the model and plotting the results
+
+
+## Installation
+1. Clone the DeepMind research repository:
+
+    ``` bash
+    git clone https://github.com/deepmind/deepmind-research.git
+    cd deepmind-research
+    ```
+
+2. Download the checkpoints from GCP. A shell script is provided:
+
+   ```bash
+   ./iodine/download_checkpoints.sh
+   ```
+
+   On platforms without wget, the files can be downloaded from [this webpage](https://console.cloud.google.com/storage/browser/deepmind-research-iodine?pli=1)
+   and the unzipped `checkpoints/` folder should be placed in
+   `deepmind-research/iodine/checkpoints`.
+
+
+3. Prepare a Python 3 environment - virtualenv is recommended.
+
+   ```bash
+   python3 -m venv iodine_venv
+   source iodine_venv/bin/activate
+   ```
+
+4. Install dependencies:
+
+   ```bash
+   pip3 install -r iodine/requirements.txt
+   ```
+
+5. The `multi_object_datasets` package installed via requirements.txt provides python code to open the data files, but not the data files themselves.
+   Download the desired datasets either manually from the [Google Cloud Storage](https://console.cloud.google.com/storage/browser/multi-object-datasets) or using the commands below:
+
+    ```bash
+    pushd iodine/multi_object_datasets
+    # CLEVR
+    wget https://storage.googleapis.com/multi-object-datasets/clevr_with_masks/clevr_with_masks_train.tfrecords
+    # Multi-dSprites
+    wget https://storage.googleapis.com/multi-object-datasets/multi_dsprites/multi_dsprites_colored_on_grayscale.tfrecords
+    # Tetrominoes
+    wget https://storage.googleapis.com/multi-object-datasets/tetrominoes/tetrominoes_train.tfrecords
+    # Get back to location containing 'iodine' directory
+    popd
+    ```
+
+    See [multi_object_datasets repository](https://github.com/deepmind/multi_object_datasets)
+    for further details.
+6. Make sure that you have CUDA 10 and CuDNN 7 installed
+
+
+## Interact with a Model
+Use the jupyter notebook `Eval.ipynb` to load and run one of the checkpoints.
+It also contains code to plot the outputs and latent traversals.
+
+
+## Train a Model
+To train your own model use the [Sacred](https://github.com/IDSIA/sacred) experiment defined in `main.py`.
+The configurations used in the paper for the different datasets are available as [named configs](https://sacred.readthedocs.io/en/latest/configuration.html#named-configurations) inside of `configuration.py`.
+### Train a new model
+ * CLEVR6
+
+    ```bash
+    python3 -m iodine.main -f with clevr6
+    ```
+
+ * Multi-dSprites
+
+    ```bash
+    python3 -m iodine.main -f with multi_dsprites
+    ```
+
+ * Tetrominoes
+
+    ```bash
+    python3 -m iodine.main -f with tetrominoes
+    ```
+
+It is recommended to add an observer to your run to let Sacred record the details of run.
+To add a [FileStorageObserver](https://sacred.readthedocs.io/en/latest/command_line.html#filestorage-observer) add `-F my_storage_dir`, and add `-m my_db_name` for a [MongoObserver](https://sacred.readthedocs.io/en/latest/command_line.html#mongodb-observer).
+
+### Adjusting Config Values
+The experiment has a configuration that can be printed and adjusted from the commandline. E.g.:
+
+``` bash
+# print configuration
+python3 -m iodine.main -f print_config with clevr6
+# run experiment after adjusting batch_size and the size of the shuffle buffer
+python3 -m iodine.main -f with clevr6 batch_size=2 data.shuffle_buffer=100
+```
+
+### Tensorboard
+Each run stores checkpoints and summaries in the directory specified by `checkpoint_dir`, to which a suffix based on the run_id is appended.
+If an observer is added the `run_id` is set automatically. Otherwise it should be set manually using e.g. `run_id=5`.
+
+Summaries can be viewed using tensorboard. E.g. like this for clevr6 (assuming `run_id=1`):
+
+```bash
+tensorboard --log-dir iodine/checkpoints/clevr6_1
+```
+
+### Continue Previous Run
+To continue a previous run pass `continue_run=True` and the path of the checkpoints:
+
+```bash
+python3 -m iodine.main -f with clevr6 checkpoint_dir=iodine/checkpoints/clevr6_1
+```
+
+## Code Structure
+The main experiment defined in `main.py` uses `sacred` and the configurations for the different datasets are added as named configs and can be found in `configuration.py`.
+The model implementation can be found in the `modules` directory and is based on `tensorflow` and `sonnet`:
+
+ * `iodine.py` The main IODINE module that assembles the decoder, refinement network, distributions and factor regressor.
+ * `decoder.py` The ComponentDecoder which is a wrapper around networks that takes care of splitting the output channels into means and masks.
+ * `refinement.py` The refinement components assembles the encoder network, LSTM and refinement head.
+ * `networks.py` Different standard networks such as CNN, BroadcastCNN, and LSTM.
+ * `distribution.py` Definition of the latent and pixel distributions.
+ * `factor_eval.py` Contains the factor regressor which predicts the true factors from the inferred object latents.
+ * `data.py` Dataset wrappers around `multi_object_datasets` that take care of shuffling, batching and preprocessing.
+ * `plotting.py` Helper functions for plotting results.
+ * `utils.py` General helper functions.
+
+
+---
+**DISCLAIMER**
+
+This is not an officially supported Google product.
+
+---
@@ -0,0 +1,370 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Configurations for IODINE."""
+# pylint: disable=missing-docstring, unused-variable
+import math
+
+
+def clevr6():
+  n_z = 64  # number of latent dimensions
+  num_components = 7  # number of components (K)
+  num_iters = 5
+  checkpoint_dir = "iodine/checkpoints/clevr6"
+
+  # For the paper we used 8 GPUs with a batch size of 4 each.
+  # This means a total batch size of 32, which is too large for a single GPU.
+  # When reducing the batch size, the learning rate should also be lowered.
+  batch_size = 4
+  learn_rate = 0.001 * math.sqrt(batch_size / 32)
+
+  data = {
+      "constructor": "iodine.modules.data.CLEVR",
+      "batch_size": batch_size,
+      "path": "multi_object_datasets/clevr_with_masks_train.tfrecords",
+      "max_num_objects": 6,
+  }
+
+  model = {
+      "constructor": "iodine.modules.iodine.IODINE",
+      "n_z": n_z,
+      "num_components": num_components,
+      "num_iters": num_iters,
+      "iter_loss_weight": "linspace",
+      "coord_type": "linear",
+      "decoder": {
+          "constructor": "iodine.modules.decoder.ComponentDecoder",
+          "pixel_decoder": {
+              "constructor": "iodine.modules.networks.BroadcastConv",
+              "cnn_opt": {
+                  # Final channels is irrelevant with target_output_shape
+                  "output_channels": [64, 64, 64, 64, None],
+                  "kernel_shapes": [3],
+                  "strides": [1],
+                  "activation": "elu",
+              },
+              "coord_type": "linear",
+          },
+      },
+      "refinement_core": {
+          "constructor": "iodine.modules.refinement.RefinementCore",
+          "encoder_net": {
+              "constructor": "iodine.modules.networks.CNN",
+              "mode": "avg_pool",
+              "cnn_opt": {
+                  "output_channels": [64, 64, 64, 64],
+                  "strides": [2],
+                  "kernel_shapes": [3],
+                  "activation": "elu",
+              },
+              "mlp_opt": {
+                  "output_sizes": [256, 256],
+                  "activation": "elu"
+              },
+          },
+          "recurrent_net": {
+              "constructor": "iodine.modules.networks.LSTM",
+              "hidden_sizes": [256],
+          },
+          "refinement_head": {
+              "constructor": "iodine.modules.refinement.ResHead"
+          },
+      },
+      "latent_dist": {
+          "constructor": "iodine.modules.distributions.LocScaleDistribution",
+          "dist": "normal",
+          "scale_act": "softplus",
+          "scale": "var",
+          "name": "latent_dist",
+      },
+      "output_dist": {
+          "constructor": "iodine.modules.distributions.MaskedMixture",
+          "num_components": num_components,
+          "component_dist": {
+              "constructor":
+                  "iodine.modules.distributions.LocScaleDistribution",
+              "dist":
+                  "logistic",
+              "scale":
+                  "fixed",
+              "scale_val":
+                  0.03,
+              "name":
+                  "pixel_distribution",
+          },
+      },
+      "factor_evaluator": {
+          "constructor":
+              "iodine.modules.factor_eval.FactorRegressor",
+          "mapping": [
+              ("color", 9, "categorical"),
+              ("shape", 4, "categorical"),
+              ("size", 3, "categorical"),
+              ("position", 3, "scalar"),
+          ],
+      },
+  }
+
+  optimizer = {
+      "constructor": "tensorflow.train.AdamOptimizer",
+      "learning_rate": {
+          "constructor": "tensorflow.train.exponential_decay",
+          "learning_rate": learn_rate,
+          "global_step": {
+              "constructor": "tensorflow.train.get_or_create_global_step"
+          },
+          "decay_steps": 1000000,
+          "decay_rate": 0.1,
+      },
+      "beta1": 0.95,
+  }
+
+
+def multi_dsprites():
+  n_z = 16  # number of latent dimensions
+  num_components = 6  # number of components (K)
+  num_iters = 5
+  checkpoint_dir = "iodine/checkpoints/multi_dsprites"
+
+  # For the paper we used 8 GPUs with a batch size of 16 each.
+  # This means a total batch size of 128, which is too large for a single GPU.
+  # When reducing the batch size, the learning rate should also be lowered.
+  batch_size = 16
+  learn_rate = 0.0003 * math.sqrt(batch_size / 128)
+
+  data = {
+      "constructor":
+          "iodine.modules.data.MultiDSprites",
+      "batch_size":
+          batch_size,
+      "path":
+          "multi_object_datasets/multi_dsprites_colored_on_grayscale.tfrecords",
+      "dataset_variant":
+          "colored_on_grayscale",
+      "min_num_objs":
+          3,
+      "max_num_objs":
+          3,
+  }
+
+  model = {
+      "constructor": "iodine.modules.iodine.IODINE",
+      "n_z": n_z,
+      "num_components": num_components,
+      "num_iters": num_iters,
+      "iter_loss_weight": "linspace",
+      "coord_type": "cos",
+      "coord_freqs": 3,
+      "decoder": {
+          "constructor": "iodine.modules.decoder.ComponentDecoder",
+          "pixel_decoder": {
+              "constructor": "iodine.modules.networks.BroadcastConv",
+              "cnn_opt": {
+                  # Final channels is irrelevant with target_output_shape
+                  "output_channels": [32, 32, 32, 32, None],
+                  "kernel_shapes": [5],
+                  "strides": [1],
+                  "activation": "elu",
+              },
+              "coord_type": "linear",
+          },
+      },
+      "refinement_core": {
+          "constructor": "iodine.modules.refinement.RefinementCore",
+          "encoder_net": {
+              "constructor": "iodine.modules.networks.CNN",
+              "mode": "avg_pool",
+              "cnn_opt": {
+                  "output_channels": [32, 32, 32],
+                  "strides": [2],
+                  "kernel_shapes": [5],
+                  "activation": "elu",
+              },
+              "mlp_opt": {
+                  "output_sizes": [128],
+                  "activation": "elu"
+              },
+          },
+          "recurrent_net": {
+              "constructor": "iodine.modules.networks.LSTM",
+              "hidden_sizes": [128],
+          },
+          "refinement_head": {
+              "constructor": "iodine.modules.refinement.ResHead"
+          },
+      },
+      "latent_dist": {
+          "constructor": "iodine.modules.distributions.LocScaleDistribution",
+          "dist": "normal",
+          "scale_act": "softplus",
+          "scale": "var",
+          "name": "latent_dist",
+      },
+      "output_dist": {
+          "constructor": "iodine.modules.distributions.MaskedMixture",
+          "num_components": num_components,
+          "component_dist": {
+              "constructor":
+                  "iodine.modules.distributions.LocScaleDistribution",
+              "dist":
+                  "logistic",
+              "scale":
+                  "fixed",
+              "scale_val":
+                  0.03,
+              "name":
+                  "pixel_distribution",
+          },
+      },
+      "factor_evaluator": {
+          "constructor":
+              "iodine.modules.factor_eval.FactorRegressor",
+          "mapping": [
+              ("color", 3, "scalar"),
+              ("shape", 4, "categorical"),
+              ("scale", 1, "scalar"),
+              ("x", 1, "scalar"),
+              ("y", 1, "scalar"),
+              ("orientation", 2, "angle"),
+          ],
+      },
+  }
+
+  optimizer = {
+      "constructor": "tensorflow.train.AdamOptimizer",
+      "learning_rate": {
+          "constructor": "tensorflow.train.exponential_decay",
+          "learning_rate": learn_rate,
+          "global_step": {
+              "constructor": "tensorflow.train.get_or_create_global_step"
+          },
+          "decay_steps": 1000000,
+          "decay_rate": 0.1,
+      },
+      "beta1": 0.95,
+  }
+
+
+def tetrominoes():
+  n_z = 32  # number of latent dimensions
+  num_components = 4  # number of components (K)
+  num_iters = 5
+  checkpoint_dir = "iodine/checkpoints/tetrominoes"
+
+  # For the paper we used 8 GPUs with a batch size of 32 each.
+  # This means a total batch size of 256, which is too large for a single GPU.
+  # When reducing the batch size, the learning rate should also be lowered.
+  batch_size = 128
+  learn_rate = 0.0003 * math.sqrt(batch_size / 256)
+
+  data = {
+      "constructor": "iodine.modules.data.Tetrominoes",
+      "batch_size": batch_size,
+      "path": "iodine/multi_object_datasets/tetrominoes_train.tfrecords",
+  }
+
+  model = {
+      "constructor": "iodine.modules.iodine.IODINE",
+      "n_z": n_z,
+      "num_components": num_components,
+      "num_iters": num_iters,
+      "iter_loss_weight": "linspace",
+      "coord_type": "cos",
+      "coord_freqs": 3,
+      "decoder": {
+          "constructor": "iodine.modules.decoder.ComponentDecoder",
+          "pixel_decoder": {
+              "constructor": "iodine.modules.networks.BroadcastConv",
+              "cnn_opt": {
+                  # Final channels is irrelevant with target_output_shape
+                  "output_channels": [32, 32, 32, 32, None],
+                  "kernel_shapes": [5],
+                  "strides": [1],
+                  "activation": "elu",
+              },
+              "coord_type": "linear",
+              "coord_freqs": 3,
+          },
+      },
+      "refinement_core": {
+          "constructor": "iodine.modules.refinement.RefinementCore",
+          "encoder_net": {
+              "constructor": "iodine.modules.networks.CNN",
+              "mode": "avg_pool",
+              "cnn_opt": {
+                  "output_channels": [32, 32, 32],
+                  "strides": [2],
+                  "kernel_shapes": [5],
+                  "activation": "elu",
+              },
+              "mlp_opt": {
+                  "output_sizes": [128],
+                  "activation": "elu"
+              },
+          },
+          "recurrent_net": {
+              "constructor": "iodine.modules.networks.LSTM",
+              "hidden_sizes": [],  # No recurrent layer used for this dataset
+          },
+          "refinement_head": {
+              "constructor": "iodine.modules.refinement.ResHead"
+          },
+      },
+      "latent_dist": {
+          "constructor": "iodine.modules.distributions.LocScaleDistribution",
+          "dist": "normal",
+          "scale_act": "softplus",
+          "scale": "var",
+          "name": "latent_dist",
+      },
+      "output_dist": {
+          "constructor": "iodine.modules.distributions.MaskedMixture",
+          "num_components": num_components,
+          "component_dist": {
+              "constructor":
+                  "iodine.modules.distributions.LocScaleDistribution",
+              "dist":
+                  "logistic",
+              "scale":
+                  "fixed",
+              "scale_val":
+                  0.03,
+              "name":
+                  "pixel_distribution",
+          },
+      },
+      "factor_evaluator": {
+          "constructor":
+              "iodine.modules.factor_eval.FactorRegressor",
+          "mapping": [
+              ("position", 2, "scalar"),
+              ("color", 3, "scalar"),
+              ("shape", 20, "categorical"),
+          ],
+      },
+  }
+
+  optimizer = {
+      "constructor": "tensorflow.train.AdamOptimizer",
+      "learning_rate": {
+          "constructor": "tensorflow.train.exponential_decay",
+          "learning_rate": learn_rate,
+          "global_step": {
+              "constructor": "tensorflow.train.get_or_create_global_step"
+          },
+          "decay_steps": 1000000,
+          "decay_rate": 0.1,
+      },
+      "beta1": 0.95,
+  }
@@ -0,0 +1,20 @@
+#!/bin/bash
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+pushd iodine
+wget http://storage.googleapis.com/deepmind-research-iodine/iodine_checkpoints.zip
+unzip iodine_checkpoints.zip
+popd
+
@@ -0,0 +1,202 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=g-importing-member, g-multiple-import, g-import-not-at-top
+# pylint: disable=protected-access, g-bad-import-order, missing-docstring
+# pylint: disable=unused-variable, invalid-name, no-value-for-parameter
+
+from copy import deepcopy
+import os.path
+import warnings
+from absl import logging
+import numpy as np
+from sacred import Experiment, SETTINGS
+
+# Ignore all tensorflow deprecation warnings
+logging._warn_preinit_stderr = 0
+warnings.filterwarnings("ignore", module=".*tensorflow.*")
+import tensorflow.compat.v1 as tf
+
+tf.logging.set_verbosity(tf.logging.ERROR)
+import sonnet as snt
+from sacred.stflow import LogFileWriter
+from iodine.modules import utils
+from iodine import configurations
+
+SETTINGS.CONFIG.READ_ONLY_CONFIG = False
+
+ex = Experiment("iodine")
+
+
+@ex.config
+def default_config():
+  continue_run = False  # set to continue experiment from an existing checkpoint
+  checkpoint_dir = ("checkpoints/iodine"
+                   )  # if continue_run is False, "_{run_id}" will be appended
+  save_summaries_steps = 10
+  save_checkpoint_steps = 1000
+
+  n_z = 64  # number of latent dimensions
+  num_components = 7  # number of components (K)
+  num_iters = 5
+
+  learn_rate = 0.001
+  batch_size = 4
+  stop_after_steps = int(1e6)
+
+  # Details for the dataset, model and optimizer are left empty here.
+  # They can be found in the configurations for individual datasets,
+  # which are provided in configurations.py and added as named configs.
+  data = {}  # Dataset details will go here
+  model = {}  # Model details will go here
+  optimizer = {}  # Optimizer details will go here
+
+
+ex.named_config(configurations.clevr6)
+ex.named_config(configurations.multi_dsprites)
+ex.named_config(configurations.tetrominoes)
+
+
+@ex.capture
+def build(identifier, _config):
+  config_copy = deepcopy(_config[identifier])
+  return utils.build(config_copy, identifier=identifier)
+
+
+def get_train_step(model, dataset, optimizer):
+  loss, scalars, _ = model(dataset("train"))
+  global_step = tf.train.get_or_create_global_step()
+  grads = optimizer.compute_gradients(loss)
+  gradients, variables = zip(*grads)
+  global_norm = tf.global_norm(gradients)
+  gradients, global_norm = tf.clip_by_global_norm(
+      gradients, 5.0, use_norm=global_norm)
+  grads = zip(gradients, variables)
+  train_op = optimizer.apply_gradients(grads, global_step=global_step)
+
+  with tf.control_dependencies([train_op]):
+    overview = model.get_overview_images(dataset("summary"))
+    scalars["debug/global_grad_norm"] = global_norm
+
+    summaries = {
+        k: tf.summary.scalar(k, v) for k, v in scalars.items()
+    }
+    summaries.update(
+        {k: tf.summary.image(k, v) for k, v in overview.items()})
+
+    return tf.identity(global_step), scalars, train_op
+
+
+@ex.capture
+def get_checkpoint_dir(continue_run, checkpoint_dir, _run, _log):
+  if continue_run:
+    assert os.path.exists(checkpoint_dir)
+    _log.info("Continuing run from checkpoint at {}".format(checkpoint_dir))
+    return checkpoint_dir
+
+  run_id = _run._id
+  if run_id is None:  # then no observer was added that provided an _id
+    if not _run.unobserved:
+      _log.warning(
+          "No run_id given or provided by an Observer. (Re-)using run_id=1.")
+    run_id = 1
+  checkpoint_dir = checkpoint_dir + "_{run_id}".format(run_id=run_id)
+  _log.info(
+      "Starting a new run using checkpoint dir: '{}'".format(checkpoint_dir))
+  return checkpoint_dir
+
+
+@ex.capture
+def get_session(chkp_dir, loss, stop_after_steps, save_summaries_steps,
+                save_checkpoint_steps):
+  config = tf.ConfigProto()
+  config.gpu_options.allow_growth = True
+
+  hooks = [
+      tf.train.StopAtStepHook(last_step=stop_after_steps),
+      tf.train.NanTensorHook(loss),
+  ]
+
+  return tf.train.MonitoredTrainingSession(
+      hooks=hooks,
+      config=config,
+      checkpoint_dir=chkp_dir,
+      save_summaries_steps=save_summaries_steps,
+      save_checkpoint_steps=save_checkpoint_steps,
+  )
+
+
+@ex.command(unobserved=True)
+def load_checkpoint(use_placeholder=False, session=None):
+  dataset = build("data")
+  model = build("model")
+  if use_placeholder:
+    inputs = dataset.get_placeholders()
+  else:
+    inputs = dataset()
+
+  info = model.eval(inputs)
+  if session is None:
+    session = tf.Session()
+  saver = tf.train.Saver()
+  checkpoint_dir = get_checkpoint_dir()
+  checkpoint_file = tf.train.latest_checkpoint(checkpoint_dir)
+  saver.restore(session, checkpoint_file)
+
+  print('Successfully restored Checkpoint "{}"'.format(checkpoint_file))
+  # print variables
+  variables = tf.global_variables() + tf.local_variables()
+  for row in snt.format_variables(variables, join_lines=False):
+    print(row)
+
+  return {
+      "session": session,
+      "model": model,
+      "info": info,
+      "inputs": inputs,
+      "dataset": dataset,
+  }
+
+
+@ex.automain
+@LogFileWriter(ex)
+def main(save_summaries_steps):
+  checkpoint_dir = get_checkpoint_dir()
+
+  dataset = build("data")
+  model = build("model")
+  optimizer = build("optimizer")
+  gstep, train_step_exports, train_op = get_train_step(model, dataset,
+                                                       optimizer)
+
+  loss, ari = [], []
+  with get_session(checkpoint_dir, train_step_exports["loss/total"]) as sess:
+    while not sess.should_stop():
+      out = sess.run({
+          "step": gstep,
+          "loss": train_step_exports["loss/total"],
+          "ari": train_step_exports["loss/ari_nobg"],
+          "train": train_op,
+      })
+      loss.append(out["loss"])
+      ari.append(out["ari"])
+      step = out["step"]
+      if step % save_summaries_steps == 0:
+        mean_loss = np.mean(loss)
+        mean_ari = np.mean(ari)
+        ex.log_scalar("loss", mean_loss, step)
+        ex.log_scalar("ari", mean_ari, step)
+        print("{step:>6d} Loss: {loss: >12.2f}\t\tARI-nobg:{ari: >6.2f}".format(
+            step=step, loss=mean_loss, ari=mean_ari))
+        loss, ari = [], []
@@ -0,0 +1,13 @@
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
@@ -0,0 +1,264 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Data loading functionality for IODINE."""
+# pylint: disable=g-multiple-import, missing-docstring, unused-import
+import os.path
+
+from iodine.modules.utils import flatten_all_but_last, ensure_3d
+from multi_object_datasets import (
+    clevr_with_masks,
+    multi_dsprites,
+    tetrominoes,
+    objects_room,
+)
+from shapeguard import ShapeGuard
+import sonnet as snt
+import tensorflow.compat.v1 as tf
+
+
+class IODINEDataset(snt.AbstractModule):
+  num_true_objects = 1
+  num_channels = 3
+
+  factors = {}
+
+  def __init__(
+      self,
+      path,
+      batch_size,
+      image_dim,
+      crop_region=None,
+      shuffle_buffer=1000,
+      max_num_objects=None,
+      min_num_objects=None,
+      grayscale=False,
+      name="dataset",
+      **kwargs,
+  ):
+    super().__init__(name=name)
+    self.path = os.path.abspath(os.path.expanduser(path))
+    self.batch_size = batch_size
+    self.crop_region = crop_region
+    self.image_dim = image_dim
+    self.shuffle_buffer = shuffle_buffer
+    self.max_num_objects = max_num_objects
+    self.min_num_objects = min_num_objects
+    self.grayscale = grayscale
+    self.dataset = None
+
+  def _build(self, subset="train"):
+    dataset = self.dataset
+
+    # filter by number of objects
+    if self.max_num_objects is not None or self.min_num_objects is not None:
+      dataset = self.dataset.filter(self.filter_by_num_objects)
+
+    if subset == "train":
+      # normal mode returns a shuffled dataset iterator
+      if self.shuffle_buffer is not None:
+        dataset = dataset.shuffle(self.shuffle_buffer)
+    elif subset == "summary":
+      # for generating summaries and overview images
+      # returns a single fixed batch
+      dataset = dataset.take(self.batch_size)
+
+    # repeat and batch
+    dataset = dataset.repeat().batch(self.batch_size, drop_remainder=True)
+    iterator = dataset.make_one_shot_iterator()
+    data = iterator.get_next()
+
+    # preprocess the data to ensure correct format, scale images etc.
+    data = self.preprocess(data)
+    return data
+
+  def filter_by_num_objects(self, d):
+    if "visibility" not in d:
+      return tf.constant(True)
+    min_num_objects = self.max_num_objects or 0
+    max_num_objects = self.max_num_objects or 6
+
+    min_predicate = tf.greater_equal(
+        tf.reduce_sum(d["visibility"]),
+        tf.constant(min_num_objects - 1e-5, dtype=tf.float32),
+    )
+    max_predicate = tf.less_equal(
+        tf.reduce_sum(d["visibility"]),
+        tf.constant(max_num_objects + 1e-5, dtype=tf.float32),
+    )
+    return tf.logical_and(min_predicate, max_predicate)
+
+  def preprocess(self, data):
+    sg = ShapeGuard(dims={
+        "B": self.batch_size,
+        "H": self.image_dim[0],
+        "W": self.image_dim[1]
+    })
+    image = sg.guard(data["image"], "B, h, w, C")
+    mask = sg.guard(data["mask"], "B, L, h, w, 1")
+
+    # to float
+    image = tf.cast(image, tf.float32) / 255.0
+    mask = tf.cast(mask, tf.float32) / 255.0
+
+    # crop
+    if self.crop_region is not None:
+      height_slice = slice(self.crop_region[0][0], self.crop_region[0][1])
+      width_slice = slice(self.crop_region[1][0], self.crop_region[1][1])
+      image = image[:, height_slice, width_slice, :]
+
+      mask = mask[:, :, height_slice, width_slice, :]
+
+    flat_mask, unflatten = flatten_all_but_last(mask, n_dims=3)
+
+    # rescale
+    size = tf.constant(
+        self.image_dim, dtype=tf.int32, shape=[2], verify_shape=True)
+    image = tf.image.resize_images(
+        image, size, method=tf.image.ResizeMethod.BILINEAR)
+    mask = tf.image.resize_images(
+        flat_mask, size, method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
+
+    if self.grayscale:
+      image = tf.reduce_mean(image, axis=-1, keepdims=True)
+
+    output = {
+        "image": sg.guard(image[:, None], "B, T, H, W, C"),
+        "mask": sg.guard(unflatten(mask)[:, None], "B, T, L, H, W, 1"),
+        "factors": self.preprocess_factors(data, sg),
+    }
+
+    if "visibility" in data:
+      output["visibility"] = sg.guard(data["visibility"], "B, L")
+    else:
+      output["visibility"] = tf.ones(sg["B, L"], dtype=tf.float32)
+
+    return output
+
+  def preprocess_factors(self, data, sg):
+    return {
+        name: sg.guard(ensure_3d(data[name]), "B, L, *")
+        for name in self.factors
+    }
+
+  def get_placeholders(self, batch_size=None):
+    batch_size = batch_size or self.batch_size
+    sg = ShapeGuard(
+        dims={
+            "B": batch_size,
+            "H": self.image_dim[0],
+            "W": self.image_dim[1],
+            "L": self.num_true_objects,
+            "C": 3,
+            "T": 1,
+        })
+    return {
+        "image": tf.placeholder(dtype=tf.float32, shape=sg["B, T, H, W, C"]),
+        "mask": tf.placeholder(dtype=tf.float32, shape=sg["B, T, L, H, W, 1"]),
+        "visibility": tf.placeholder(dtype=tf.float32, shape=sg["B, L"]),
+        "factors": {
+            name:
+            tf.placeholder(dtype=dtype, shape=sg["B, L, {}".format(size)])
+            for name, (dtype, size) in self.factors
+        },
+    }
+
+
+class CLEVR(IODINEDataset):
+  num_true_objects = 11
+  num_channels = 3
+  factors = {
+      "color": (tf.uint8, 1),
+      "shape": (tf.uint8, 1),
+      "size": (tf.uint8, 1),
+      "position": (tf.float32, 3),
+      "rotation": (tf.float32, 1),
+  }
+
+  def __init__(
+      self,
+      path,
+      crop_region=((29, 221), (64, 256)),
+      image_dim=(128, 128),
+      name="clevr",
+      **kwargs,
+  ):
+    super().__init__(
+        path=path,
+        crop_region=crop_region,
+        image_dim=image_dim,
+        name=name,
+        **kwargs)
+    self.dataset = clevr_with_masks.dataset(self.path)
+
+  def preprocess_factors(self, data, sg):
+
+    return {
+        "color": sg.guard(ensure_3d(data["color"]), "B, L, 1"),
+        "shape": sg.guard(ensure_3d(data["shape"]), "B, L, 1"),
+        "size": sg.guard(ensure_3d(data["color"]), "B, L, 1"),
+        "position": sg.guard(ensure_3d(data["pixel_coords"]), "B, L, 3"),
+        "rotation": sg.guard(ensure_3d(data["rotation"]), "B, L, 1"),
+    }
+
+
+class MultiDSprites(IODINEDataset):
+  num_true_objects = 6
+  num_channels = 3
+  factors = {
+      "color": (tf.float32, 3),
+      "shape": (tf.uint8, 1),
+      "scale": (tf.float32, 1),
+      "x": (tf.float32, 1),
+      "y": (tf.float32, 1),
+      "orientation": (tf.float32, 1),
+  }
+
+  def __init__(
+      self,
+      path,
+      # variant from ['binarized', 'colored_on_grayscale', 'colored_on_colored']
+      dataset_variant="colored_on_grayscale",
+      image_dim=(64, 64),
+      name="multi_dsprites",
+      **kwargs,
+  ):
+    super().__init__(path=path, name=name, image_dim=image_dim, **kwargs)
+    self.dataset_variant = dataset_variant
+    self.dataset = multi_dsprites.dataset(self.path, self.dataset_variant)
+
+
+class Tetrominoes(IODINEDataset):
+  num_true_objects = 6
+  num_channels = 3
+  factors = {
+      "color": (tf.uint8, 3),
+      "shape": (tf.uint8, 1),
+      "position": (tf.float32, 2),
+  }
+
+  def __init__(self, path, image_dim=(35, 35), name="tetrominoes", **kwargs):
+    super().__init__(path=path, name=name, image_dim=image_dim, **kwargs)
+    self.dataset = tetrominoes.dataset(self.path)
+
+  def preprocess_factors(self, data, sg):
+    pos_x = ensure_3d(data["x"])
+    pos_y = ensure_3d(data["y"])
+    position = tf.concat([pos_x, pos_y], axis=2)
+
+    return {
+        "color": sg.guard(ensure_3d(data["color"]), "B, L, 3"),
+        "shape": sg.guard(ensure_3d(data["shape"]), "B, L, 1"),
+        "position": sg.guard(ensure_3d(position), "B, L, 2"),
+    }
@@ -0,0 +1,49 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Decoders for rendering images."""
+# pylint: disable=missing-docstring
+from iodine.modules.distributions import MixtureParameters
+import shapeguard
+import sonnet as snt
+
+
+class ComponentDecoder(snt.AbstractModule):
+
+  def __init__(self, pixel_decoder, name="component_decoder"):
+    super().__init__(name=name)
+    self._pixel_decoder = pixel_decoder
+    self._sg = shapeguard.ShapeGuard()
+
+  def set_output_shapes(self, pixel, mask):
+    self._sg.guard(pixel, "K, H, W, Cp")
+    self._sg.guard(mask, "K, H, W, 1")
+    self._pixel_decoder.set_output_shapes(self._sg["H, W, 1 + Cp"])
+
+  def _build(self, z):
+    self._sg.guard(z, "B, K, Z")
+    z_flat = self._sg.reshape(z, "B*K, Z")
+    pixel_params = self._pixel_decoder(z_flat).params
+
+    self._sg.guard(pixel_params, "B*K, H, W, 1 + Cp")
+    mask_params = pixel_params[Ellipsis, 0:1]
+    pixel_params = pixel_params[Ellipsis, 1:]
+
+    output = MixtureParameters(
+        pixel=self._sg.reshape(pixel_params, "B, K, H, W, Cp"),
+        mask=self._sg.reshape(mask_params, "B, K, H, W, 1"),
+    )
+
+    del self._sg.B
+    return output
@@ -0,0 +1,223 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Collection of sonnet modules that wrap useful distributions."""
+# pylint: disable=missing-docstring, g-doc-args, g-short-docstring-punctuation
+# pylint: disable=g-space-before-docstring-summary
+# pylint: disable=g-no-space-after-docstring-summary
+import collections
+from iodine.modules.utils import get_act_func
+from iodine.modules.utils import get_distribution
+import shapeguard
+import sonnet as snt
+import tensorflow.compat.v1 as tf
+import tensorflow_probability as tfp
+
+
+tfd = tfp.distributions
+
+FlatParameters = collections.namedtuple("ParameterOut", ["params"])
+MixtureParameters = collections.namedtuple("MixtureOut", ["pixel", "mask"])
+
+
+class DistributionModule(snt.AbstractModule):
+  """Distribution Base class supporting shape inference & default priors."""
+
+  def __init__(self, name="distribution"):
+    super().__init__(name=name)
+    self._output_shape = None
+
+  def set_output_shape(self, shape):
+    self._output_shape = shape
+
+  @property
+  def output_shape(self):
+    return self._output_shape
+
+  @property
+  def input_shapes(self):
+    raise NotImplementedError()
+
+  def get_default_prior(self, batch_dim=(1,)):
+    return self(
+        tf.zeros(list(batch_dim) + self.input_shapes.params, dtype=tf.float32))
+
+
+class BernoulliOutput(DistributionModule):
+
+  def __init__(self, name="bernoulli_output"):
+    super().__init__(name=name)
+
+  @property
+  def input_shapes(self):
+    return FlatParameters(self.output_shape)
+
+  def _build(self, params):
+    return tfd.Independent(
+        tfd.Bernoulli(logits=params, dtype=tf.float32),
+        reinterpreted_batch_ndims=1)
+
+
+class LocScaleDistribution(DistributionModule):
+  """Generic IID location / scale distribution.
+
+    Input parameters are concatenation of location and scale (2*Z,)
+
+    Args:
+      dist: Distribution or str Kind of distribution used. Supports Normal,
+        Logistic, Laplace, and StudentT distributions.
+      dist_kwargs: dict custom keyword arguments for the distribution
+      scale_act: function or str or None activation function to be applied to
+        the scale input
+      scale: str
+        different modes for computing the scale:
+          * stddev: scale is computed as scale_act(s)
+          * var: scale is computed as sqrt(scale_act(s))
+          * prec: scale is computed as 1./scale_act(s)
+          * fixed: scale is a global variable (same for all pixels) if
+            scale_val==-1. then it is a trainable variable initialized to 0.1
+            else it is fixed to scale_val (input shape is only (Z,) in this
+            case)
+      scale_val: float determines the scale value (only used if scale=='fixed').
+      loc_act: function or str or None activation function to be applied to the
+        location input. Supports optional activation functions for scale and
+        location.
+    Supports different "modes" for scaling:
+      * stddev:
+  """
+
+  def __init__(
+      self,
+      dist=tfd.Normal,
+      dist_kwargs=None,
+      scale_act=tf.exp,
+      scale="stddev",
+      scale_val=1.0,
+      loc_act=None,
+      name="loc_scale_dist",
+  ):
+    super().__init__(name=name)
+    self._scale_act = get_act_func(scale_act)
+    self._loc_act = get_act_func(loc_act)
+    # supports Normal, Logstic, Laplace, StudentT
+    self._dist = get_distribution(dist)
+    self._dist_kwargs = dist_kwargs or {}
+
+    assert scale in ["stddev", "var", "prec", "fixed"], scale
+    self._scale = scale
+    self._scale_val = scale_val
+
+  @property
+  def input_shapes(self):
+    if self._scale == "fixed":
+      param_shape = self.output_shape
+    else:
+      param_shape = self.output_shape[:-1] + [self.output_shape[-1] * 2]
+    return FlatParameters(param_shape)
+
+  def _build(self, params):
+    if self._scale == "fixed":
+      loc = params
+      scale = None  # set later
+    else:
+      n_channels = params.get_shape().as_list()[-1]
+      assert n_channels % 2 == 0
+      assert n_channels // 2 == self.output_shape[-1]
+      loc = params[Ellipsis, :n_channels // 2]
+      scale = params[Ellipsis, n_channels // 2:]
+
+    # apply activation functions
+    if self._scale != "fixed":
+      scale = self._scale_act(scale)
+    loc = self._loc_act(loc)
+
+    # apply the correct parametrization
+    if self._scale == "var":
+      scale = tf.sqrt(scale)
+    elif self._scale == "prec":
+      scale = tf.reciprocal(scale)
+    elif self._scale == "fixed":
+      if self._scale_val == -1.0:
+        scale_val = tf.get_variable(
+            "scale", initializer=tf.constant(0.1, dtype=tf.float32))
+      else:
+        scale_val = self._scale_val
+      scale = tf.ones_like(loc) * scale_val
+    # else 'stddev'
+
+    dist = self._dist(loc=loc, scale=scale, **self._dist_kwargs)
+
+    return tfd.Independent(dist, reinterpreted_batch_ndims=1)
+
+
+class MaskedMixture(DistributionModule):
+
+  def __init__(
+      self,
+      num_components,
+      component_dist,
+      mask_activation=None,
+      name="masked_mixture",
+  ):
+    """
+        Spatial Mixture Model composed of a categorical masking distribution and
+        a custom pixel-wise component distribution (usually logistic or
+        gaussian).
+
+        Args:
+          num_components: int Number of mixture components >= 2
+          component_dist: the distribution to use for the individual components
+          mask_activation: str or function or None activation function that
+            should be applied to the mask before the softmax.
+          name: str
+    """
+
+    super().__init__(name=name)
+    self._num_components = num_components
+    self._dist = component_dist
+    self._mask_activation = get_act_func(mask_activation)
+
+  def set_output_shape(self, shape):
+    super().set_output_shape(shape)
+    self._dist.set_output_shape(shape)
+
+  def _build(self, pixel, mask):
+    sg = shapeguard.ShapeGuard()
+    # MASKING
+    sg.guard(mask, "B, K, H, W, 1")
+    mask = tf.transpose(mask, perm=[0, 2, 3, 4, 1])
+    mask = sg.reshape(mask, "B, H, W, K")
+    mask = self._mask_activation(mask)
+    mask = mask[:, tf.newaxis]  # add K=1 axis since K is removed by mixture
+    mix_dist = tfd.Categorical(logits=mask)
+
+    # COMPONENTS
+    sg.guard(pixel, "B, K, H, W, Cp")
+    params = tf.transpose(pixel, perm=[0, 2, 3, 1, 4])
+    params = params[:, tf.newaxis]  # add K=1 axis since K is removed by mixture
+    dist = self._dist(params)
+    return tfd.MixtureSameFamily(
+        mixture_distribution=mix_dist, components_distribution=dist)
+
+  @property
+  def input_shapes(self):
+    pixel = [self._num_components] + self._dist.input_shapes.params
+    mask = pixel[:-1] + [1]
+    return MixtureParameters(pixel, mask)
+
+  def get_default_prior(self, batch_dim=(1,)):
+    pixel = tf.zeros(
+        list(batch_dim) + self.input_shapes.pixel, dtype=tf.float32)
+    mask = tf.zeros(list(batch_dim) + self.input_shapes.mask, dtype=tf.float32)
+    return self(pixel, mask)
@@ -0,0 +1,206 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Factor Evaluation Module."""
+# pylint: disable=unused-variable
+
+import collections
+import functools
+from iodine.modules import utils
+import shapeguard
+import sonnet as snt
+import tensorflow.compat.v1 as tf
+
+Factor = collections.namedtuple("Factor", ["name", "size", "type"])
+
+
+class FactorRegressor(snt.AbstractModule):
+  """Assess representations by learning a linear mapping to latents."""
+
+  def __init__(self, mapping=None, name="repres_content"):
+    super().__init__(name=name)
+    if mapping is None:
+      self._mapping = [
+          Factor("color", 3, "scalar"),
+          Factor("shape", 4, "categorical"),
+          Factor("scale", 1, "scalar"),
+          Factor("x", 1, "scalar"),
+          Factor("y", 1, "scalar"),
+          Factor("orientation", 2, "angle"),
+      ]
+    else:
+      self._mapping = [Factor(*m) for m in mapping]
+
+  def _build(self, z, latent, visibility, pred_mask, true_mask):
+    sg = shapeguard.ShapeGuard()
+    z = sg.guard(z, "B, K, Z")
+    pred_mask = sg.guard(pred_mask, "B, K, H, W, 1")
+    true_mask = sg.guard(true_mask, "B, L, H, W, 1")
+
+    visibility = sg.guard(visibility, "B, L")
+    num_visible_obj = tf.reduce_sum(visibility)
+
+    # Map z to predictions for all latents
+    sg.M = sum([m.size for m in self._mapping])
+    self.predictor = snt.Linear(sg.M, name="predict_latents")
+    z_flat = sg.reshape(z, "B*K, Z")
+    all_preds = sg.guard(self.predictor(z_flat), "B*K, M")
+    all_preds = sg.reshape(all_preds, "B, 1, K, M")
+    all_preds = tf.tile(all_preds, sg["1, L, 1, 1"])
+
+    # prepare latents
+    latents = {}
+    mean_var_tot = {}
+    for m in self._mapping:
+      with tf.name_scope(m.name):
+        # preprocess, reshape, and tile
+        lat_preprocess = self.get_preprocessing(m)
+        lat = sg.guard(
+            lat_preprocess(latent[m.name]), "B, L, {}".format(m.size))
+        # compute mean over latent by training a variable using mse
+        if m.type in {"scalar", "angle"}:
+          mvt = utils.OnlineMeanVarEstimator(
+              axis=[0, 1], ddof=1, name="{}_mean_var".format(m.name))
+          mean_var_tot[m.name] = mvt(lat, visibility[:, :, tf.newaxis])
+
+        lat = tf.reshape(lat, sg["B, L, 1"] + [-1])
+        lat = tf.tile(lat, sg["1, 1, K, 1"])
+        latents[m.name] = lat
+
+    # prepare predictions
+    idx = 0
+    predictions = {}
+    for m in self._mapping:
+      with tf.name_scope(m.name):
+        assert m.name in latent, "{} not in {}".format(m.name, latent.keys())
+        pred = all_preds[Ellipsis, idx:idx + m.size]
+        predictions[m.name] = sg.guard(pred, "B, L, K, {}".format(m.size))
+        idx += m.size
+
+    # compute error
+    total_pairwise_errors = None
+    for m in self._mapping:
+      with tf.name_scope(m.name):
+        error_fn = self.get_error_func(m)
+        sg.guard(latents[m.name], "B, L, K, {}".format(m.size))
+        sg.guard(predictions[m.name], "B, L, K, {}".format(m.size))
+        err = error_fn(latents[m.name], predictions[m.name])
+        sg.guard(err, "B, L, K")
+        if total_pairwise_errors is None:
+          total_pairwise_errors = err
+        else:
+          total_pairwise_errors += err
+
+    # determine best assignment by comparing masks
+    obj_mask = true_mask[:, :, tf.newaxis]
+    pred_mask = pred_mask[:, tf.newaxis]
+    pairwise_overlap = tf.reduce_sum(obj_mask * pred_mask, axis=[3, 4, 5])
+    best_match = sg.guard(tf.argmax(pairwise_overlap, axis=2), "B, L")
+    assignment = tf.one_hot(best_match, sg.K)
+    assignment *= visibility[:, :, tf.newaxis]  # Mask non-visible objects
+
+    # total error
+    total_error = (
+        tf.reduce_sum(assignment * total_pairwise_errors) / num_visible_obj)
+
+    # compute scalars
+    monitored_scalars = {}
+    for m in self._mapping:
+      with tf.name_scope(m.name):
+        metric = self.get_metric(m)
+        scalar = metric(
+            latents[m.name],
+            predictions[m.name],
+            assignment[:, :, :, tf.newaxis],
+            mean_var_tot.get(m.name),
+            num_visible_obj,
+        )
+        monitored_scalars[m.name] = scalar
+    return total_error, monitored_scalars, mean_var_tot, predictions, assignment
+
+  @snt.reuse_variables
+  def predict(self, z):
+    sg = shapeguard.ShapeGuard()
+    z = sg.guard(z, "B, Z")
+    all_preds = sg.guard(self.predictor(z), "B, M")
+
+    idx = 0
+    predictions = {}
+    for m in self._mapping:
+      with tf.name_scope(m.name):
+        pred = all_preds[:, idx:idx + m.size]
+        predictions[m.name] = sg.guard(pred, "B, {}".format(m.size))
+        idx += m.size
+    return predictions
+
+  @staticmethod
+  def get_error_func(factor):
+    if factor.type in {"scalar", "angle"}:
+      return sse
+    elif factor.type == "categorical":
+      return functools.partial(
+          tf.losses.softmax_cross_entropy, reduction="none")
+    else:
+      raise KeyError(factor.type)
+
+  @staticmethod
+  def get_metric(factor):
+    if factor.type in {"scalar", "angle"}:
+      return r2
+    elif factor.type == "categorical":
+      return accuracy
+    else:
+      raise KeyError(factor.type)
+
+  @staticmethod
+  def one_hot(f, nr_categories):
+    return tf.one_hot(tf.cast(f[Ellipsis, 0], tf.int32), depth=nr_categories)
+
+  @staticmethod
+  def angle_to_vector(theta):
+    return tf.concat([tf.math.cos(theta), tf.math.sin(theta)], axis=-1)
+
+  @staticmethod
+  def get_preprocessing(factor):
+    if factor.type == "scalar":
+      return tf.identity
+    elif factor.type == "categorical":
+      return functools.partial(
+          FactorRegressor.one_hot, nr_categories=factor.size)
+    elif factor.type == "angle":
+      return FactorRegressor.angle_to_vector
+    else:
+      raise KeyError(factor.type)
+
+
+def sse(true, pred):
+  # run our own sum squared error because we want to reduce sum over last dim
+  return tf.reduce_sum(tf.square(true - pred), axis=-1)
+
+
+def accuracy(labels, logits, assignment, mean_var_tot, num_vis):
+  del mean_var_tot  # unused
+  pred = tf.argmax(logits, axis=-1, output_type=tf.int32)
+  labels = tf.argmax(labels, axis=-1, output_type=tf.int32)
+  correct = tf.cast(tf.equal(labels, pred), tf.float32)
+  return tf.reduce_sum(correct * assignment[Ellipsis, 0]) / num_vis
+
+
+def r2(labels, pred, assignment, mean_var_tot, num_vis):
+  del num_vis  # unused
+  mean, var, _ = mean_var_tot
+  # labels, pred: (B, L, K, n)
+  ss_res = tf.reduce_sum(tf.square(labels - pred) * assignment, axis=2)
+  ss_tot = var[tf.newaxis, tf.newaxis, :]  # (1, 1, n)
+  return tf.reduce_mean(1.0 - ss_res / ss_tot)
@@ -0,0 +1,300 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Network modules."""
+# pylint: disable=g-multiple-import, g-doc-args, g-short-docstring-punctuation
+# pylint: disable=g-no-space-after-docstring-summary
+from iodine.modules.distributions import FlatParameters
+from iodine.modules.utils import flatten_all_but_last, get_act_func
+import numpy as np
+import shapeguard
+import sonnet as snt
+import tensorflow.compat.v1 as tf
+
+
+class CNN(snt.AbstractModule):
+  """ConvNet2D followed by an MLP.
+
+  This is a typical encoder architecture for VAEs, and has been found to work
+  well. One small improvement is to append coordinate channels on the input,
+  though for most datasets the improvement obtained is negligible.
+  """
+
+  def __init__(self, cnn_opt, mlp_opt, mode="flatten", name="cnn"):
+    """Constructor.
+
+        Args:
+          cnn_opt: Dictionary. Kwargs for the cnn. See vae_lib.ConvNet2D for
+            details.
+          mlp_opt: Dictionary. Kwargs for the mlp. See vae_lib.MLP for details.
+          name: String. Optional name.
+    """
+    super().__init__(name=name)
+    if "activation" in cnn_opt:
+      cnn_opt["activation"] = get_act_func(cnn_opt["activation"])
+    self._cnn_opt = cnn_opt
+
+    if "activation" in mlp_opt:
+      mlp_opt["activation"] = get_act_func(mlp_opt["activation"])
+    self._mlp_opt = mlp_opt
+
+    self._mode = mode
+
+  def set_output_shapes(self, shape):
+    # assert self._mlp_opt['output_sizes'][-1] is None, self._mlp_opt
+    sg = shapeguard.ShapeGuard()
+    sg.guard(shape, "1, Y")
+    self._mlp_opt["output_sizes"][-1] = sg.Y
+
+  def _build(self, image):
+    """Connect model to TensorFlow graph."""
+    assert self._mlp_opt["output_sizes"][-1] is not None, "set output_shapes"
+    sg = shapeguard.ShapeGuard()
+    flat_image, unflatten = flatten_all_but_last(image, n_dims=3)
+    sg.guard(flat_image, "B, H, W, C")
+
+    cnn = snt.nets.ConvNet2D(
+        activate_final=True,
+        paddings=("SAME",),
+        normalize_final=False,
+        **self._cnn_opt)
+    mlp = snt.nets.MLP(**self._mlp_opt)
+
+    # run CNN
+    net = cnn(flat_image)
+
+    if self._mode == "flatten":
+      # flatten
+      net_shape = net.get_shape().as_list()
+      flat_shape = net_shape[:-3] + [np.prod(net_shape[-3:])]
+      net = tf.reshape(net, flat_shape)
+    elif self._mode == "avg_pool":
+      net = tf.reduce_mean(net, axis=[1, 2])
+    else:
+      raise KeyError('Unknown mode "{}"'.format(self._mode))
+    # run MLP
+    output = sg.guard(mlp(net), "B, Y")
+    return FlatParameters(unflatten(output))
+
+
+class MLP(snt.AbstractModule):
+  """MLP."""
+
+  def __init__(self, name="mlp", **mlp_opt):
+    super().__init__(name=name)
+    if "activation" in mlp_opt:
+      mlp_opt["activation"] = get_act_func(mlp_opt["activation"])
+    self._mlp_opt = mlp_opt
+    assert mlp_opt["output_sizes"][-1] is None, mlp_opt
+
+  def set_output_shapes(self, shape):
+    sg = shapeguard.ShapeGuard()
+    sg.guard(shape, "1, Y")
+    self._mlp_opt["output_sizes"][-1] = sg.Y
+
+  def _build(self, data):
+    """Connect model to TensorFlow graph."""
+    assert self._mlp_opt["output_sizes"][-1] is not None, "set output_shapes"
+    sg = shapeguard.ShapeGuard()
+    flat_data, unflatten = flatten_all_but_last(data)
+    sg.guard(flat_data, "B, N")
+
+    mlp = snt.nets.MLP(**self._mlp_opt)
+    # run MLP
+    output = sg.guard(mlp(flat_data), "B, Y")
+    return FlatParameters(unflatten(output))
+
+
+class DeConv(snt.AbstractModule):
+  """MLP followed by Deconv net.
+
+  This decoder is commonly used by vanilla VAE models. However, in practice
+  BroadcastConv (see below) seems to disentangle slightly better.
+  """
+
+  def __init__(self, mlp_opt, cnn_opt, name="deconv"):
+    """Constructor.
+
+        Args:
+          mlp_opt: Dictionary. Kwargs for vae_lib.MLP.
+          cnn_opt: Dictionary. Kwargs for vae_lib.ConvNet2D for the CNN.
+          name: Optional name.
+    """
+    super().__init__(name=name)
+    assert cnn_opt["output_channels"][-1] is None, cnn_opt
+    if "activation" in cnn_opt:
+      cnn_opt["activation"] = get_act_func(cnn_opt["activation"])
+    self._cnn_opt = cnn_opt
+
+    if mlp_opt and "activation" in mlp_opt:
+      mlp_opt["activation"] = get_act_func(mlp_opt["activation"])
+    self._mlp_opt = mlp_opt
+    self._target_out_shape = None
+
+  def set_output_shapes(self, shape):
+    self._target_out_shape = shape
+    self._cnn_opt["output_channels"][-1] = self._target_out_shape[-1]
+
+  def _build(self, z):
+    """Connect model to TensorFlow graph."""
+    sg = shapeguard.ShapeGuard()
+    flat_z, unflatten = flatten_all_but_last(z)
+    sg.guard(flat_z, "B, Z")
+    sg.guard(self._target_out_shape, "H, W, C")
+    mlp = snt.nets.MLP(**self._mlp_opt)
+    cnn = snt.nets.ConvNet2DTranspose(
+        paddings=("SAME",), normalize_final=False, **self._cnn_opt)
+    net = mlp(flat_z)
+    output = sg.guard(cnn(net), "B, H, W, C")
+    return FlatParameters(unflatten(output))
+
+
+class BroadcastConv(snt.AbstractModule):
+  """MLP followed by a broadcast convolution.
+
+  This decoder takes a latent vector z, (optionally) applies an MLP to it,
+  then tiles the resulting vector across space to have dimension [B, H, W, C]
+  i.e. tiles across H and W. Then coordinate channels are appended and a
+  convolutional layer is applied.
+  """
+
+  def __init__(
+      self,
+      cnn_opt,
+      mlp_opt=None,
+      coord_type="linear",
+      coord_freqs=3,
+      name="broadcast_conv",
+  ):
+    """Args:
+          cnn_opt: dict Kwargs for vae_lib.ConvNet2D for the CNN.
+          mlp_opt: None or dict If dictionary, then kwargs for snt.nets.MLP. If
+            None, then the model will not process the latent vector by an mlp.
+          coord_type: ["linear", "cos", None] type of coordinate channels to
+            add.
+            None: add no coordinate channels.
+            linear: two channels with values linearly spaced from -1. to 1. in
+              the H and W dimension respectively.
+            cos: coord_freqs^2 many channels containing cosine basis functions.
+          coord_freqs: int number of frequencies used to construct the cosine
+            basis functions (only for coord_type=="cos")
+          name: Optional name.
+    """
+    super().__init__(name=name)
+
+    assert cnn_opt["output_channels"][-1] is None, cnn_opt
+    if "activation" in cnn_opt:
+      cnn_opt["activation"] = get_act_func(cnn_opt["activation"])
+    self._cnn_opt = cnn_opt
+
+    if mlp_opt and "activation" in mlp_opt:
+      mlp_opt["activation"] = get_act_func(mlp_opt["activation"])
+    self._mlp_opt = mlp_opt
+
+    self._target_out_shape = None
+    self._coord_type = coord_type
+    self._coord_freqs = coord_freqs
+
+  def set_output_shapes(self, shape):
+    self._target_out_shape = shape
+    self._cnn_opt["output_channels"][-1] = self._target_out_shape[-1]
+
+  def _build(self, z):
+    """Connect model to TensorFlow graph."""
+    assert self._target_out_shape is not None, "Call set_output_shape"
+    # reshape components into batch dimension before processing them
+    sg = shapeguard.ShapeGuard()
+    flat_z, unflatten = flatten_all_but_last(z)
+    sg.guard(flat_z, "B, Z")
+    sg.guard(self._target_out_shape, "H, W, C")
+
+    if self._mlp_opt is None:
+      mlp = tf.identity
+    else:
+      mlp = snt.nets.MLP(activate_final=True, **self._mlp_opt)
+    mlp_output = sg.guard(mlp(flat_z), "B, hidden")
+
+    # tile MLP output spatially and append coordinate channels
+    broadcast_mlp_output = tf.tile(
+        mlp_output[:, tf.newaxis, tf.newaxis],
+        multiples=tf.constant(sg["1, H, W, 1"]),
+    )  # B, H, W, Z
+
+    dec_cnn_inputs = self.append_coordinate_channels(broadcast_mlp_output)
+
+    cnn = snt.nets.ConvNet2D(
+        paddings=("SAME",), normalize_final=False, **self._cnn_opt)
+    cnn_outputs = cnn(dec_cnn_inputs)
+    sg.guard(cnn_outputs, "B, H, W, C")
+
+    return FlatParameters(unflatten(cnn_outputs))
+
+  def append_coordinate_channels(self, output):
+    sg = shapeguard.ShapeGuard()
+    sg.guard(output, "B, H, W, C")
+    if self._coord_type is None:
+      return output
+    if self._coord_type == "linear":
+      w_coords = tf.linspace(-1.0, 1.0, sg.W)[None, None, :, None]
+      h_coords = tf.linspace(-1.0, 1.0, sg.H)[None, :, None, None]
+      w_coords = tf.tile(w_coords, sg["B, H, 1, 1"])
+      h_coords = tf.tile(h_coords, sg["B, 1, W, 1"])
+      return tf.concat([output, h_coords, w_coords], axis=-1)
+    elif self._coord_type == "cos":
+      freqs = sg.guard(tf.range(0.0, self._coord_freqs), "F")
+      valx = tf.linspace(0.0, np.pi, sg.W)[None, None, :, None, None]
+      valy = tf.linspace(0.0, np.pi, sg.H)[None, :, None, None, None]
+      x_basis = tf.cos(valx * freqs[None, None, None, :, None])
+      y_basis = tf.cos(valy * freqs[None, None, None, None, :])
+      xy_basis = tf.reshape(x_basis * y_basis, sg["1, H, W, F*F"])
+      coords = tf.tile(xy_basis, sg["B,  1, 1, 1"])[Ellipsis, 1:]
+      return tf.concat([output, coords], axis=-1)
+    else:
+      raise KeyError('Unknown coord_type: "{}"'.format(self._coord_type))
+
+
+class LSTM(snt.RNNCore):
+  """Wrapper around snt.LSTM that supports multi-layers and runs K components in
+  parallel.
+
+  Expects input data of shape (B, K, H) and outputs data of shape (B, K, Y)
+  """
+
+  def __init__(self, hidden_sizes, name="lstm"):
+    super().__init__(name=name)
+    self._hidden_sizes = hidden_sizes
+    with self._enter_variable_scope():
+      self._lstm_layers = [snt.LSTM(hidden_size=h) for h in self._hidden_sizes]
+
+  def initial_state(self, batch_size, **kwargs):
+    return [
+        lstm.initial_state(batch_size, **kwargs) for lstm in self._lstm_layers
+    ]
+
+  def _build(self, data, prev_states):
+    assert not self._hidden_sizes or self._hidden_sizes[-1] is not None
+    assert len(prev_states) == len(self._hidden_sizes)
+    sg = shapeguard.ShapeGuard()
+    sg.guard(data, "B, K, H")
+    data = sg.reshape(data, "B*K, H")
+
+    out = data
+    new_states = []
+    for lstm, pstate in zip(self._lstm_layers, prev_states):
+      out, nstate = lstm(out, pstate)
+      new_states.append(nstate)
+
+    sg.guard(out, "B*K, Y")
+    out = sg.reshape(out, "B, K, Y")
+    return out, new_states
@@ -0,0 +1,226 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Plotting tools for IODINE."""
+# pylint: disable=unused-import, missing-docstring, unused-variable
+# pylint: disable=invalid-name, unexpected-keyword-arg
+import functools
+from iodine.modules.utils import get_mask_plot_colors
+from matplotlib.colors import hsv_to_rgb
+import matplotlib.pyplot as plt
+import numpy as np
+
+__all__ = ("get_mask_plot_colors", "example_plot", "iterations_plot",
+           "inputs_plot")
+
+
+def clean_ax(ax, color=None, lw=4.0):
+  ax.set_xticks([])
+  ax.set_yticks([])
+  if color is not None:
+    for spine in ax.spines.values():
+      spine.set_linewidth(lw)
+      spine.set_color(color)
+
+
+def optional_ax(fn):
+
+  def _wrapped(*args, **kwargs):
+    if kwargs.get("ax", None) is None:
+      figsize = kwargs.pop("figsize", (4, 4))
+      fig, ax = plt.subplots(figsize=figsize)
+      kwargs["ax"] = ax
+    return fn(*args, **kwargs)
+
+  return _wrapped
+
+
+def optional_clean_ax(fn):
+
+  def _wrapped(*args, **kwargs):
+    if kwargs.get("ax", None) is None:
+      figsize = kwargs.pop("figsize", (4, 4))
+      fig, ax = plt.subplots(figsize=figsize)
+      kwargs["ax"] = ax
+    color = kwargs.pop("color", None)
+    lw = kwargs.pop("lw", 4.0)
+    res = fn(*args, **kwargs)
+    clean_ax(kwargs["ax"], color, lw)
+    return res
+
+  return _wrapped
+
+
+@optional_clean_ax
+def show_img(img, mask=None, ax=None, norm=False):
+  if norm:
+    vmin, vmax = np.min(img), np.max(img)
+    img = (img - vmin) / (vmax - vmin)
+  if mask is not None:
+    img = img * mask + np.ones_like(img) * (1.0 - mask)
+
+  return ax.imshow(img.clip(0.0, 1.0), interpolation="nearest")
+
+
+@optional_clean_ax
+def show_mask(m, ax):
+  color_conv = get_mask_plot_colors(m.shape[0])
+  color_mask = np.dot(np.transpose(m, [1, 2, 0]), color_conv)
+  return ax.imshow(color_mask.clip(0.0, 1.0), interpolation="nearest")
+
+
+@optional_clean_ax
+def show_mat(m, ax, vmin=None, vmax=None, cmap="viridis"):
+  return ax.matshow(
+      m[Ellipsis, 0], cmap=cmap, vmin=vmin, vmax=vmax, interpolation="nearest")
+
+
+@optional_clean_ax
+def show_coords(m, ax):
+  vmin, vmax = np.min(m), np.max(m)
+  m = (m - vmin) / (vmax - vmin)
+  color_conv = get_mask_plot_colors(m.shape[-1])
+  color_mask = np.dot(m, color_conv)
+  return ax.imshow(color_mask, interpolation="nearest")
+
+
+def example_plot(rinfo,
+                 b=0,
+                 t=-1,
+                 mask_components=False,
+                 size=2,
+                 column_titles=True):
+  image = rinfo["data"]["image"][b, 0]
+  recons = rinfo["outputs"]["recons"][b, t, 0]
+  pred_mask = rinfo["outputs"]["pred_mask"][b, t]
+  components = rinfo["outputs"]["components"][b, t]
+
+  K, H, W, C = components.shape
+  colors = get_mask_plot_colors(K)
+
+  nrows = 1
+  ncols = 3 + K
+  fig, axes = plt.subplots(ncols=ncols, figsize=(ncols * size, nrows * size))
+
+  show_img(image, ax=axes[0], color="#000000")
+  show_img(recons, ax=axes[1], color="#000000")
+  show_mask(pred_mask[Ellipsis, 0], ax=axes[2], color="#000000")
+  for k in range(K):
+    mask = pred_mask[k] if mask_components else None
+    show_img(components[k], ax=axes[k + 3], color=colors[k], mask=mask)
+
+  if column_titles:
+    labels = ["Image", "Recons.", "Mask"
+             ] + ["Component {}".format(k + 1) for k in range(K)]
+    for ax, title in zip(axes, labels):
+      ax.set_title(title)
+  plt.subplots_adjust(hspace=0.03, wspace=0.035)
+  return fig
+
+
+def iterations_plot(rinfo, b=0, mask_components=False, size=2):
+  image = rinfo["data"]["image"][b]
+  true_mask = rinfo["data"]["true_mask"][b]
+  recons = rinfo["outputs"]["recons"][b]
+  pred_mask = rinfo["outputs"]["pred_mask"][b]
+  pred_mask_logits = rinfo["outputs"]["pred_mask_logits"][b]
+  components = rinfo["outputs"]["components"][b]
+
+  T, K, H, W, C = components.shape
+  colors = get_mask_plot_colors(K)
+  nrows = T + 1
+  ncols = 2 + K
+  fig, axes = plt.subplots(
+      nrows=nrows, ncols=ncols, figsize=(ncols * size, nrows * size))
+  for t in range(T):
+    show_img(recons[t, 0], ax=axes[t, 0])
+    show_mask(pred_mask[t, Ellipsis, 0], ax=axes[t, 1])
+    axes[t, 0].set_ylabel("iter {}".format(t))
+    for k in range(K):
+      mask = pred_mask[t, k] if mask_components else None
+      show_img(components[t, k], ax=axes[t, k + 2], color=colors[k], mask=mask)
+
+  axes[0, 0].set_title("Reconstruction")
+  axes[0, 1].set_title("Mask")
+  show_img(image[0], ax=axes[T, 0])
+  show_mask(true_mask[0, Ellipsis, 0], ax=axes[T, 1])
+  vmin = np.min(pred_mask_logits[T - 1])
+  vmax = np.max(pred_mask_logits[T - 1])
+
+  for k in range(K):
+    axes[0, k + 2].set_title("Component {}".format(k + 1))  # , color=colors[k])
+    show_mat(
+        pred_mask_logits[T - 1, k], ax=axes[T, k + 2], vmin=vmin, vmax=vmax)
+    axes[T, k + 2].set_xlabel(
+        "Mask Logits for\nComponent {}".format(k + 1))  # , color=colors[k])
+  axes[T, 0].set_xlabel("Input Image")
+  axes[T, 1].set_xlabel("Ground Truth Mask")
+
+  plt.subplots_adjust(wspace=0.05, hspace=0.05)
+  return fig
+
+
+def inputs_plot(rinfo, b=0, t=0, size=2):
+  B, T, K, H, W, C = rinfo["outputs"]["components"].shape
+  colors = get_mask_plot_colors(K)
+  inputs = rinfo["inputs"]["spatial"]
+  rows = [
+      ("image", show_img, False),
+      ("components", show_img, False),
+      ("dcomponents", functools.partial(show_img, norm=True), False),
+      ("mask", show_mat, True),
+      ("pred_mask", show_mat, True),
+      ("dmask", functools.partial(show_mat, cmap="coolwarm"), True),
+      ("posterior", show_mat, True),
+      ("log_prob", show_mat, True),
+      ("counterfactual", show_mat, True),
+      ("coordinates", show_coords, False),
+  ]
+  rows = [(n, f, mcb) for n, f, mcb in rows if n in inputs]
+  nrows = len(rows)
+  ncols = K + 1
+
+  fig, axes = plt.subplots(
+      nrows=nrows,
+      ncols=ncols,
+      figsize=(ncols * size - size * 0.9, nrows * size),
+      gridspec_kw={"width_ratios": [1] * K + [0.1]},
+  )
+  for r, (name, plot_fn, make_cbar) in enumerate(rows):
+    axes[r, 0].set_ylabel(name)
+    if make_cbar:
+      vmin = np.min(inputs[name][b, t])
+      vmax = np.max(inputs[name][b, t])
+      if np.abs(vmin - vmax) < 1e-6:
+        vmin -= 0.1
+        vmax += 0.1
+      plot_fn = functools.partial(plot_fn, vmin=vmin, vmax=vmax)
+      # print("range of {:<16}: [{:0.2f}, {:0.2f}]".format(name, vmin, vmax))
+    for k in range(K):
+      if inputs[name].shape[2] == 1:
+        m = inputs[name][b, t, 0]
+        color = (0.0, 0.0, 0.0)
+      else:
+        m = inputs[name][b, t, k]
+        color = colors[k]
+      mappable = plot_fn(m, ax=axes[r, k], color=color)
+    if make_cbar:
+      fig.colorbar(mappable, cax=axes[r, K])
+    else:
+      axes[r, K].set_visible(False)
+  for k in range(K):
+    axes[0, k].set_title("Component {}".format(k + 1))  # , color=colors[k])
+
+  plt.subplots_adjust(hspace=0.05, wspace=0.05)
+  return fig
@@ -0,0 +1,163 @@
+# Lint as: python3
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Iterative refinement modules."""
+# pylint: disable=g-doc-bad-indent, unused-variable
+from iodine.modules import utils
+import shapeguard
+import sonnet as snt
+import tensorflow.compat.v1 as tf
+
+
+class RefinementCore(snt.RNNCore):
+  """Recurrent Refinement Module.
+
+    Refinement modules take as inputs:
+      * previous state (which could be an arbitrary nested structure)
+      * current inputs which include
+        * image-space inputs like pixel-based errors, or mask-posteriors
+        * latent-space inputs like the previous z_dist, or dz
+
+    They use these inputs to produce:
+      * output (usually a new z_dist)
+      * new_state
+    """
+
+  def __init__(self,
+               encoder_net,
+               recurrent_net,
+               refinement_head,
+               name="refinement"):
+    super().__init__(name=name)
+    self._encoder_net = encoder_net
+    self._recurrent_net = recurrent_net
+    self._refinement_head = refinement_head
+    self._sg = shapeguard.ShapeGuard()
+
+  def initial_state(self, batch_size, **unused_kwargs):
+    return self._recurrent_net.initial_state(batch_size)
+
+  def _build(self, inputs, prev_state):
+    sg = self._sg
+    assert "spatial" in inputs, inputs.keys()
+    assert "flat" in inputs, inputs.keys()
+    assert "zp" in inputs["flat"], inputs["flat"].keys()
+    zp = sg.guard(inputs["flat"]["zp"], "B, K, Zp")
+
+    x = sg.guard(self.prepare_spatial_inputs(inputs["spatial"]), "B*K, H, W, C")
+    h1 = sg.guard(self._encoder_net(x).params, "B*K, H1")
+    h2 = sg.guard(self.prepare_flat_inputs(h1, inputs["flat"]), "B*K, H2")
+    h2_unflattened = sg.reshape(h2, "B, K, H2")
+    h3, next_state = self._recurrent_net(h2_unflattened, prev_state)
+    sg.guard(h3, "B, K, H3")
+    outputs = sg.guard(self._refinement_head(zp, h3), "B, K, Y")
+
+    del self._sg.B
+    return outputs, next_state
+
+  def prepare_spatial_inputs(self, inputs):
+    values = []
+    for name, val in sorted(inputs.items(), key=lambda it: it[0]):
+      if val.shape.as_list()[1] == 1:
+        self._sg.guard(val, "B, 1, H, W, _C")
+        val = tf.tile(val, self._sg["1, K, 1, 1, 1"])
+      else:
+        self._sg.guard(val, "B, K, H, W, _C")
+      values.append(val)
+    concat_inputs = self._sg.guard(tf.concat(values, axis=-1), "B, K, H, W, C")
+    return self._sg.reshape(concat_inputs, "B*K, H, W, C")
+
+  def prepare_flat_inputs(self, hidden, inputs):
+    values = [self._sg.guard(hidden, "B*K, H1")]
+
+    for name, val in sorted(inputs.items(), key=lambda it: it[0]):
+      self._sg.guard(val, "B, K, _")
+      val_flat = tf.reshape(val, self._sg["B*K"] + [-1])
+      values.append(val_flat)
+    return tf.concat(values, axis=-1)
+
+
+class ResHead(snt.AbstractModule):
+  """Updates Zp using a residual mechanism."""
+
+  def __init__(self, name="residual_head"):
+    super().__init__(name=name)
+
+  def _build(self, zp_old, inputs):
+    sg = shapeguard.ShapeGuard()
+    sg.guard(zp_old, "B, K, Zp")
+    sg.guard(inputs, "B, K, H")
+    update = snt.Linear(sg.Zp)
+
+    flat_zp = sg.reshape(zp_old, "B*K, Zp")
+    flat_inputs = sg.reshape(inputs, "B*K, H")
+
+    zp = flat_zp + update(flat_inputs)
+
+    return sg.reshape(zp, "B, K, Zp")
+
+
+class PredictorCorrectorHead(snt.AbstractModule):
+  """This refinement head is used for sequential data.
+
+    At every step it computes a prediction from the λ of the previous timestep
+    and an update from the refinement network of the current timestep.
+
+    The next step λ' is computed as a gated combination of both:
+    λ' = g * λ_corr + (1-g) * λ_pred
+
+    """
+
+  def __init__(
+      self,
+      hidden_sizes=(64,),
+      pred_gate_bias=0.0,
+      corrector_gate_bias=0.0,
+      activation=tf.nn.elu,
+      name="predcorr_head",
+  ):
+    super().__init__(name=name)
+    self._hidden_sizes = hidden_sizes
+    self._activation = utils.get_act_func(activation)
+    self._pred_gate_bias = pred_gate_bias
+    self._corrector_gate_bias = corrector_gate_bias
+
+  def _build(self, zp_old, inputs):
+    sg = shapeguard.ShapeGuard()
+    sg.guard(zp_old, "B, K, Zp")
+    sg.guard(inputs, "B, K, H")
+    update = snt.Linear(sg.Zp)
+    update_gate = snt.Linear(sg.Zp)
+    predict = snt.nets.MLP(
+        output_sizes=list(self._hidden_sizes) + [sg.Zp * 2],
+        activation=self._activation,
+    )
+
+    flat_zp = sg.reshape(zp_old, "B*K, Zp")
+    flat_inputs = sg.reshape(inputs, "B*K, H")
+
+    g = tf.nn.sigmoid(update_gate(flat_inputs) + self._corrector_gate_bias)
+    u = update(flat_inputs)
+
+    # a slightly more efficient way of computing the gated update
+    # (1-g) * flat_zp + g * u
+    zp_corrected = flat_zp + g * (u - flat_zp)
+
+    predicted = predict(flat_zp)
+    pred_up = predicted[:, :sg.Zp]
+    pred_gate = tf.nn.sigmoid(predicted[:, sg.Zp:] + self._pred_gate_bias)
+
+    zp = zp_corrected + pred_gate * (pred_up - zp_corrected)
+
+    return sg.reshape(zp, "B, K, Zp")
@@ -0,0 +1,9 @@
+tensorflow-gpu==1.14.0
+tensorflow-probability==0.7.0
+dm-sonnet==1.35
+sacred>=0.7,<0.8
+shapeguard
+seaborn
+pymongo
+jupyterlab
+git+git://github.com/deepmind/multi_object_datasets.git
@@ -0,0 +1,36 @@
+#!/bin/sh
+# Copyright 2019 Deepmind Technologies Limited.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+set -e
+
+echo "downloading checkpoints from GCP"
+iodine/download_checkpoints.sh
+
+python3 -m venv iodine_venv
+source iodine_venv/bin/activate
+pip3 install --upgrade setuptools wheel
+pip3 install -r iodine/requirements.txt
+
+# Get some fake data and put it where the real multi_objects_dataset files live.
+mkdir -p iodine/multi_object_datasets
+cp iodine/test_data/tetrominoes_mini.tfrecords iodine/multi_object_datasets/tetrominoes_train.tfrecords
+
+# Run training with a cut down size.
+python3 -m iodine.main \
+  -f with tetrominoes \
+  data.shuffle_buffer=2 \
+  data.batch_size=2 \
+  n_z=4 \
+  num_components=3 \
+  stop_after_steps=11