Initial release of A Realistic Simulation Framework for Learning with Label Noise

PiperOrigin-RevId: 384875129
2026-05-31 21:15:21 +08:00 · 2021-07-15 09:28:02 +01:00
parent 8fffed3922
commit ea6c6c5782
4 changed files with 594 additions and 0 deletions
@@ -0,0 +1,133 @@
+<img src="paradigm.png" width="50%">
+
+# A Realistic Simulation Framework for Learning with Label Noise
+
+We propose a simulation framework for generating realistic instance-dependent
+noisy labels via a pseudo-labeling paradigm. We show that this framework
+generates synthetic noisy labels that exhibit important characteristics of the
+label noise in practical settings. Equipped with controllable label noise, we
+study the negative impact of noisy labels across a few realistic settings to
+understand when label noise is more problematic. Additionally, with the
+availability of annotator information from our simulation framework, we propose
+a new technique, Label Quality Model (LQM), that leverages annotator features to
+predict and correct against noisy labels. We show that by adding LQM as a label
+correction step before applying existing noisy label techniques, we can further
+improve the models' performance.
+
+[A Realistic Simulation Framework for Learning with Label Noise](https://openreview.net/pdf?id=e9P6bypUFd).
+
+In this repository, we provide the link to the datasets that we used in Sections
+4 and 5 of the above paper, along with a colab that demonstrates how to load the
+data and rater features.
+We consider 4 tasks:
+[CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html),
+[CIFAR100](https://www.cs.toronto.edu/~kriz/cifar.html),
+[Patch Camelyon](https://patchcamelyon.grand-challenge.org/),
+and
+[Cats vs Dogs](https://www.microsoft.com/en-us/download/details.aspx?id=54765).
+For each task, we generate three synthetic noisy label
+datasets, named "low", "medium", and "high" according to the amount of label
+noise. The data are stored as TFRecords and the rater features are stored as
+JSON files.
+
+The data are available in the
+[noisy label synthetic dataset GCP bucket](https://console.cloud.google.com/storage/browser/noisy_label_synthetic_datasets).
+
+Details of the datasets and examples of data loading are in
+the accompanying colab,
+[noisy_label_datasets_and_rater_features.ipynb](https://github.com/deepmind/deepmind-research/blob/master/noisy_label/noisy_label_datasets_and_rater_features.ipynb).
+
+## License
+The noisy labels and rater features in our datasets are under the
+[CC0 License](https://choosealicense.com/licenses/cc0-1.0/).
+Other parts of the datasets are under the original license of the datasets.
+
+When using the datasets based on CIFAR10/CIFAR100, users are required to
+attribute the following paper:
+
+Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009
+
+When using the datasets based on Patch Camelyon, users are required to
+attribute the following paper:
+
+Rotation Equivariant CNNs for Digital Pathology, Bastiaan S. Veeling,
+Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling, arXiv:1806.03962.
+
+When using the datasets based on Cats vs Dogs, users are required to
+attribute the following paper:
+
+Asirra: a CAPTCHA that exploits interest-aligned manual image categorization,
+Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul, ACM Conference on
+Computer and Communications Security, 2007.
+
+The colab example is provided under the Apache License, Version 2.0.
+
+
+## Citation
+
+Please use the following bibtex for citations to our paper:
+
+```
+@article{gu2021realistic,
+  title={A Realistic Simulation Framework for Learning with Label Noise},
+  author={Gu, Keren and Masotto, Xander and Bachani, Vandana and Lakshminarayanan, Balaji and Nikodem, Jack and Yin, Dong},
+  year={2021}
+}
+```
+
+# Dataset Metadata
+
+The following table is necessary for this dataset to be indexed by search
+engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
+<div itemscope itemtype="http://schema.org/Dataset">
+<table>
+  <tr>
+    <th>property</th>
+    <th>value</th>
+  </tr>
+  <tr>
+    <td>name</td>
+    <td><code itemprop="name">Noisy Label Synthetic Datasets</code></td>
+  </tr>
+  <tr>
+    <td>url</td>
+    <td><code itemprop="url">https://github.com/deepmind/deepmind-research/tree/master/noisy_label</code></td>
+  </tr>
+  <tr>
+    <td>sameAs</td>
+    <td><code itemprop="sameAs">https://github.com/deepmind/deepmind-research/tree/master/noisy_label</code></td>
+  </tr>
+  <tr>
+    <td>description</td>
+    <td><code itemprop="description">
+      Data accompanying the paper "A Realistic Simulation Framework for
+      Learning with Label Noise" (https://openreview.net/pdf?id=e9P6bypUFd).
+      </code></td>
+  </tr>
+  <tr>
+    <td>provider</td>
+    <td>
+      <div itemscope itemtype="http://schema.org/Organization" itemprop="provider">
+        <table>
+          <tr>
+            <th>property</th>
+            <th>value</th>
+          </tr>
+          <tr>
+            <td>name</td>
+            <td><code itemprop="name">DeepMind</code></td>
+          </tr>
+          <tr>
+            <td>sameAs</td>
+            <td><code itemprop="sameAs">https://en.wikipedia.org/wiki/DeepMind</code></td>
+          </tr>
+        </table>
+      </div>
+    </td>
+  </tr>
+  <tr>
+    <td>citation</td>
+    <td><code itemprop="citation">https://openreview.net/pdf?id=e9P6bypUFd</code></td>
+  </tr>
+</table>
+</div>
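
The README names the three datasets per task by their amount of label noise. One way to inspect that amount, once records are parsed, is to compare each rater's labels with the clean label. The following is a minimal sketch under stated assumptions, not the paper's definition of noise level: the function `per_rater_noise_rate` and the toy data are hypothetical, and each example is assumed to be already converted to a plain `(clean_label, noisy_labels)` pair of Python integers.

```python
from typing import List, Sequence, Tuple


def per_rater_noise_rate(
    examples: Sequence[Tuple[int, Sequence[int]]],
) -> List[float]:
  """Returns, per rater position, the fraction of labels that differ from
  the clean label. Hypothetical helper, not part of the released code."""
  if not examples:
    raise ValueError('examples must be non-empty')
  num_raters = len(examples[0][1])
  mismatches = [0] * num_raters
  for clean_label, noisy_labels in examples:
    if len(noisy_labels) != num_raters:
      raise ValueError('every example must have the same number of raters')
    for i, noisy in enumerate(noisy_labels):
      if noisy != clean_label:
        mismatches[i] += 1
  return [m / len(examples) for m in mismatches]


# Toy data (made up for illustration): three examples, three raters each.
toy_examples = [
    (3, [3, 3, 5]),
    (1, [1, 2, 1]),
    (7, [7, 7, 7]),
]
rates = per_rater_noise_rate(toy_examples)
print(rates)
```

According to the feature lists in the notebook, the Cats vs Dogs records carry no clean label field, so this comparison applies directly only to the other three tasks.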
@@ -0,0 +1,460 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "cv38ildJKsei"
+      },
+      "source": [
+        "Copyright 2021 DeepMind Technologies Limited.\n",
+        "\n",
+        "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use\n",
+        "this file except in compliance with the License. You may obtain a copy of the\n",
+        "License at\n",
+        "\n",
+        "[https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)\n",
+        "\n",
+        "Unless required by applicable law or agreed to in writing, software distributed\n",
+        "under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR\n",
+        "CONDITIONS OF ANY KIND, either express or implied. See the License for the\n",
+        "specific language governing permissions and limitations under the License."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "tAJQfAHhAxz9"
+      },
+      "source": [
+        "# A Realistic Simulation Framework for Learning with Label Noise\n",
+        "\n",
+        "In this colab, we provide metadata and data-loading examples for the noisy label datasets generated using the pseudo-labeling paradigm proposed in the paper *A Realistic Simulation Framework for Learning with Label Noise*.\n",
+        "We also provide the associated rater features. We consider 4 tasks: CIFAR10 [1], CIFAR100 [1], Patch Camelyon [2,3], and Cats vs Dogs [4]. For each task, we generate three synthetic noisy label datasets, named as \"low\", \"medium\", and \"high\" according to the amount of label noise.\n",
+        "\n",
+        "[1] Krizhevsky, Alex, and Geoffrey Hinton. \"Learning multiple layers of features from tiny images.\", 2009. \\\\\n",
+        "[2] Veeling, Bastiaan S., Jasper Linmans, Jim Winkens, Taco Cohen, and Max Welling. \"Rotation equivariant CNNs for digital pathology.\" In International Conference on Medical image computing and computer-assisted intervention, pp. 210-218. Springer, Cham, 2018. \\\\\n",
+        "[3] Bejnordi, Babak Ehteshami, Mitko Veta, Paul Johannes Van Diest, Bram Van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM Van Der Laak et al. \"Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer.\" Jama 318, no. 22 (2017): 2199-2210. \\\\\n",
+        "[4] Elson, Jeremy, John R. Douceur, Jon Howell, and Jared Saul. \"Asirra: a CAPTCHA that exploits interest-aligned manual image categorization.\" In ACM Conference on Computer and Communications Security, vol. 7, pp. 366-374. 2007."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "_9SigscwVH_s"
+      },
+      "outputs": [],
+      "source": [
+        "# @title Imports and global variable\n",
+        "import os\n",
+        "import matplotlib.pyplot as plt\n",
+        "import json\n",
+        "import tensorflow as tf\n",
+        "\n",
+        "root_dir = '/root/directory/to/the/dataset/'"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "7Xv86m2rNOFH"
+      },
+      "source": [
+        "**CIFAR10 noisy label datasets**\n",
+        "\n",
+        "**Download size**\n",
+        "*   79MB for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "**Number of examples**\n",
+        "*   train: 19987, valid: 5021, for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "Both the train and valid splits are subsampled from the train split of the original CIFAR10 dataset.\n",
+        "\n",
+        "**Data features**\n",
+        "*   \"image/raw\": images in bytes, shape = (32, 32, 3).\n",
+        "*   \"image/class/label\": clean label, tf.int64.\n",
+        "*   \"noisy_labels\": the noisy label given by rater models, a list of 10 tf.int64 integers.\n",
+        "*   \"rater_ids\": the ID of the rater models, a list of 10 tf.string.\n",
+        "\n",
+        "**Rater features**\n",
+        "*   model_name: name of the model\n",
+        "*   accuracy: accuracy of the rater model on the rater validation set\n",
+        "*   loss: loss of the rater model on the rater validation set\n",
+        "*   experience: the total number of examples that the rater model has seen during training"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "aPDR8avkiFWA"
+      },
+      "outputs": [],
+      "source": [
+        "# @title An example for loading CIFAR10 noisy label datasets\n",
+        "task_name = 'cifar10'\n",
+        "\n",
+        "# One of ['low', 'medium', 'high']\n",
+        "noise_level = 'low'\n",
+        "\n",
+        "# One of ['train', 'valid']. The `valid` split should be used for\n",
+        "# hyperparameter tuning. The model should be tested on the original test\n",
+        "# split for these tasks.\n",
+        "split = 'train'\n",
+        "\n",
+        "# We have 10 rater models for CIFAR10.\n",
+        "num_raters = 10\n",
+        "\n",
+        "directory = os.path.join(root_dir, task_name, noise_level, split) + '*'\n",
+        "raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))\n",
+        "\n",
+        "# Create a dictionary describing the features.\n",
+        "image_feature_description = {\n",
+        "    # the raw image\n",
+        "    'image/raw': tf.io.FixedLenFeature([], tf.string),\n",
+        "    # the clean label\n",
+        "    'image/class/label': tf.io.FixedLenFeature([1], tf.int64),\n",
+        "    # noisy labels from all the raters\n",
+        "    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),\n",
+        "    # the IDs of rater models\n",
+        "    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),\n",
+        "}\n",
+        "\n",
+        "def _parse_image_function(example_proto):\n",
+        "  # Parse the input tf.train.Example proto using the dictionary above.\n",
+        "  return tf.io.parse_single_example(example_proto, image_feature_description)\n",
+        "\n",
+        "parsed_image_dataset = raw_image_dataset.map(_parse_image_function)\n",
+        "\n",
+        "for features in parsed_image_dataset.take(1):\n",
+        "  # Check the IDs of the rater models. The rater IDs are the same for all the\n",
+        "  # examples in the dataset.\n",
+        "  rater_ids = features['rater_ids']\n",
+        "  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]\n",
+        "  print('The IDs of the rater models for this dataset are:')\n",
+        "  print(rater_id_string)\n",
+        "  clean_label = features['image/class/label'].numpy()[0]\n",
+        "  print('The clean label for the following example is %d' % clean_label)\n",
+        "  noisy_labels = features['noisy_labels'].numpy()\n",
+        "  print('The noisy labels from the rater models are:')\n",
+        "  print(noisy_labels)\n",
+        "  image = tf.reshape(tf.io.decode_raw(features['image/raw'], tf.uint8),\n",
+        "                     (32, 32, 3))\n",
+        "  plt.imshow(image)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YahpOXMmmhFO"
+      },
+      "source": [
+        "**CIFAR100 noisy label datasets**\n",
+        "\n",
+        "**Download size**\n",
+        "*   82MB for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "**Number of examples**\n",
+        "*   train: 20114, valid: 4978, for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "Both the train and valid splits are subsampled from the train split of the original CIFAR100 dataset.\n",
+        "\n",
+        "**Data features**\n",
+        "*   \"image/encoded\": images in bytes, shape=(32, 32, 3).\n",
+        "*   \"image/class/fine_label\": clean fine-grained label, tf.int64.\n",
+        "*   \"image/class/coarse_label\": clean coarse label, tf.int64\n",
+        "*   \"noisy_labels\": the noisy label given by rater models, a list of 11 tf.int64 integers.\n",
+        "*   \"rater_ids\": the ID of the rater models, a list of 11 tf.string.\n",
+        "\n",
+        "**Rater features**\n",
+        "*   model_name: name of the model\n",
+        "*   accuracy: accuracy of the rater model on the rater validation set\n",
+        "*   loss: loss of the rater model on the rater validation set\n",
+        "*   mAP: the mean average precision of the rater model on the rater validation set\n",
+        "*   experience: the total number of examples that the rater model has seen during training\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "AccCP_BZnz3S"
+      },
+      "outputs": [],
+      "source": [
+        "# @title An example for loading CIFAR100 noisy label datasets\n",
+        "task_name = 'cifar100'\n",
+        "\n",
+        "# One of ['low', 'medium', 'high']\n",
+        "noise_level = 'high'\n",
+        "\n",
+        "# One of ['train', 'valid']. The `valid` split should be used for\n",
+        "# hyperparameter tuning. The model should be tested on the original test\n",
+        "# split for these tasks.\n",
+        "split = 'train'\n",
+        "\n",
+        "# We have 11 rater models for CIFAR100.\n",
+        "num_raters = 11\n",
+        "\n",
+        "directory = os.path.join(root_dir, task_name, noise_level, split) + '*'\n",
+        "raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))\n",
+        "\n",
+        "# Create a dictionary describing the features.\n",
+        "image_feature_description = {\n",
+        "    # the raw image\n",
+        "    'image/encoded': tf.io.FixedLenFeature([], tf.string),\n",
+        "    # the fine-grained clean label, value in [0, 99]\n",
+        "    'image/class/fine_label': tf.io.FixedLenFeature([1], tf.int64),\n",
+        "    # the coarse clean label, value in [0, 19]\n",
+        "    'image/class/coarse_label': tf.io.FixedLenFeature([1], tf.int64),\n",
+        "    # noisy labels from all the raters\n",
+        "    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),\n",
+        "    # the IDs of rater models\n",
+        "    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),\n",
+        "}\n",
+        "\n",
+        "def _parse_image_function(example_proto):\n",
+        "  # Parse the input tf.train.Example proto using the dictionary above.\n",
+        "  return tf.io.parse_single_example(example_proto, image_feature_description)\n",
+        "\n",
+        "parsed_image_dataset = raw_image_dataset.map(_parse_image_function)\n",
+        "\n",
+        "for features in parsed_image_dataset.take(1):\n",
+        "  # Check the IDs of the rater models. The rater IDs are the same for all the\n",
+        "  # examples in the dataset.\n",
+        "  rater_ids = features['rater_ids']\n",
+        "  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]\n",
+        "  print('The IDs of the rater models for this dataset are:')\n",
+        "  print(rater_id_string)\n",
+        "  clean_label = features['image/class/fine_label'].numpy()[0]\n",
+        "  print('The clean label for the following example is %d' % clean_label)\n",
+        "  noisy_labels = features['noisy_labels'].numpy()\n",
+        "  print('The noisy labels from the rater models are:')\n",
+        "  print(noisy_labels)\n",
+        "  image = tf.reshape(tf.io.decode_raw(features['image/encoded'], tf.uint8),\n",
+        "                     (32, 32, 3))\n",
+        "  plt.imshow(image)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "ASlnEQ0DpPY2"
+      },
+      "source": [
+        "**Patch Camelyon noisy label datasets**\n",
+        "\n",
+        "**Download size**\n",
+        "*   3.27GB for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "**Number of examples**\n",
+        "*   train: 130982, valid: 16394, for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "The train and valid splits are subsampled from the train and valid splits of the original Patch Camelyon dataset, respectively.\n",
+        "\n",
+        "**Data features**\n",
+        "*   \"image\": images in png format, shape=(96, 96, 3).\n",
+        "*   \"label\": clean label, tf.int64.\n",
+        "*   \"id\": the ID of this image in the original Patch Camelyon dataset, a tf.string that begins with \"train_\" or \"valid_\".\n",
+        "*   \"noisy_labels\": the noisy label given by rater models, a list of 20 (low and high noise) or 19 (medium noise) tf.int64 integers.\n",
+        "*   \"rater_ids\": the ID of the rater models, a list of 20 (low and high noise) or 19 (medium noise) tf.string.\n",
+        "\n",
+        "**Rater features**\n",
+        "*   model_name: name of the model\n",
+        "*   accuracy: accuracy of the rater model on the rater validation set\n",
+        "*   loss: loss of the rater model on the rater validation set\n",
+        "*   experience: the total number of examples that the rater model has seen during training"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "Usw0YmwupYgP"
+      },
+      "outputs": [],
+      "source": [
+        "# @title An example for loading Patch Camelyon noisy label datasets\n",
+        "task_name = 'patch_camelyon'\n",
+        "\n",
+        "# One of ['low', 'medium', 'high']\n",
+        "noise_level = 'medium'\n",
+        "\n",
+        "# One of ['train', 'valid']. The `valid` split should be used for\n",
+        "# hyperparameter tuning. The model should be tested on the original test\n",
+        "# split for these tasks.\n",
+        "split = 'train'\n",
+        "\n",
+        "# We have 20 rater models for low and high noise for Patch Camelyon.\n",
+        "# For medium noise, we have 19 rater models.\n",
+        "num_raters = 19 if noise_level == 'medium' else 20\n",
+        "\n",
+        "directory = os.path.join(root_dir, task_name, noise_level, split) + '*'\n",
+        "raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))\n",
+        "\n",
+        "# Create a dictionary describing the features.\n",
+        "image_feature_description = {\n",
+        "    # the raw image\n",
+        "    'image': tf.io.FixedLenFeature([], tf.string),\n",
+        "    # the clean label, value in {0, 1}\n",
+        "    'label': tf.io.FixedLenFeature([1], tf.int64),\n",
+        "    # noisy labels from all the raters\n",
+        "    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),\n",
+        "    # the IDs of rater models\n",
+        "    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),\n",
+        "}\n",
+        "\n",
+        "def _parse_image_function(example_proto):\n",
+        "  # Parse the input tf.train.Example proto using the dictionary above.\n",
+        "  return tf.io.parse_single_example(example_proto, image_feature_description)\n",
+        "\n",
+        "parsed_image_dataset = raw_image_dataset.map(_parse_image_function)\n",
+        "\n",
+        "for features in parsed_image_dataset.take(1):\n",
+        "  # Check the IDs of the rater models. The rater IDs are the same for all the\n",
+        "  # examples in the dataset.\n",
+        "  rater_ids = features['rater_ids']\n",
+        "  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]\n",
+        "  print('The IDs of the rater models for this dataset are:')\n",
+        "  print(rater_id_string)\n",
+        "  clean_label = features['label'].numpy()[0]\n",
+        "  print('The clean label for the following example is %d' % clean_label)\n",
+        "  noisy_labels = features['noisy_labels'].numpy()\n",
+        "  print('The noisy labels from the rater models are:')\n",
+        "  print(noisy_labels)\n",
+        "  image = tf.io.decode_png(features['image'])\n",
+        "  plt.imshow(image)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "CLYyqNIOsV4N"
+      },
+      "source": [
+        "**Cats vs Dogs noisy label datasets**\n",
+        "\n",
+        "**Download size**\n",
+        "*   2.4MB for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "**Number of examples**\n",
+        "*   train: 9302, valid: 1184, for each of the low, medium, and high noise datasets.\n",
+        "\n",
+        "Both the train and valid splits are subsampled from the original Cats vs Dogs dataset.\n",
+        "\n",
+        "**Data features**\n",
+        "*   \"noisy_labels\": the noisy label given by rater models, a list of 10 tf.int64 integers. Label 0 for cats, 1 for dogs.\n",
+        "*   \"rater_ids\": the ID of the rater models, a list of 10 tf.string.\n",
+        "*   \"image/filename\": the filename of the image, corresponding to the filename in the original Cats vs Dogs dataset, tf.string.\n",
+        "\n",
+        "\n",
+        "**Rater features**\n",
+        "*   model_name: name of the model\n",
+        "*   accuracy: accuracy of the rater model on the rater validation set\n",
+        "*   loss: loss of the rater model on the rater validation set\n",
+        "*   mAP: the mean average precision of the rater model on the rater validation set\n",
+        "*   auc_PR: the area under the precision-recall curve of the rater model on the rater validation set\n",
+        "*   auc_ROC: the area under the ROC curve of the rater model on the rater validation set\n",
+        "*   experience: the total number of examples that the rater model has seen during training"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "GmZ44wRJsnhR"
+      },
+      "outputs": [],
+      "source": [
+        "# @title An example for loading Cats vs Dogs noisy label datasets\n",
+        "task_name = 'cats_vs_dogs'\n",
+        "\n",
+        "# One of ['low', 'medium', 'high']\n",
+        "noise_level = 'medium'\n",
+        "\n",
+        "# One of ['train', 'valid']. The `valid` split should be used for\n",
+        "# hyperparameter tuning. The model should be tested on the original test\n",
+        "# split for these tasks.\n",
+        "split = 'train'\n",
+        "\n",
+        "# We have 10 rater models for Cats vs Dogs.\n",
+        "num_raters = 10\n",
+        "\n",
+        "directory = os.path.join(root_dir, task_name, noise_level, split) + '*'\n",
+        "raw_image_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob(directory))\n",
+        "\n",
+        "# Create a dictionary describing the features.\n",
+        "image_feature_description = {\n",
+        "    # noisy labels from all the raters\n",
+        "    'noisy_labels': tf.io.FixedLenFeature([num_raters], tf.int64),\n",
+        "    # the IDs of rater models\n",
+        "    'rater_ids': tf.io.FixedLenFeature([num_raters], tf.string),\n",
+        "    # filename of the image\n",
+        "    'image/filename': tf.io.FixedLenFeature([1], tf.string),\n",
+        "}\n",
+        "\n",
+        "def _parse_image_function(example_proto):\n",
+        "  # Parse the input tf.train.Example proto using the dictionary above.\n",
+        "  return tf.io.parse_single_example(example_proto, image_feature_description)\n",
+        "\n",
+        "parsed_image_dataset = raw_image_dataset.map(_parse_image_function)\n",
+        "\n",
+        "for features in parsed_image_dataset.take(1):\n",
+        "  # Check the IDs of the rater models. The rater IDs are the same for all the\n",
+        "  # examples in the dataset.\n",
+        "  rater_ids = features['rater_ids']\n",
+        "  rater_id_string = [r.numpy().decode('utf-8') for r in rater_ids]\n",
+        "  print('The IDs of the rater models for this dataset are:')\n",
+        "  print(rater_id_string)\n",
+        "  print('Image filename:')\n",
+        "  print(features['image/filename'][0].numpy().decode('utf-8'))\n",
+        "  noisy_labels = features['noisy_labels'].numpy()\n",
+        "  print('The noisy labels from the rater models are:')\n",
+        "  print(noisy_labels)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "6AO3XJnrQPeY"
+      },
+      "outputs": [],
+      "source": [
+        "# @title An example for loading rater features\n",
+        "for task_name in ['cifar10', 'cifar100', 'patch_camelyon', 'cats_vs_dogs']:\n",
+        "  for noise_level in ['low', 'medium', 'high']:\n",
+        "    path = os.path.join(root_dir, task_name, noise_level, 'rater_features.json')\n",
+        "    with tf.io.gfile.GFile(path, 'rb') as fj:\n",
+        "      rater_features_dict = json.load(fj)\n",
+        "    print(rater_features_dict)"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "collapsed_sections": [],
+      "last_runtime": {
+        "build_target": "//learning/deepmind/dm_python:dm_notebook3",
+        "kind": "private"
+      },
+      "name": "noisy_label_datasets_and_rater_features.ipynb",
+      "private_outputs": true,
+      "provenance": [
+        {
+          "file_id": "1zhPgvKIniqkpiY2SEnAdmiatJ__WyBz2",
+          "timestamp": 1621924313986
+        }
+      ]
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
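
The four loading cells in the notebook repeat the same path construction and differ mainly in the number of raters. A small helper could centralize both. This is a sketch, not part of the released colab: `num_raters` and `tfrecord_pattern` are hypothetical names, while the rater counts and the directory layout are copied from the cells above.

```python
import os

_NOISE_LEVELS = ('low', 'medium', 'high')
_SPLITS = ('train', 'valid')
# Rater counts as stated in the notebook cells. Patch Camelyon is handled
# separately because its count depends on the noise level.
_FIXED_NUM_RATERS = {'cifar10': 10, 'cifar100': 11, 'cats_vs_dogs': 10}


def num_raters(task_name, noise_level):
  """Returns the number of rater models for a task and noise level."""
  if noise_level not in _NOISE_LEVELS:
    raise ValueError('Unknown noise level: %r' % (noise_level,))
  if task_name == 'patch_camelyon':
    return 19 if noise_level == 'medium' else 20
  if task_name not in _FIXED_NUM_RATERS:
    raise ValueError('Unknown task: %r' % (task_name,))
  return _FIXED_NUM_RATERS[task_name]


def tfrecord_pattern(root_dir, task_name, noise_level, split):
  """Builds the glob pattern that the notebook cells use for TFRecords."""
  num_raters(task_name, noise_level)  # Validates task and noise level.
  if split not in _SPLITS:
    raise ValueError('Unknown split: %r' % (split,))
  return os.path.join(root_dir, task_name, noise_level, split) + '*'


# Hypothetical root directory, used only for illustration.
pattern = tfrecord_pattern('/data', 'patch_camelyon', 'medium', 'train')
count = num_raters('patch_camelyon', 'medium')
print(pattern, count)
```

The returned pattern can be passed to `tf.io.gfile.glob`, as the notebook cells do, and the count can be used for the `FixedLenFeature` shapes of `noisy_labels` and `rater_ids`.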