
DeepMind entry for PCQM4M-LSC

This repository contains DeepMind's entry to the PCQM4M-LSC (quantum chemistry) track of the OGB Large-Scale Challenge (OGB-LSC).

For full details regarding this entry, please see our technical report.

DeepMind PCQ Team ("Quantum")

(in alphabetical order)

  • Ravichandra Addanki
  • Peter Battaglia
  • David Budden
  • Andreea Deac
  • Jonathan Godwin
  • Alvaro Sanchez-Gonzalez
  • Wai Lok Sibon Li
  • Jacklynn Stott
  • Shantanu Thakoor
  • Petar Veličković

Performance

Our final test set performance was achieved by pooling an ensemble of 20 models (10 folds x 2 seeds). See the technical report for details.
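As a rough illustration of how an ensemble over folds and seeds can be pooled, the sketch below enumerates the 20 (fold, seed) pairs and averages per-molecule predictions. The plain mean and the names `ensemble_members` and `pool_predictions` are assumptions made for illustration; the technical report describes the procedure actually used.

```python
from itertools import product
from statistics import fmean

NUM_FOLDS = 10  # from the description above
NUM_SEEDS = 2   # from the description above


def ensemble_members(num_folds=NUM_FOLDS, num_seeds=NUM_SEEDS):
    """Returns every (fold, seed) pair, one per trained model."""
    return list(product(range(num_folds), range(num_seeds)))


def pool_predictions(per_model_predictions):
    """Averages predictions across models, giving one value per molecule.

    Assumption: a simple mean; the actual pooling may differ.
    """
    # zip(*...) regroups the values so that each tuple holds one
    # molecule's predictions from every model.
    return [fmean(values) for values in zip(*per_model_predictions)]


members = ensemble_members()
# Hypothetical predictions: model i outputs i and i + 1 for two molecules.
fake_predictions = [[float(i), float(i) + 1.0] for i in range(len(members))]
pooled = pool_predictions(fake_predictions)
print(len(members), pooled)
```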

Each model was trained for < 48 hours using 4x Google Cloud TPUv4 and 1x AMD EPYC 7B12 64-core CPU @2.25GHz.

Inference takes < 6 hours on 1x NVIDIA V100 16GB GPU and 1x Intel Xeon Gold 6148 20-core CPU @2.40GHz.

Running our model

Setup

You can set up a Python virtual environment with all required dependencies (you may need to install the python3-venv package first) inside the forked deepmind_research repository using:

python3 -m venv /tmp/pcq_venv
source /tmp/pcq_venv/bin/activate
pip3 install --upgrade pip setuptools wheel
pip3 install -r ogb_lsc/pcq/requirements.txt
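
To confirm that the virtual environment is active before running anything else, a quick check along these lines may help. It relies only on the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtualenv` is ours and not part of this repository.

```python
import sys


def in_virtualenv():
    """Returns True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```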

Download and pre-process data

All the additional features used in training (k-fold splits and conformer position features) can be generated by running:

/bin/bash run_preprocessing.sh -r ${HOME}/pcq/

Alternatively, they can be downloaded using:

python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
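
For readers unfamiliar with k-fold splits, a minimal sketch of the general idea follows. It assumes a uniformly random partition with a fixed seed; `make_kfold_splits` is a hypothetical helper and not the logic in `run_preprocessing.sh`, which may construct its splits differently.

```python
import random


def make_kfold_splits(num_items, num_folds, seed=0):
    """Partitions indices 0..num_items-1 into num_folds disjoint folds."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    # Striding assigns every index to exactly one fold; fold sizes
    # differ by at most one.
    return [sorted(indices[k::num_folds]) for k in range(num_folds)]


folds = make_kfold_splits(num_items=25, num_folds=10)
# Each fold serves once as the held-out set; the remaining folds
# together form the training set for that split.
held_out = folds[0]
train = sorted(i for fold in folds[1:] for i in fold)
print(len(held_out), len(train))
```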

Reproducing our final results

We have provided pre-trained weights of our final submission (~150 GB worth of model checkpoints) for convenience, which can be downloaded with:

python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
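
Since the checkpoints total roughly 150 GB, it may be worth checking free disk space before starting the download. The sketch below uses the standard `shutil.disk_usage` call; the helper names and the 150 GB threshold (taken from the approximate figure above) are illustrative assumptions.

```python
import os
import shutil

REQUIRED_GB = 150  # approximate checkpoint size stated above


def nearest_existing_dir(path):
    """Walks up from `path` until an existing directory is found."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.exists(path):
        path = os.path.dirname(path)
    return path


def has_free_space(path, required_gb=REQUIRED_GB):
    """Returns True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(nearest_existing_dir(path)).free
    return free_bytes >= required_gb * 1024**3


# The task root may not exist yet, so the check falls back to its
# nearest existing parent directory.
print("enough space:", has_free_space("~/pcq/"))
```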

Then, to reproduce our final results, run:

/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/

Note that this script does not use the downloaded conformer position features, and instead computes them for the test set as part of the script.

Retraining our model

Disclaimer: This script is provided for illustrative purposes. It is not practical for actual training since it only uses a single machine, and likely requires reducing the batch size and/or model size to fit on a single GPU.

To train a model, please run:

/bin/bash run_training.sh -r ${HOME}/pcq/

To simply validate that the code is running correctly on your hardware setup, consider setting debug=True in config.py, which trains a smaller model.

Citation

To cite this work (together with our MAG240M-LSC entry):

@article{deepmind2021ogb,
  author = {Ravichandra Addanki and Peter Battaglia and David Budden and Andreea
    Deac and Jonathan Godwin and Thomas Keck and Wai Lok Sibon Li and Alvaro
    Sanchez-Gonzalez and Jacklynn Stott and Shantanu Thakoor and Petar
    Veli\v{c}kovi\'{c}},
  title = {Large-scale graph representation learning with very deep GNNs and
    self-supervision},
  year = {2021},
  journal={arXiv preprint arXiv:2107.09422},
}

Our technical report is available as arXiv preprint arXiv:2107.09422.