DeepMind entry for PCQM4M-LSC
This repository contains DeepMind's entry to the PCQM4M-LSC (quantum chemistry) track of the OGB Large-Scale Challenge (OGB-LSC).
For full details regarding this entry, please see our technical report.
DeepMind PCQ Team ("Quantum")
(in alphabetical order)
- Ravichandra Addanki
- Peter Battaglia
- David Budden
- Andreea Deac
- Jonathan Godwin
- Alvaro Sanchez-Gonzalez
- Wai Lok Sibon Li
- Jacklynn Stott
- Shantanu Thakoor
- Petar Veličković
Performance
Our final test set performance was achieved by pooling the predictions of an ensemble of 20 models (10 folds x 2 seeds). See the technical report for details.
Each model was trained for < 48 hours using 4x Google Cloud TPUv4 and 1x AMD EPYC 7B12 64-core CPU @2.25GHz.
Inference takes < 6 hours on 1x NVIDIA V100 16GB GPU and 1x Intel Xeon Gold 6148 20-core CPU @2.40GHz.
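As a rough illustration of how predictions from a 10 folds x 2 seeds ensemble could be pooled, the sketch below averages per-molecule predictions across all 20 members. The use of a plain mean, the `pool_ensemble` name, and the toy prediction values are assumptions made for this example; the pooling actually used for the submission is described in the technical report.

```python
from statistics import fmean


def pool_ensemble(predictions):
    """Average per-molecule predictions across ensemble members.

    `predictions` is a list with one entry per model; each entry is a list
    holding one predicted value per molecule. A plain mean is an assumption
    of this sketch, not necessarily the pooling used in the submission.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    num_molecules = len(predictions[0])
    if any(len(p) != num_molecules for p in predictions):
        raise ValueError("all models must predict the same molecules")
    # zip(*predictions) regroups the values by molecule instead of by model.
    return [fmean(per_molecule) for per_molecule in zip(*predictions)]


# Toy stand-in for real model outputs: one entry per (fold, seed) pair.
members = {
    (fold, seed): [float(fold + seed)] * 3
    for fold in range(10)
    for seed in range(2)
}
pooled = pool_ensemble(list(members.values()))
print(len(members), pooled)
```

Keying the members by `(fold, seed)` makes it easy to check that all 20 checkpoints contributed before pooling.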
Running our model
Setup
You can set up a Python virtual environment (you might need to install the
python3-venv package first) with all needed dependencies inside the forked
deepmind_research repository using:
python3 -m venv /tmp/pcq_venv
source /tmp/pcq_venv/bin/activate
pip3 install --upgrade pip setuptools wheel
pip3 install -r ogb_lsc/pcq/requirements.txt
Download and pre-process data
All the additional features used in training (k-fold splits and conformer position features) can be generated by running:
/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
Alternatively, the pre-generated features can be downloaded using:
python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
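For intuition about the k-fold splits mentioned above, here is a minimal, generic sketch of partitioning example indices into 10 folds, where each fold is held out once for validation. The function name `make_k_fold_splits`, the shuffling seed, and the round-robin assignment are assumptions made for illustration; the splits actually used are the ones produced by run_preprocessing.sh or fetched by download_pcq.py, and may be built differently.

```python
import random


def make_k_fold_splits(num_examples, num_folds=10, seed=0):
    """Return a list of (train_indices, valid_indices) pairs, one per fold.

    Generic k-fold partitioning for illustration only; it is not the
    repository's preprocessing code.
    """
    if not 1 < num_folds <= num_examples:
        raise ValueError("need 1 < num_folds <= num_examples")
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one example of each other.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    splits = []
    for held_out in range(num_folds):
        valid = sorted(folds[held_out])
        train = sorted(
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        )
        splits.append((train, valid))
    return splits


splits = make_k_fold_splits(num_examples=25, num_folds=10)
for fold_id, (train, valid) in enumerate(splits):
    print(fold_id, len(train), len(valid))
```

Every example lands in exactly one validation fold, so a model trained on fold k never sees its own held-out examples during training.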
Reproducing our final results
We have provided pre-trained weights of our final submission (~150 GB worth of model checkpoints) for convenience, which can be downloaded with:
python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
Then, to reproduce our final results, run:
/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
Note that this script does not use the downloaded conformer position features, and instead computes them for the test set as part of the script.
Retraining our model
Disclaimer: This script is provided for illustrative purposes. It is not practical for actual training since it only uses a single machine, and likely requires reducing the batch size and/or model size to fit on a single GPU.
To train a model, please run:
/bin/bash run_training.sh -r ${HOME}/pcq/
To simply validate that the code is running correctly on your hardware setup,
consider setting debug=True in config.py, which trains a smaller model.
Citation
To cite this work (together with our MAG240M-LSC entry):
@article{deepmind2021ogb,
author = {Ravichandra Addanki and Peter Battaglia and David Budden and Andreea
Deac and Jonathan Godwin and Thomas Keck and Wai Lok Sibon Li and Alvaro
Sanchez-Gonzalez and Jacklynn Stott and Shantanu Thakoor and Petar
Veli\v{c}kovi\'{c}},
title = {Large-scale graph representation learning with very deep GNNs and
self-supervision},
year = {2021},
journal={arXiv preprint arXiv:2107.09422},
}
Our technical report is available on arXiv as arXiv:2107.09422.