mirror of https://github.com/google-deepmind/deepmind-research.git synced 2026-06-02 14:45:25 +08:00

Latest commit: fb1d757863 "pytype fix" (PiperOrigin-RevId: 525432152), by Alvaro Sanchez-Gonzalez, 2023-06-02 18:04:22 +01:00

batching_utils.py: Use jax.tree_util.tree_map in place of deprecated tree_multimap. (2022-07-24 17:53:28 +01:00)
config.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
conformer_utils.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
dataset_utils.py: Replace references to deprecated jax.curry function. (2023-06-02 18:03:54 +01:00)
datasets.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
download_pcq.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
ensemble_predictions.py: pytype fix (2023-06-02 18:04:22 +01:00)
experiment.py: Use jax.tree_util.tree_map in place of deprecated tree_multimap. (2022-07-24 17:53:28 +01:00)
generate_conformer_features.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
generate_validation_splits.py: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
model.py: Replaces references to jax.numpy.DeviceArray with jax.Array. (2023-06-02 18:03:13 +01:00)
README.md: Update arXiv version + citation for OGB-LSC (2021-09-17 17:49:29 +01:00)
requirements.txt: Add note about installing jaxlib version with GPU support. (2021-06-16 13:20:27 +01:00)
run_preprocessing.sh: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
run_pretrained_eval.sh: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)
run_training.sh: Add OGB-LSC code. (2021-06-16 00:00:06 +01:00)

README.md

DeepMind entry for PCQM4M-LSC

This repository contains DeepMind's entry to the PCQM4M-LSC (quantum chemistry) track of the OGB Large-Scale Challenge (OGB-LSC).

For full details regarding this entry, please see our technical report.

DeepMind PCQ Team ("Quantum")

(in alphabetical order)

Ravichandra Addanki
Peter Battaglia
David Budden
Andreea Deac
Jonathan Godwin
Alvaro Sanchez-Gonzalez
Wai Lok Sibon Li
Jacklynn Stott
Shantanu Thakoor
Petar Veličković

Performance

Our final test-set performance was achieved by pooling an ensemble of 20 models (10 folds × 2 seeds). See the technical report for details.
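One simple way to pool an ensemble is to average each member's per-molecule predictions. The snippet below is a minimal sketch under that assumption. Mean pooling is an illustrative choice, and the actual procedure is described in the technical report and implemented in ensemble_predictions.py, which may differ. The names `MEMBERS` and `pool_predictions` are hypothetical and not part of this repository.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical identifiers for the 20 ensemble members: 10 folds x 2 seeds.
NUM_FOLDS = 10
NUM_SEEDS = 2
MEMBERS: List[Tuple[int, int]] = [
    (fold, seed) for fold in range(NUM_FOLDS) for seed in range(NUM_SEEDS)
]


def pool_predictions(
    member_predictions: Dict[Tuple[int, int], Sequence[float]],
) -> List[float]:
    """Averages per-molecule predictions across ensemble members.

    Assumes every member predicts the same molecules in the same order.
    Mean pooling is an illustrative assumption, not necessarily the
    pooling used by ensemble_predictions.py.
    """
    predictions = list(member_predictions.values())
    if not predictions:
        raise ValueError("Need at least one ensemble member.")
    num_molecules = len(predictions[0])
    if any(len(p) != num_molecules for p in predictions):
        raise ValueError("All members must predict the same number of molecules.")
    # zip(*predictions) groups the i-th prediction of every member together.
    return [mean(values) for values in zip(*predictions)]
```

For example, `pool_predictions({(0, 0): [1.0, 3.0], (0, 1): [3.0, 5.0]})` averages two members and returns `[2.0, 4.0]`.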

Each model was trained for under 48 hours using 4x Google Cloud TPUv4 and 1x AMD EPYC 7B12 64-core CPU @ 2.25 GHz.
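The figures above give a rough upper bound on the total training compute for the ensemble. This assumes each of the 20 models was trained separately on the 4 TPUv4 listed:

```latex
T_{\text{train}} < \underbrace{20}_{\text{models}} \times \underbrace{48\,\mathrm{h}}_{\text{per model}} \times \underbrace{4}_{\text{TPUv4}} = 3840\ \text{TPUv4-hours}
```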

Inference takes under 6 hours on 1x NVIDIA V100 16GB GPU and 1x Intel Xeon Gold 6148 20-core CPU @ 2.40 GHz.

Running our model

Setup

You can set up a Python virtual environment with all the required dependencies inside the forked deepmind_research repository (you may need to install the python3-venv package first) using:

python3 -m venv /tmp/pcq_venv
source /tmp/pcq_venv/bin/activate
pip3 install --upgrade pip setuptools wheel
pip3 install -r ogb_lsc/pcq/requirements.txt
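After activating the environment, a quick sanity check can confirm that you are inside a virtual environment and that key dependencies import. This is a hypothetical helper; `missing_packages` is not part of the repository. It only checks jax and jaxlib, which this code is known to use, and is not a substitute for the full requirements.txt.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Returns the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # jax and jaxlib are known dependencies; this is NOT the full list from
    # requirements.txt.
    missing = missing_packages(["jax", "jaxlib"])
    print("Python", sys.version.split()[0])
    # For the venv module, sys.prefix differs from sys.base_prefix when active.
    print("Inside a virtual environment:", sys.prefix != sys.base_prefix)
    print("Missing packages:", missing or "none")
```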

Download and pre-process data

All the additional features used in training (k-fold splits and conformer position features) can be generated by running:

/bin/bash run_preprocessing.sh -r ${HOME}/pcq/

Alternatively, they can be downloaded using:

python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
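To illustrate what a k-fold split is, here is a minimal, hypothetical sketch of a reproducible k-fold partition of example indices. It is not the logic of generate_validation_splits.py. Which set gets split, how many folds there are, and how seeding works there may all differ. `make_k_fold_splits` and `held_out_and_rest` are illustrative names only.

```python
import random
from typing import List, Sequence, Tuple


def make_k_fold_splits(num_examples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Partitions indices 0..num_examples-1 into k disjoint folds."""
    if not 0 < k <= num_examples:
        raise ValueError("k must be in [1, num_examples].")
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # Seeded, so splits are reproducible.
    # Round-robin assignment keeps fold sizes within one of each other.
    return [sorted(indices[fold::k]) for fold in range(k)]


def held_out_and_rest(
    folds: Sequence[Sequence[int]], fold: int
) -> Tuple[List[int], List[int]]:
    """Returns (indices of the chosen fold, indices of all other folds)."""
    held_out = list(folds[fold])
    rest = sorted(i for j, f in enumerate(folds) if j != fold for i in f)
    return held_out, rest
```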

Reproducing our final results

For convenience, we provide the pre-trained weights of our final submission (~150 GB of model checkpoints), which can be downloaded with:

python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
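Because the checkpoints take roughly 150 GB, it may be worth checking free disk space under the task root before starting the download. Below is a minimal standard-library sketch. `has_enough_space` and `nearest_existing_dir` are hypothetical helpers, and 1 GB is taken as 10^9 bytes.

```python
import os
import shutil

# Approximate checkpoint size quoted in this README; 1 GB taken as 10**9 bytes.
CHECKPOINT_SIZE_GB = 150


def nearest_existing_dir(path: str) -> str:
    """Walks up from `path` until it reaches a directory that exists."""
    path = os.path.abspath(os.path.expanduser(path))
    while not os.path.isdir(path):
        parent = os.path.dirname(path)
        if parent == path:  # Reached the filesystem root.
            break
        path = parent
    return path


def has_enough_space(task_root: str, required_gb: float = CHECKPOINT_SIZE_GB) -> bool:
    """Returns True if the filesystem holding `task_root` has `required_gb` free."""
    free_bytes = shutil.disk_usage(nearest_existing_dir(task_root)).free
    return free_bytes >= required_gb * 10**9


if __name__ == "__main__":
    print("Enough space for checkpoints:", has_enough_space("~/pcq/"))
```

The walk up to an existing parent lets the check run before the task root directory has been created.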

Then, to reproduce our final results, run:

/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/

Note that this script does not use the downloaded conformer position features; instead, it computes them for the test set itself.

Retraining our model

Disclaimer: this script is provided for illustrative purposes only. It is not practical for actual training, since it uses only a single machine, and it will likely require reducing the batch size and/or model size to fit on a single GPU.

To train a model, please run:

/bin/bash run_training.sh -r ${HOME}/pcq/

To check that the code runs correctly on your hardware setup, consider setting debug=True in config.py, which trains a smaller model.

Citation

To cite this work (together with our MAG240M-LSC entry):

@article{deepmind2021ogb,
  author = {Ravichandra Addanki and Peter Battaglia and David Budden and Andreea
    Deac and Jonathan Godwin and Thomas Keck and Wai Lok Sibon Li and Alvaro
    Sanchez-Gonzalez and Jacklynn Stott and Shantanu Thakoor and Petar
    Veli\v{c}kovi\'{c}},
  title = {Large-scale graph representation learning with very deep GNNs and
    self-supervision},
  year = {2021},
  journal = {arXiv preprint arXiv:2107.09422},
}

Our technical report is available on arXiv (arXiv:2107.09422).