diff --git a/ogb_lsc/README.md b/ogb_lsc/README.md
index f428648..dba0055 100644
--- a/ogb_lsc/README.md
+++ b/ogb_lsc/README.md
@@ -1,6 +1,6 @@
-# DeepMind entry for PCQM4M-LSC
+# DeepMind entry for OGB-LSC
 
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
 [MAG240M-LSC](https://ogb.stanford.edu/kddcup2021/mag240m/) (academic graph)
 tracks of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
diff --git a/ogb_lsc/mag/README.md b/ogb_lsc/mag/README.md
index 6c8c114..32349ee 100644
--- a/ogb_lsc/mag/README.md
+++ b/ogb_lsc/mag/README.md
@@ -60,7 +60,7 @@ See https://github.com/google/jax/issues/5231 for details.
 `ROOT`.**
 
 **2. Run this script to reorganize the data into a flat directory structure with
-transparent names**
+transparent names.**
 
 ```bash
 /bin/bash organize_data.sh -r ROOT
@@ -81,12 +81,13 @@ created, with contents:
 
 We refer to this as the "raw" data.
 
-**3. To run the preprocessing code**
+**3. Run the preprocessing code.**
+
 ```bash
 /bin/bash run_preprocessing.sh -r ROOT
 ```
 
-The pre-processing is very time- and memory-consuming, and should only be run
+The pre-processing is both time- and memory-consuming, and should only be run
 to verify the full pipeline. You can download the pre-processed data using the
 following script, for use in training and evaluating models:
 
@@ -99,11 +100,16 @@ python3 download_mag.py --task_root=${HOME}/mag --payload="data"
 
 We have provided pre-trained weights of our final submission for convenience.
 They can be downloaded with:
-```
+
+```bash
 python3 download_mag.py --task_root=${HOME}/mag --payload="models"
 ```
 
-Then to reproduce our final results, please run `bash run_pretrain_eval.sh`.
+Then to reproduce our final results, please run:
+
+```bash
+/bin/bash run_preprocessing.sh -r ${HOME}/mag/
+```
 
 ## Retraining our model
 
@@ -111,9 +117,14 @@ Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
 
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
+
+```bash
+/bin/bash run_training.sh -r ${HOME}/mag/
+```
+
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
 
 # Citation
 
diff --git a/ogb_lsc/pcq/README.md b/ogb_lsc/pcq/README.md
index 625be5a..eda7070 100644
--- a/ogb_lsc/pcq/README.md
+++ b/ogb_lsc/pcq/README.md
@@ -1,6 +1,6 @@
 # DeepMind entry for PCQM4M-LSC
 
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
 track of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
 
@@ -48,55 +48,54 @@ pip3 install --upgrade pip setuptools wheel
 pip3 install -r ogb_lsc/pcq/requirements.txt
 ```
 
-Use the following command to get a jaxlib version built compatible with V100 GPUs.
-```bash
-pip install --upgrade jax jaxlib==0.1.67+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html
-```
-See https://github.com/google/jax/issues/5231 for details.
+## Download and pre-process data
 
-
-## Downloading data and model weights
-
-All necessary data and pre-trained model weights can be downloaded by running
-the following command.
-This downloads about ~ 150 GB worth of model checkpoints.
+All the additional features used in training (k-fold splits and conformer
+position features) can be generated by running:
 
 ```bash
-python download_required_pcq_data.py --data_root=${HOME}/data/
+/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
 ```
 
-## Generating Pre-processed features
+Or downloaded using:
 
-All the additional features used in training
-(k-fold splits and conformer position features) can be generated by running.
 ```bash
-/bin/bash run_preprocessing.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
 ```
 
 ## Reproducing our final results
 
-We have provided pre-trained weights of our final submission for convenience. To
-reproduce our final results, please run `run_pretrained_eval.sh` as follows.
+We have provided pre-trained weights of our final submission (~150 GB worth of
+model checkpoints) for convenience, which can be downloaded with:
 
 ```bash
-/bin/bash run_pretrained_eval.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
 ```
 
+Then to reproduce our final results please run:
+
+```bash
+/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
+```
+
+Note that this script does not use the downloaded conformer position features,
+and instead computes them for the test set as part of the script.
+
 ## Retraining our model
 
 Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
 
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
-
+To train a model, please run:
 
 ```bash
-/bin/bash run_training.sh -r ${HOME}/data/pcq/
+/bin/bash run_training.sh -r ${HOME}/pcq/
 ```
 
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
+
 # Citation