Misc README fixes.

PiperOrigin-RevId: 379669157
Alvaro Sanchez-Gonzalez
2021-06-16 09:39:23 +01:00
committed by Saran Tunyasuvunakool
parent 4c80e527c4
commit 438d06513e
3 changed files with 45 additions and 35 deletions
+2 -2
@@ -1,6 +1,6 @@
-# DeepMind entry for PCQM4M-LSC
+# DeepMind entry for OGB-LSC
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
 [MAG240M-LSC](https://ogb.stanford.edu/kddcup2021/mag240m/) (academic graph)
 tracks of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
+19 -8
@@ -60,7 +60,7 @@ See https://github.com/google/jax/issues/5231 for details.
 `ROOT`.**
 **2. Run this script to reorganize the data into a flat directory structure with
-transparent names**
+transparent names.**
 ```bash
 /bin/bash organize_data.sh -r ROOT
@@ -81,12 +81,13 @@ created, with contents:
 We refer to this as the "raw" data.
-**3. To run the preprocessing code**
+**3. Run the preprocessing code.**
 ```bash
 /bin/bash run_preprocessing.sh -r ROOT
 ```
-The pre-processing is very time- and memory-consuming, and should only be run
+The pre-processing is both time- and memory-consuming, and should only be run
 to verify the full pipeline. You can download the pre-processed data using the
 following script, for use in training and evaluating models:
@@ -99,11 +100,16 @@ python3 download_mag.py --task_root=${HOME}/mag --payload="data"
 We have provided pre-trained weights of our final submission for convenience.
 They can be downloaded with:
-```
+```bash
 python3 download_mag.py --task_root=${HOME}/mag --payload="models"
 ```
-Then to reproduce our final results, please run `bash run_pretrain_eval.sh`.
+Then to reproduce our final results, please run:
+```bash
+/bin/bash run_preprocessing.sh -r ${HOME}/mag/
+```
 ## Retraining our model
@@ -111,9 +117,14 @@ Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
+```bash
+/bin/bash run_training.sh -r ${HOME}/mag/
+```
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
 # Citation
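The updated MAG240M-LSC steps in this diff can be chained into a single script. What follows is a minimal sketch under stated assumptions and is not part of the commit. The `run` helper and the `ROOT`, `DRY_RUN`, `FROM_RAW` and `TRAIN` variables are hypothetical. It also assumes two things the README does not state: that the README scripts sit in the current directory, and that one directory serves as both the `-r` root and the `--task_root`. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around the MAG240M-LSC README commands.
# Assumption: scripts are in the current directory and a single ROOT is used
# both as the raw-data root (-r) and as --task_root.
set -eu

ROOT="${ROOT:-${HOME}/mag}"
DRY_RUN="${DRY_RUN:-1}"    # 1: only print commands, 0: execute them
FROM_RAW="${FROM_RAW:-0}"  # 1: reorganize + preprocess raw data, 0: download it
TRAIN="${TRAIN:-0}"        # 1: also launch the single-machine training script

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

if [ "${FROM_RAW}" = "1" ]; then
  # Steps 2 and 3: flatten the raw data, then run the slow preprocessing.
  run /bin/bash organize_data.sh -r "${ROOT}"
  run /bin/bash run_preprocessing.sh -r "${ROOT}"
else
  # Shortcut: fetch the already pre-processed data instead.
  run python3 download_mag.py --task_root="${ROOT}" --payload="data"
fi

# Pre-trained weights of the final submission.
run python3 download_mag.py --task_root="${ROOT}" --payload="models"

if [ "${TRAIN}" = "1" ]; then
  # Illustrative only: the README notes this is impractical on one machine.
  run /bin/bash run_training.sh -r "${ROOT}/"
fi
```

Setting `DRY_RUN=0 FROM_RAW=1` would execute the reorganize-and-preprocess path instead of the download.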
+24 -25
@@ -1,6 +1,6 @@
 # DeepMind entry for PCQM4M-LSC
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
 track of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
@@ -48,55 +48,54 @@ pip3 install --upgrade pip setuptools wheel
 pip3 install -r ogb_lsc/pcq/requirements.txt
 ```
-Use the following command to get a jaxlib version built compatible with V100 GPUs.
-```bash
-pip install --upgrade jax jaxlib==0.1.67+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html
-```
-See https://github.com/google/jax/issues/5231 for details.
-## Downloading data and model weights
-All necessary data and pre-trained model weights can be downloaded by running
-the following command.
-This downloads about ~ 150 GB worth of model checkpoints.
+## Download and pre-process data
+All the additional features used in training (k-fold splits and conformer
+position features) can be generated by running:
 ```bash
-python download_required_pcq_data.py --data_root=${HOME}/data/
+/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
 ```
-## Generating Pre-processed features
-All the additional features used in training
-(k-fold splits and conformer position features) can be generated by running.
+Or downloaded using:
 ```bash
-/bin/bash run_preprocessing.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
 ```
 ## Reproducing our final results
-We have provided pre-trained weights of our final submission for convenience. To
-reproduce our final results, please run `run_pretrained_eval.sh` as follows.
+We have provided pre-trained weights of our final submission (~150 GB worth of
+model checkpoints) for convenience, which can be downloaded with:
 ```bash
-/bin/bash run_pretrained_eval.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
 ```
+Then to reproduce our final results please run:
+```bash
+/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
+```
+Note that this script does not use the downloaded conformer position features,
+and instead computes them for the test set as part of the script.
 ## Retraining our model
 Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
 ```bash
-/bin/bash run_training.sh -r ${HOME}/data/pcq/
+/bin/bash run_training.sh -r ${HOME}/pcq/
 ```
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
 # Citation
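The updated PCQM4M-LSC instructions in this diff come down to three steps: obtain the features (generate or download them), download the checkpoints, and run the evaluation. Below is a minimal sketch that chains these steps. It assumes the scripts are in the current directory. The `run` helper and the `TASK_ROOT`, `DRY_RUN` and `GENERATE_FEATURES` variables are hypothetical and not part of the repository. By default the sketch prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run wrapper around the PCQM4M-LSC README commands.
# Assumption: scripts are in the current directory.
set -eu

TASK_ROOT="${TASK_ROOT:-${HOME}/pcq}"
DRY_RUN="${DRY_RUN:-1}"                      # 1: print commands only, 0: execute
GENERATE_FEATURES="${GENERATE_FEATURES:-0}"  # 1: compute features locally

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

if [ "${GENERATE_FEATURES}" = "1" ]; then
  # k-fold splits and conformer position features, computed locally.
  run /bin/bash run_preprocessing.sh -r "${TASK_ROOT}/"
else
  # Download the same features instead of generating them.
  run python download_pcq.py --task_root="${TASK_ROOT}/" --payload="data"
fi

# Roughly 150 GB of pre-trained checkpoints from the final submission.
run python download_pcq.py --task_root="${TASK_ROOT}/" --payload="models"

# Reproduce the final results; per the README, this step recomputes the
# test-set conformer position features rather than using the downloaded ones.
run /bin/bash run_pretrained_eval.sh -r "${TASK_ROOT}/"
```

Because the evaluation script recomputes the test-set conformer features itself, the choice of `GENERATE_FEATURES` mainly affects how the training features are obtained.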