Misc README fixes.

PiperOrigin-RevId: 379669157
Alvaro Sanchez-Gonzalez
2021-06-16 09:39:23 +01:00
committed by Saran Tunyasuvunakool
parent 4c80e527c4
commit 438d06513e
3 changed files with 45 additions and 35 deletions
+2 -2
@@ -1,6 +1,6 @@
# DeepMind entry for PCQM4M-LSC
# DeepMind entry for OGB-LSC
This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
[MAG240M-LSC](https://ogb.stanford.edu/kddcup2021/mag240m/) (academic graph)
tracks of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
(OGB-LSC).
+19 -8
@@ -60,7 +60,7 @@ See https://github.com/google/jax/issues/5231 for details.
`ROOT`.**
**2. Run this script to reorganize the data into a flat directory structure with
transparent names**
transparent names.**
```bash
/bin/bash organize_data.sh -r ROOT
@@ -81,12 +81,13 @@ created, with contents:
We refer to this as the "raw" data.
**3. To run the preprocessing code**
**3. Run the preprocessing code.**
```bash
/bin/bash run_preprocessing.sh -r ROOT
```
The pre-processing is very time- and memory-consuming, and should only be run
The pre-processing is both time- and memory-consuming, and should only be run
to verify the full pipeline. You can download the pre-processed data using the
following script, for use in training and evaluating models:
@@ -99,11 +100,16 @@ python3 download_mag.py --task_root=${HOME}/mag --payload="data"
We have provided pre-trained weights of our final submission for convenience.
They can be downloaded with:
```
```bash
python3 download_mag.py --task_root=${HOME}/mag --payload="models"
```
Then to reproduce our final results, please run `bash run_pretrain_eval.sh`.
Then to reproduce our final results, please run:
```bash
/bin/bash run_preprocessing.sh -r ${HOME}/mag/
```
## Retraining our model
@@ -111,9 +117,14 @@ Disclaimer: This script is provided for illustrative purposes. It is not
practical for actual training since it only uses a single machine, and likely
requires reducing the batch size and/or model size to fit on a single GPU.
If you still want to train a model, please run `run_training.sh`. To simply
validate that the code is running correctly on your hardware setup, consider
setting `debug=True` in `config.py`, which trains a smaller model.
To train a model, please run:
```bash
/bin/bash run_training.sh -r ${HOME}/mag/
```
To simply validate that the code is running correctly on your hardware setup,
consider setting `debug=True` in `config.py`, which trains a smaller model.
# Citation
+24 -25
@@ -1,6 +1,6 @@
# DeepMind entry for PCQM4M-LSC
This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
track of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
(OGB-LSC).
@@ -48,55 +48,54 @@ pip3 install --upgrade pip setuptools wheel
pip3 install -r ogb_lsc/pcq/requirements.txt
```
Use the following command to get a jaxlib version built compatible with V100 GPUs.
```bash
pip install --upgrade jax jaxlib==0.1.67+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html
```
See https://github.com/google/jax/issues/5231 for details.
## Download and pre-process data
## Downloading data and model weights
All necessary data and pre-trained model weights can be downloaded by running
the following command.
This downloads about ~ 150 GB worth of model checkpoints.
All the additional features used in training (k-fold splits and conformer
position features) can be generated by running:
```bash
python download_required_pcq_data.py --data_root=${HOME}/data/
/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
```
## Generating Pre-processed features
Or downloaded using:
All the additional features used in training
(k-fold splits and conformer position features) can be generated by running.
```bash
/bin/bash run_preprocessing.sh -r ${HOME}/data/pcq/
python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
```
## Reproducing our final results
We have provided pre-trained weights of our final submission for convenience. To
reproduce our final results, please run `run_pretrained_eval.sh` as follows.
We have provided pre-trained weights of our final submission (~150 GB worth of
model checkpoints) for convenience, which can be downloaded with:
```bash
/bin/bash run_pretrained_eval.sh -r ${HOME}/data/pcq/
python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
```
Then to reproduce our final results please run:
```bash
/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
```
Note that this script does not use the downloaded conformer position features,
and instead computes them for the test set as part of the script.
## Retraining our model
Disclaimer: This script is provided for illustrative purposes. It is not
practical for actual training since it only uses a single machine, and likely
requires reducing the batch size and/or model size to fit on a single GPU.
If you still want to train a model, please run `run_training.sh`. To simply
validate that the code is running correctly on your hardware setup, consider
setting `debug=True` in `config.py`, which trains a smaller model.
To train a model, please run:
```bash
/bin/bash run_training.sh -r ${HOME}/data/pcq/
/bin/bash run_training.sh -r ${HOME}/pcq/
```
To simply validate that the code is running correctly on your hardware setup,
consider setting `debug=True` in `config.py`, which trains a smaller model.
# Citation