Misc README fixes.

PiperOrigin-RevId: 379669157
2026-05-19 19:01:30 +08:00 · 2021-06-16 09:39:23 +01:00
parent 4c80e527c4
commit 438d06513e
3 changed files with 45 additions and 35 deletions
@@ -1,6 +1,6 @@
-# DeepMind entry for PCQM4M-LSC
+# DeepMind entry for OGB-LSC

-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
 [MAG240M-LSC](https://ogb.stanford.edu/kddcup2021/mag240m/) (academic graph)
 tracks of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
@@ -60,7 +60,7 @@ See https://github.com/google/jax/issues/5231 for details.
 `ROOT`.**

 **2. Run this script to reorganize the data into a flat directory structure with
-transparent names**
+transparent names.**

 ```bash
 /bin/bash organize_data.sh -r ROOT
@@ -81,12 +81,13 @@ created, with contents:

 We refer to this as the "raw" data.

-**3. To run the preprocessing code**
+**3. Run the preprocessing code.**
+
 ```bash
 /bin/bash run_preprocessing.sh -r ROOT
 ```

-The pre-processing is very time- and memory-consuming, and should only be run
+The pre-processing is both time- and memory-consuming, and should only be run
 to verify the full pipeline. You can download the pre-processed data using the
 following script, for use in training and evaluating models:

@@ -99,11 +100,16 @@ python3 download_mag.py --task_root=${HOME}/mag --payload="data"

 We have provided pre-trained weights of our final submission for convenience.
 They can be downloaded with:
-```
+
+```bash
 python3 download_mag.py --task_root=${HOME}/mag --payload="models"
 ```
-Then to reproduce our final results, please run `bash run_pretrain_eval.sh`.

+Then to reproduce our final results, please run:
+
+```bash
+/bin/bash run_preprocessing.sh -r ${HOME}/mag/
+```

 ## Retraining our model

@@ -111,9 +117,14 @@ Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.

-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
+
+```bash
+/bin/bash run_training.sh -r ${HOME}/mag/
+```
+
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.


 # Citation
@@ -1,6 +1,6 @@
 # DeepMind entry for PCQM4M-LSC

-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
 track of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).

@@ -48,55 +48,54 @@ pip3 install --upgrade pip setuptools wheel
 pip3 install -r ogb_lsc/pcq/requirements.txt
 ```

-Use the following command to get a jaxlib version built compatible with V100 GPUs.
-```bash
-pip install --upgrade jax jaxlib==0.1.67+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html
-```
-See https://github.com/google/jax/issues/5231 for details.
+## Download and pre-process data

-
-## Downloading data and model weights
-
-All necessary data and pre-trained model weights can be downloaded by running
-the following command.
-This downloads about ~ 150 GB worth of model checkpoints.
+All the additional features used in training (k-fold splits and conformer
+position features) can be generated by running:

 ```bash
-python download_required_pcq_data.py --data_root=${HOME}/data/
+/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
 ```

-## Generating Pre-processed features
+Or downloaded using:

-All the additional features used in training
-(k-fold splits and conformer position features) can be generated by running.
 ```bash
-/bin/bash run_preprocessing.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
 ```

 ## Reproducing our final results

-We have provided pre-trained weights of our final submission for convenience. To
-reproduce our final results, please run `run_pretrained_eval.sh` as follows.
+We have provided pre-trained weights of our final submission (~150 GB worth of
+model checkpoints) for convenience, which can be downloaded with:

 ```bash
-/bin/bash run_pretrained_eval.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
 ```

+Then to reproduce our final results, please run:
+
+```bash
+/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
+```
+
+Note that this script does not use the downloaded conformer position features,
+and instead computes them for the test set as part of the script.
+
 ## Retraining our model

 Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.

-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
-
+To train a model, please run:

 ```bash
-/bin/bash run_training.sh -r ${HOME}/data/pcq/
+/bin/bash run_training.sh -r ${HOME}/pcq/
 ```

+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
+

 # Citation