Misc README fixes.

PiperOrigin-RevId: 379669157
2026-05-28 19:31:14 +08:00 · 2021-06-16 09:39:23 +01:00
parent 4c80e527c4
commit 438d06513e
3 changed files with 45 additions and 35 deletions
@@ -1,6 +1,6 @@
-# DeepMind entry for PCQM4M-LSC
+# DeepMind entry for OGB-LSC
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry) and
 [MAG240M-LSC](https://ogb.stanford.edu/kddcup2021/mag240m/) (academic graph)
 tracks of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
@@ -60,7 +60,7 @@ See https://github.com/google/jax/issues/5231 for details.
 `ROOT`.**
 **2. Run this script to reorganize the data into a flat directory structure with
-transparent names**
+transparent names.**
 ```bash
 /bin/bash organize_data.sh -r ROOT
@@ -81,12 +81,13 @@ created, with contents:
 We refer to this as the "raw" data.
-**3. To run the preprocessing code**
+**3. Run the preprocessing code.**
 ```bash
 /bin/bash run_preprocessing.sh -r ROOT
 ```
-The pre-processing is very time- and memory-consuming, and should only be run
+The pre-processing is both time- and memory-consuming, and should only be run
 to verify the full pipeline. You can download the pre-processed data using the
 following script, for use in training and evaluating models:
@@ -99,11 +100,16 @@ python3 download_mag.py --task_root=${HOME}/mag --payload="data"
 We have provided pre-trained weights of our final submission for convenience.
 They can be downloaded with:
-```
+
 ```bash
 python3 download_mag.py --task_root=${HOME}/mag --payload="models"
 ```
-Then to reproduce our final results, please run `bash run_pretrain_eval.sh`.
+Then to reproduce our final results, please run:
+```bash
+/bin/bash run_preprocessing.sh -r ${HOME}/mag/
+```
 ## Retraining our model
@@ -111,9 +117,14 @@ Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
+
+```bash
+/bin/bash run_training.sh -r ${HOME}/mag/
+```
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
 # Citation
@@ -1,6 +1,6 @@
 # DeepMind entry for PCQM4M-LSC
-This repository contains DeepMind's entry to the [PCWM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
+This repository contains DeepMind's entry to the [PCQM4M-LSC](https://ogb.stanford.edu/kddcup2021/pcqm4m/) (quantum chemistry)
 track of the [OGB Large-Scale Challenge](https://ogb.stanford.edu/kddcup2021/)
 (OGB-LSC).
@@ -48,55 +48,54 @@ pip3 install --upgrade pip setuptools wheel
 pip3 install -r ogb_lsc/pcq/requirements.txt
 ```
-Use the following command to get a jaxlib version built compatible with V100 GPUs.
+## Download and pre-process data
 ```bash
 pip install --upgrade jax jaxlib==0.1.67+cuda110 -f https://storage.googleapis.com/jax-releases/jax_releases.html
 ```
 See https://github.com/google/jax/issues/5231 for details.
-
-## Downloading data and model weights
-All necessary data and pre-trained model weights can be downloaded by running
-the following command.
-This downloads about ~ 150 GB worth of model checkpoints.
+All the additional features used in training (k-fold splits and conformer
+position features) can be generated by running:
 ```bash
-python download_required_pcq_data.py --data_root=${HOME}/data/
+/bin/bash run_preprocessing.sh -r ${HOME}/pcq/
 ```
-## Generating Pre-processed features
-All the additional features used in training
-(k-fold splits and conformer position features) can be generated by running.
+Or downloaded using:
 ```bash
-/bin/bash run_preprocessing.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="data"
 ```
 ## Reproducing our final results
-We have provided pre-trained weights of our final submission for convenience. To
+We have provided pre-trained weights of our final submission (~150 GB worth of
-reproduce our final results, please run `run_pretrained_eval.sh` as follows.
+model checkpoints) for convenience, which can be downloaded with:
 ```bash
-/bin/bash run_pretrained_eval.sh -r ${HOME}/data/pcq/
+python download_pcq.py --task_root=${HOME}/pcq/ --payload="models"
 ```
+Then to reproduce our final results please run:
+```bash
+/bin/bash run_pretrained_eval.sh -r ${HOME}/pcq/
+```
 Note that this script does not use the downloaded conformer position features,
 and instead computes them for the test set as part of the script.
 ## Retraining our model
 Disclaimer: This script is provided for illustrative purposes. It is not
 practical for actual training since it only uses a single machine, and likely
 requires reducing the batch size and/or model size to fit on a single GPU.
-If you still want to train a model, please run `run_training.sh`. To simply
-validate that the code is running correctly on your hardware setup, consider
-setting `debug=True` in `config.py`, which trains a smaller model.
+To train a model, please run:
 ```bash
-/bin/bash run_training.sh -r ${HOME}/data/pcq/
+/bin/bash run_training.sh -r ${HOME}/pcq/
 ```
+To simply validate that the code is running correctly on your hardware setup,
+consider setting `debug=True` in `config.py`, which trains a smaller model.
 # Citation