Mirror of https://github.com/google-deepmind/deepmind-research.git (synced 2026-05-27 18:25:49 +08:00)
Add paper link.
PiperOrigin-RevId: 289852567
Committed by Diego de Las Casas
Parent: 32abf74fd5
Commit: c8052237b8
+37 -30
@@ -4,12 +4,15 @@ This package provides an implementation of the contact prediction network,
 associated model weights and CASP13 dataset as published in Nature.
 
 Any publication that discloses findings arising from using this source code must
-cite *AlphaFold: Protein structure prediction using potentials from deep
-learning* by Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick,
-Laurent Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson,
-Alex Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan,
+cite *Improved protein structure prediction using potentials from deep learning*
+by Andrew W. Senior, Richard Evans, John Jumper, James Kirkpatrick, Laurent
+Sifre, Tim Green, Chongli Qin, Augustin Žídek, Alexander W. R. Nelson, Alex
+Bridgland, Hugo Penedones, Stig Petersen, Karen Simonyan, Steve Crossan,
 Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis.
 
+The paper is available at https://www.nature.com/articles/s41586-019-1923-7 (DOI
+10.1038/s41586-019-1923-7).
+
 ## Setup
 
 ### Dependencies
@@ -24,8 +27,9 @@ Pushmeet Kohli, David T. Jones, David Silver, Koray Kavukcuoglu, Demis Hassabis.
 2.0+.
 * [TensorFlow Probability 0.7.0](https://www.tensorflow.org/probability)
 
-You can set up Python virtual environment with these dependencies inside the
-forked `deepmind_research` repository using:
+You can set up Python virtual environment (you might need to install the
+`python3-venv` package first) with all needed dependencies inside the forked
+`deepmind_research` repository using:
 
 ```shell
 python3 -m venv alphafold_venv
@@ -34,33 +38,32 @@ pip install wheel
 pip install -r alphafold_casp13/requirements.txt
 ```
 
-Alternatively, you can just use the `run_eval.sh` script provided which runs
-these commands for you, see the section on running the system below for more
+Alternatively, you can just use the `run_eval.sh` script provided which will run
+these commands for you. See the section on running the system below for more
 details.
 
 ## Data
 
-While the code is licensed under the Apache License, the AlphaFold weights and
-data are made available for non-commercial use only under the terms of the
+While the code is licensed under the Apache 2.0 License, the AlphaFold weights
+and data are made available for non-commercial use only under the terms of the
 Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
 license. You can find details at:
 https://creativecommons.org/licenses/by-nc/4.0/legalcode
 
-In order to download the AlphaFold weights and data, you will need to request
-access using the
-[request form](https://docs.google.com/forms/d/1yrZXhQfSlwYnouDujrL2RkZKVBjF5AjomyF_RJ95dew/).
+You can download the data from:
 
-Once you have obtained access, you can download the data from
-[Google Cloud Storage](https://console.cloud.google.com/storage/browser/alphafold_casp13_data).
-
+* http://bit.ly/alphafold-data-license: The data license file.
+* http://bit.ly/alphafold-data-casp13: The dataset to reproduce AlphaFold's
+CASP13 results.
+* http://bit.ly/alphafold-data-weights: The model checkpoints.
 
 ### Input data
 
 The dataset to reproduce AlphaFold's CASP13 results can be downloaded from
-[Google Cloud Storage](https://console.cloud.google.com/storage/browser/alphafold_casp13_data).
-The dataset is in a file called `casp13_data.zip` which has about **43.5 GB**.
+http://bit.ly/alphafold-data-casp13. The dataset is in a single zip file called
+`casp13_data.zip` which has about **43.5 GB**.
 
-The zip file contains 1 directory for each CASP13 target and a `LICENSE.md`
+The zip file contains 1 directory for each CASP13 target and a `LICENSE.txt`
 file. Each target directory contains the following files:
 
 1. `TARGET.tfrec` file. This is a
@@ -84,9 +87,8 @@ targets to get the contact map.
 ### Model checkpoints
 
 The model checkpoints can be downloaded from
-[Google Cloud Storage](https://console.cloud.google.com/storage/browser/alphafold_casp13_data).
-The model checkpoints are in a file called `alphafold_casp13_weights.zip` which
-has about **210 MB**.
+http://bit.ly/alphafold-data-weights. The model checkpoints are in a zip file
+called `alphafold_casp13_weights.zip` which has about **210 MB**.
 
 The zip file contains:
 
@@ -94,7 +96,7 @@ The zip file contains:
 1. A directory `916425`. This contains the weights for the background distogram
 model.
 1. A directory `941521`. This contains the weights for the torsion model.
-1. `LICENSE.md`. The model checkpoints have a non-commercial license which is
+1. `LICENSE.txt`. The model checkpoints have a non-commercial license which is
 defined in this file.
 
 Each directory with model weights contains a number of different model
@@ -109,15 +111,18 @@ used for feature normalization specific to that model.
 You can use the `run_eval.sh` script to run the entire Distogram prediction
 system. There are a few steps you need to start with:
 
-1. Download the input data as described above. Unpack the data in the
-directory with the code.
+1. Download the input data as described above. Unpack the data in the directory
+with the code.
 1. Download the model checkpoints as described above. Unpack the data.
 1. In `run_eval.sh` set the following:
 * `DISTOGRAM_MODEL` to the path to the directory with the distogram model.
 * `BACKGROUND_MODEL` to the path to the directory with the background
 model.
 * `TORSION_MODEL` to the path to the directory with the torsion model.
-* `TARGET` to the path to the directory with the target input data.
+* `TARGET` to the name of the target.
+* `TARGET_PATH` to the path to the directory with the target input data.
+* `OUTPUT_DIR` is by default set to a new directory with a timestamp
+within your home directory.
 
 Then run `alphafold_casp13/run_eval.sh` from the `deepmind_research` parent
 directory (you will get errors if you try running `run_eval.sh` directly from
@@ -133,8 +138,8 @@ The contact prediction works in the following way:
 1. 1 replica is launched to predict the torsions.
 1. The predictions from the different replicas are averaged together using
 `ensemble_contact_maps.py`.
-1. The predictions for the 64 × 64 distogram crops are pasted together using
-`paste_contact_maps.py`.
+1. The predictions for the 64 × 64, 128 × 128 and 256 × 256 distogram crops are
+pasted together using `paste_contact_maps.py`.
 
 When running `run_eval.sh` the output has the following directory structure:
 
@@ -149,7 +154,8 @@ When running `run_eval.sh` the output has the following directory structure:
 * **torsion/**: Contains 1 subfolder as there was only a single replica. This
 folder contains contains the predicted ASA, secondary structure, backbone
 torsions and a pickle file with the distogram for each crop. It also
-contains an `ensemble` directory with the ensembled torsions.
+contains an `ensemble` directory, which contains a copy of the predicted
+output as there is only a single replica in this case.
 * **pasted/**: Contains distograms obtained from the ensembled distograms by
 pasting. An RR contact map file is computed from this pasted distogram.
 **This is the final distogram that was used in the subsequent AlphaFold
@@ -159,6 +165,7 @@ When running `run_eval.sh` the output has the following directory structure:
 
 We used a version of [PDB](https://www.rcsb.org/) downloaded on 2018-03-15. The
 train/test split can be found in the `train_domains.txt` and `test_domains.txt`
-files.
+files in this repository. The split is based on the
+[CATH 2018-03-16](https://www.cathdb.info/) database.
 
 Disclaimer: This is not an official Google product.
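
Before launching `run_eval.sh`, it can help to confirm that the paths listed in the updated README point at real directories. The following is a minimal Python sketch and is not part of the repository: the helper name `missing_directories` and the idea of passing the settings as a dictionary are assumptions made for illustration. Only the variable names (`DISTOGRAM_MODEL`, `BACKGROUND_MODEL`, `TORSION_MODEL`, `TARGET_PATH`) come from the README.

```python
import os
from typing import Dict, List

# Variable names taken from the README; each should hold a directory path.
REQUIRED_VARS = ("DISTOGRAM_MODEL", "BACKGROUND_MODEL", "TORSION_MODEL", "TARGET_PATH")


def missing_directories(settings: Dict[str, str]) -> List[str]:
    """Return the names of settings that are unset or are not existing directories.

    Hypothetical helper: `settings` maps variable names to the paths you intend
    to write into run_eval.sh.
    """
    return [name for name in REQUIRED_VARS
            if not os.path.isdir(settings.get(name, ""))]
```

An empty result means all four paths exist. Any names returned should be corrected in `run_eval.sh` before running it.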
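
The README describes averaging predictions across replicas and then pasting crop predictions into a full map. A rough pure-Python sketch of those two steps follows. It is not the code in `ensemble_contact_maps.py` or `paste_contact_maps.py`. The function names, the `(row, column, crop)` tuple layout, and the choice to average overlapping crops with equal weight are all assumptions for illustration, and the real scripts may combine crops differently.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# Assumed layout: (row offset, column offset, 2-D crop of predicted values).
Crop = Tuple[int, int, Sequence[Sequence[float]]]


def ensemble(maps: Sequence[Sequence[Sequence[float]]]) -> Matrix:
    """Element-wise mean of equally shaped maps, one map per replica."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / n for j in range(cols)]
            for i in range(rows)]


def paste_crops(length: int, crops: Sequence[Crop]) -> Matrix:
    """Paste crops into a length x length map, averaging where crops overlap."""
    total = [[0.0] * length for _ in range(length)]
    count = [[0] * length for _ in range(length)]
    for row0, col0, crop in crops:
        for di, crop_row in enumerate(crop):
            for dj, value in enumerate(crop_row):
                i, j = row0 + di, col0 + dj
                # Ignore the part of a crop that overhangs the map edge.
                if 0 <= i < length and 0 <= j < length:
                    total[i][j] += value
                    count[i][j] += 1
    # Cells not covered by any crop are left at 0.0.
    return [[total[i][j] / count[i][j] if count[i][j] else 0.0
             for j in range(length)] for i in range(length)]
```

Equal-weight averaging is the simplest way to reconcile overlapping crops. A weighting that favours crop centres would fit into the same accumulate-then-divide structure.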