mirror of
https://github.com/google-deepmind/deepmind-research.git
synced 2026-05-23 15:55:20 +08:00
Add more details about the output distogram format.
PiperOrigin-RevId: 310134498
committed by
Diego de Las Casas
parent
7bb484fffa
commit
6f14cb5983
@@ -154,8 +154,8 @@ When running `run_eval.sh` the output has the following directory structure:
 
 * **distogram/**: Contains 4 subfolders, one for each replica. Each of these
 contain the predicted ASA, secondary structure and a pickle file with the
-distogram for each crop. It also contains an `ensemble` directory with the
-ensembled distograms.
+distogram for each crop (see below for more details). It also contains an
+`ensemble` directory with the ensembled distograms.
 * **background_distogram/**: Contains 4 subfolders, one for each replica. Each
 of these contain a pickle file with the background distogram for each crop.
 It also contains an `ensemble` directory with the ensembled background
@@ -170,6 +170,27 @@ When running `run_eval.sh` the output has the following directory structure:
 **This is the final distogram that was used in the subsequent AlphaFold
 folding pipeline in CASP13.**
 
+### Distogram output format
+
+The distogram is a Python pickle file with a dictionary containing the following
+fields:
+
+* `min_range`: The minimum range in Angstroms to consider in distograms.
+* `max_range`: The range in Angstroms to consider in distograms, see
+`num_bins` below for clarification. The upper end of the distogram is
+`min_range + max_range`.
+* `num_bins`: The number of bins in the distance histogram being predicted. We
+divide the interval from `min_range` to `min_range + max_range` into this
+many bins. The distograms were trained so that distances lower than
+`min_range` were counted in the lowest bin and distances higher than
+`min_range + max_range` were added to the final bin. The `num_bins - 1`
+boundaries between bins are thus `np.linspace(0, max_range, num_bins + 1,
+endpoint=True)[1:-1] + min_range`.
+* `sequence`: The target sequence of amino acids of length `L`.
+* `target`: The name of the target.
+* `domain`: The name of the target including the domain name.
+* `probs`: The distogram as a Numpy array of shape `[L, L, num_bins]`.
+
 ## Data splits
 
 We used a version of [PDB](https://www.rcsb.org/) downloaded on 2018-03-15. The
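The distogram format added in this commit can be explored with a minimal Python sketch like the one below. It is not code from the repository. The helper names (`bin_boundaries`, `distance_to_bin`, `make_synthetic_distogram`), the target and domain names, and the numeric values are all assumptions chosen for illustration. The dictionary is built synthetically and round-tripped through `pickle` in memory rather than read from a real output file. The sketch also assumes that `probs` is normalised over the last axis and that a distance lying exactly on a boundary belongs to the upper bin. Neither point is stated in the README text.

```python
import pickle

import numpy as np


def bin_boundaries(min_range, max_range, num_bins):
    """Interior boundaries between bins, using the formula quoted in the README text."""
    return np.linspace(0, max_range, num_bins + 1, endpoint=True)[1:-1] + min_range


def distance_to_bin(distance, min_range, max_range, num_bins):
    """Map distances (Angstroms) to bin indices in [0, num_bins - 1].

    Assumption: a distance exactly on a boundary goes to the upper bin.
    """
    edges = bin_boundaries(min_range, max_range, num_bins)
    # Values below the first boundary land in bin 0; values at or beyond the
    # last boundary land in bin num_bins - 1, matching the described clamping.
    return np.searchsorted(edges, distance, side="right")


def make_synthetic_distogram(sequence, min_range, max_range, num_bins, seed=0):
    """Hypothetical helper: builds a dictionary with the documented fields."""
    rng = np.random.default_rng(seed)
    length = len(sequence)
    raw = rng.random((length, length, num_bins))
    probs = raw / raw.sum(axis=-1, keepdims=True)  # assumed normalisation over bins
    return {
        "min_range": min_range,
        "max_range": max_range,
        "num_bins": num_bins,
        "sequence": sequence,
        "target": "T0000",     # made-up target name
        "domain": "T0000-D1",  # made-up domain name
        "probs": probs,
    }


# Round-trip through pickle in memory, standing in for reading an output file.
payload = pickle.dumps(make_synthetic_distogram("ACDEFGHIK", 2.0, 20.0, 64))
distogram = pickle.loads(payload)

length = len(distogram["sequence"])
assert distogram["probs"].shape == (length, length, distogram["num_bins"])

edges = bin_boundaries(
    distogram["min_range"], distogram["max_range"], distogram["num_bins"]
)
most_likely_bin = distogram["probs"].argmax(axis=-1)  # shape [L, L]
print(edges.shape, most_likely_bin.shape)
```

With a real output file, the `pickle.dumps` / `pickle.loads` pair would be replaced by opening the file in binary mode and calling `pickle.load` on it. Only unpickle files from a trusted source, since unpickling can execute arbitrary code.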