openfold.data.data_transforms_multimer
Functions
build_extra_msa_feat | Expand extra_msa into 1hot and concat with other extra msa features.
create_msa_feat | Create and concatenate MSA features.
create_target_feat | Create the target features.
get_contiguous_crop_idx |
get_interface_residues |
get_spatial_crop_idx |
gumbel_argsort_sample_idx | Samples without replacement from a distribution given by 'logits'.
gumbel_max_sample | Samples from a probability distribution given by 'logits'.
gumbel_noise | Generate Gumbel Noise of given Shape.
make_masked_msa | Create data for BERT on raw MSA.
make_msa_profile | Compute the MSA profile.
nearest_neighbor_clusters | Assign each extra MSA sequence to its nearest neighbor in sampled MSA.
randint |
random_crop_to_size | Crop randomly to crop_size, or keep as is if shorter than that.
sample_msa | Sample MSA randomly; remaining sequences are stored as extra_*.
- build_extra_msa_feat(batch)
Expand extra_msa into 1hot and concat with other extra msa features.
We do this as late as possible as the one_hot extra msa can be very large.
- Parameters:
batch – a dictionary with the following keys:
* ‘extra_msa’: [num_seq, num_res] MSA that wasn’t selected as a cluster centre. Note: this isn’t one-hotted.
* ‘extra_deletion_matrix’: [num_seq, num_res] Number of deletions at given position.
num_extra_msa – Number of extra msa to use.
- Returns:
Concatenated tensor of extra MSA features.
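The expansion described above can be illustrated with a minimal pure-Python sketch. It is not the module's tensor implementation: the alphabet size of 23, the `has_deletion` flag, the arctan scaling of the deletion count, and the helper name `extra_msa_features` are all assumptions made for illustration.

```python
import math

NUM_CLASSES = 23  # assumed alphabet size (amino acids plus unknown, gap, mask)


def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    row = [0.0] * num_classes
    row[index] = 1.0
    return row


def extra_msa_features(extra_msa, extra_deletion_matrix, num_extra_msa):
    """Build per-position features for the first `num_extra_msa` extra sequences.

    Each position becomes: one-hot residue + has_deletion flag + scaled deletion count.
    """
    features = []
    rows = zip(extra_msa[:num_extra_msa], extra_deletion_matrix[:num_extra_msa])
    for seq, dels in rows:
        seq_feats = []
        for residue, n_del in zip(seq, dels):
            has_deletion = 1.0 if n_del > 0 else 0.0
            # Squash the unbounded deletion count into [0, 1); this exact
            # transform is an assumption, not confirmed by the docs above.
            deletion_value = math.atan(n_del / 3.0) * (2.0 / math.pi)
            seq_feats.append(one_hot(residue, NUM_CLASSES) + [has_deletion, deletion_value])
        features.append(seq_feats)
    return features
```

The one-hot step multiplies the per-position storage by the alphabet size, which is why deferring it until the features are actually needed keeps memory use down.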
- create_msa_feat(batch)
Create and concatenate MSA features.
- create_target_feat(batch)
Create the target features.
- get_contiguous_crop_idx(protein, crop_size, generator)
- get_interface_residues(positions, atom_mask, asym_id, interface_threshold)
- get_spatial_crop_idx(protein, crop_size, interface_threshold, generator)
- gumbel_argsort_sample_idx(logits, generator=None)
Samples without replacement from a distribution given by ‘logits’.
This uses the Gumbel trick to implement the sampling in an efficient manner. For a distribution over k items this samples k times without replacement, so it effectively samples a random permutation, with the probabilities over permutations derived from the logprobs.
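As an illustration of the argsort form of the Gumbel trick, here is a minimal pure-Python sketch. The function name `gumbel_argsort_permutation` and the use of `random.Random` in place of a tensor generator are assumptions for this example, not the module's API.

```python
import math
import random


def gumbel_argsort_permutation(logits, rng, eps=1e-6):
    """Return indices sorted by (logit + Gumbel noise), largest first."""
    perturbed = []
    for logit in logits:
        u = rng.uniform(eps, 1.0 - eps)   # uniform sample kept away from 0 and 1
        noise = -math.log(-math.log(u))   # Gumbel(0, 1) sample
        perturbed.append(logit + noise)
    return sorted(range(len(logits)), key=lambda i: perturbed[i], reverse=True)


rng = random.Random(0)
perm = gumbel_argsort_permutation([2.0, 0.0, -1.0, 0.5], rng)
# perm is a permutation of [0, 1, 2, 3]; larger logits tend to appear earlier.
```

Taking only the first m entries of the permutation gives m distinct items, which is the sense in which the sampling is without replacement.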
- gumbel_max_sample(logits, generator=None)
Samples from a probability distribution given by ‘logits’.
This uses Gumbel-max trick to implement the sampling in an efficient manner.
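The Gumbel-max trick draws a single index by adding independent Gumbel(0, 1) noise to each logit and taking the argmax; the chosen index is distributed according to softmax(logits). The sketch below uses plain Python lists and a hypothetical helper name, `gumbel_max_draw`, for illustration only.

```python
import math
import random


def gumbel_max_draw(logits, rng, eps=1e-6):
    """Draw one index with probability approximately softmax(logits)[index]."""
    best_idx, best_val = 0, float("-inf")
    for idx, logit in enumerate(logits):
        u = rng.uniform(eps, 1.0 - eps)
        val = logit - math.log(-math.log(u))  # logit + Gumbel(0, 1) noise
        if val > best_val:
            best_idx, best_val = idx, val
    return best_idx
```

No normalisation of the logits is needed, which is the main practical appeal: the softmax denominator is never computed.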
- gumbel_noise(shape, device, eps=1e-06, generator=None)
Generate Gumbel Noise of given Shape.
This generates samples from Gumbel(0, 1).
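Gumbel(0, 1) samples can be obtained from uniform samples u via -log(-log(u)). The following sketch shows that transform in pure Python; the name `gumbel_noise_sketch`, the flat sample count in place of a shape, and the way `eps` is applied are assumptions rather than a description of the function above.

```python
import math
import random


def gumbel_noise_sketch(n, rng, eps=1e-6):
    """Draw n samples from Gumbel(0, 1) by inverse-transform sampling."""
    samples = []
    for _ in range(n):
        u = rng.uniform(eps, 1.0 - eps)   # eps keeps both logarithms finite
        samples.append(-math.log(-math.log(u)))
    return samples
```

A quick sanity check is that the sample mean approaches the Euler-Mascheroni constant (about 0.5772), the mean of the standard Gumbel distribution.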
- make_masked_msa(batch, config, replace_fraction, seed, eps=1e-06)
Create data for BERT on raw MSA.
- make_msa_profile(batch)
Compute the MSA profile.
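An MSA profile is commonly taken to be the per-column frequency of each residue class across the alignment. The sketch below assumes that definition, a 22-class alphabet, and a per-entry mask; none of these details are confirmed by the entry above, and `msa_profile` is a hypothetical name.

```python
def msa_profile(msa, msa_mask, num_classes=22):
    """Per-column frequency of each residue class, ignoring masked-out entries."""
    num_res = len(msa[0])
    profile = []
    for col in range(num_res):
        counts = [0.0] * num_classes
        total = 0.0
        for seq, mask in zip(msa, msa_mask):
            counts[seq[col]] += mask[col]
            total += mask[col]
        # Small constant avoids division by zero for fully masked columns.
        profile.append([c / (total + 1e-10) for c in counts])
    return profile
```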
- nearest_neighbor_clusters(batch, gap_agreement_weight=0.0)
Assign each extra MSA sequence to its nearest neighbor in sampled MSA.
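One way to picture the assignment is to score each extra sequence against every sampled sequence by counting matching positions, with matching gaps weighted by `gap_agreement_weight`, and to pick the best-scoring one. The sketch below follows that reading; the gap code of 21, the tie-breaking rule, and the name `assign_to_clusters` are assumptions for illustration.

```python
GAP = 21  # assumed integer code for the gap token


def assign_to_clusters(msa, extra_msa, gap_agreement_weight=0.0):
    """For each extra sequence, return the index of the most similar sampled sequence."""
    assignments = []
    for extra_seq in extra_msa:
        best_idx, best_score = 0, float("-inf")
        for idx, centre in enumerate(msa):
            score = 0.0
            for a, b in zip(extra_seq, centre):
                if a == b:
                    # Matching gaps count with a separate (by default zero) weight.
                    score += gap_agreement_weight if a == GAP else 1.0
            if score > best_score:  # ties go to the earliest sampled sequence
                best_idx, best_score = idx, score
        assignments.append(best_idx)
    return assignments
```

With the default weight of 0.0, two sequences that agree only on gap positions are treated as having no agreement at all.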
- randint(lower, upper, generator, device)
- random_crop_to_size(protein, crop_size, max_templates, shape_schema, spatial_crop_prob, interface_threshold, subsample_templates=False, seed=None)
Crop randomly to crop_size, or keep as is if shorter than that.
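The "crop, or keep as is if shorter" behaviour can be sketched for the simplest case of a single contiguous window. This deliberately ignores spatial cropping, multiple chains, and templates, which the signature above suggests the real function handles; `random_contiguous_crop` is a hypothetical helper.

```python
import random


def random_contiguous_crop(num_res, crop_size, rng):
    """Return the residue indices kept by a random contiguous crop."""
    if num_res <= crop_size:
        return list(range(num_res))  # shorter than crop_size: keep as is
    start = rng.randint(0, num_res - crop_size)  # inclusive on both ends
    return list(range(start, start + crop_size))
```

Passing an explicitly seeded `random.Random` instance makes the crop reproducible, which mirrors the role of the `seed` argument in the signature above.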
- sample_msa(batch, max_seq, max_extra_msa_seq, seed, inf=1000000.0)
Sample MSA randomly; remaining sequences are stored as extra_*.
- Parameters:
batch – batch to sample msa from.
max_seq – number of sequences to sample.
- Returns:
Protein with sampled msa.
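The split into sampled and extra sequences can be sketched as a shuffle followed by a cut at `max_seq`. The version below assumes that the first row (the query) is always kept in the sampled set and does not model `max_extra_msa_seq` or masking; `split_msa` is a hypothetical name.

```python
import random


def split_msa(msa, max_seq, rng):
    """Split MSA rows into a sampled set and a leftover 'extra' set."""
    order = list(range(1, len(msa)))
    rng.shuffle(order)       # random order for all non-query rows
    order = [0] + order      # assumption: row 0 (the query) always comes first
    sampled = [msa[i] for i in order[:max_seq]]
    extra = [msa[i] for i in order[max_seq:]]
    return sampled, extra
```

Every row ends up in exactly one of the two sets, so nothing from the original alignment is discarded at this stage.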