openfold.data.data_transforms

Functions

add_constant_field(protein, key, value)

add_distillation_flag(protein, distillation)

atom37_to_frames(protein[, eps])

atom37_to_torsion_angles(protein[, prefix])

Convert coordinates to torsion angles.

block_delete_msa(protein, config)

cast_to_64bit_ints(protein)

correct_msa_restypes(protein)

Correct MSA restype to have the same order as rc.

crop_extra_msa(protein, max_extra_msa)

crop_templates(protein, max_templates)

curry1(f)

Supply all arguments but the first.

delete_extra_msa(protein)

fix_templates_aatype(protein)

get_backbone_frames(protein)

get_chi_angles(protein)

get_chi_atom_indices()

Returns atom indices needed to compute chi angles for all residue types.

make_all_atom_aatype(protein)

make_atom14_masks(protein)

Construct denser atom positions (14 dimensions instead of 37).

make_atom14_masks_np(batch)

make_atom14_positions(protein)

Constructs denser atom positions (14 dimensions instead of 37).

make_fixed_size(protein, shape_schema, ...)

Guess which dimensions are the MSA and sequence dimensions and make them a fixed size.

make_hhblits_profile(protein)

Compute the HHblits MSA profile if not already present.

make_masked_msa(protein, config, ...)

Create data for BERT on raw MSA.

make_msa_feat(protein)

Create and concatenate MSA features.

make_msa_mask(protein)

Mask features are all ones, but will later be zero-padded.

make_one_hot(x, num_classes)

make_pseudo_beta(protein[, prefix])

Create pseudo-beta (alpha for glycine) position and mask.

make_seq_mask(protein)

make_template_mask(protein)

nearest_neighbor_clusters(protein[, ...])

pseudo_beta_fn(aatype, all_atom_positions, ...)

Create pseudo beta features.

random_crop_to_size(protein, crop_size, ...)

Crop randomly to crop_size, or keep as is if shorter than that.

randomly_replace_msa_with_unknown(protein, ...)

Replace a portion of the MSA with 'X'.

sample_msa(protein, max_seq, keep_extra[, seed])

Sample the MSA randomly; the remaining sequences are stored as extra_*.

sample_msa_distillation(protein, max_seq)

select_feat(protein, feature_list)

shaped_categorical(probs[, epsilon])

squeeze_features(protein)

Remove singleton and repeated dimensions in protein features.

summarize_clusters(protein)

Produce profile and deletion_matrix_mean within each cluster.

unsorted_segment_sum(data, segment_ids, ...)

Computes the sum along segments of a tensor.

add_constant_field(protein, key, value)
add_distillation_flag(protein, distillation)
atom37_to_frames(protein, eps=1e-08)
atom37_to_torsion_angles(protein, prefix='')

Convert coordinates to torsion angles.

This function is extremely sensitive to floating-point imprecision and should be run in double precision whenever possible.

Parameters:

protein (Dict) – Dictionary containing:

  • (prefix)aatype:

    [*, N_res] residue indices

  • (prefix)all_atom_positions:

    [*, N_res, 37, 3] atom positions (in atom37 format)

  • (prefix)all_atom_mask:

    [*, N_res, 37] atom position mask

Returns:

The same dictionary, updated with the following features:

  • (prefix)torsion_angles_sin_cos:

    [*, N_res, 7, 2] torsion angles

  • (prefix)alt_torsion_angles_sin_cos:

    [*, N_res, 7, 2] alternate torsion angles (accounting for 180-degree symmetry)

  • (prefix)torsion_angles_mask:

    [*, N_res, 7] torsion angles mask
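The trailing dimension of size 2 in the returned features holds a sine and cosine pair for each torsion angle. The following is a minimal, dependency-free sketch of how one such pair can be computed from four atom positions. The helper name dihedral_sin_cos, the sign convention, and the sample coordinates are assumptions made for illustration; this is not the library's implementation, which operates on batched tensors.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dihedral_sin_cos(p0, p1, p2, p3):
    """Return (sin, cos) of the dihedral angle defined by four points.

    Hypothetical helper for illustration only.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    angle = math.atan2(_dot(_cross(b1, v), w), _dot(v, w))
    return math.sin(angle), math.cos(angle)


# Four points forming a quarter turn around the z axis.
sin_cos = dihedral_sin_cos((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))

# Rotating an angle by 180 degrees negates both its sine and its cosine.
alt_sin_cos = (-sin_cos[0], -sin_cos[1])
```

The last line shows only the arithmetic behind a 180-degree rotation; which torsion angles actually receive an alternate value is decided by the library and is not specified on this page.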

block_delete_msa(protein, config)
cast_to_64bit_ints(protein)
correct_msa_restypes(protein)

Correct MSA restype to have the same order as rc.

crop_extra_msa(protein, max_extra_msa)
crop_templates(protein, max_templates)
curry1(f)

Supply all arguments but the first.
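A minimal sketch of the pattern this one-line description suggests: a decorator that binds every argument except the first and returns a callable awaiting that first argument. The body below and the toy transform toy_add_constant_field are illustrative assumptions, not a copy of the library source.

```python
from functools import wraps


def curry1(f):
    """Return a factory that binds every argument of f except the first."""

    @wraps(f)
    def factory(*args, **kwargs):
        # The returned callable takes only the first argument
        # (in this module, typically the protein dictionary).
        return lambda x: f(x, *args, **kwargs)

    return factory


@curry1
def toy_add_constant_field(protein, key, value):
    # Hypothetical stand-in transform used only for this example.
    protein[key] = value
    return protein


# Bind key and value now; supply the protein dictionary later.
transform = toy_add_constant_field("is_distillation", 1.0)
result = transform({"aatype": [0, 1, 2]})
```

Binding the configuration up front makes it possible to keep a list of one-argument transforms and apply them to a feature dictionary in sequence.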

delete_extra_msa(protein)
fix_templates_aatype(protein)
get_backbone_frames(protein)
get_chi_angles(protein)
get_chi_atom_indices()

Returns atom indices needed to compute chi angles for all residue types.

Returns:

A tensor of shape [residue_types=21, chis=4, atoms=4]. The residue types are in the order specified in rc.restypes, with the unknown residue type at the end. For chi angles that are not defined on a residue, the position indices default to 0.
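The layout described above can be sketched with nested lists. Everything in this example is a toy assumption: the two-residue table, the atom ordering in TOY_ATOM_ORDER (which is not the real atom37 ordering), and the helper build_chi_atom_indices are made up to show the zero padding and the trailing unknown-residue row.

```python
# Toy atom ordering for illustration; NOT the real atom37 ordering.
TOY_ATOM_ORDER = {"N": 0, "CA": 1, "C": 2, "CB": 3, "OG": 4}

# Toy chi definitions: alanine has no chi angles, serine has one.
TOY_CHI_ATOMS = {
    "ALA": [],
    "SER": [["N", "CA", "CB", "OG"]],
}


def build_chi_atom_indices(restypes, chi_atoms, atom_order, max_chis=4):
    """Build a [len(restypes) + 1, max_chis, 4] table of atom indices."""
    table = []
    for res in restypes:
        rows = [[atom_order[a] for a in chi] for chi in chi_atoms[res]]
        # Pad undefined chi angles with zeros.
        rows += [[0, 0, 0, 0] for _ in range(max_chis - len(rows))]
        table.append(rows)
    # Unknown residue type goes last and has no defined chi angles.
    table.append([[0, 0, 0, 0] for _ in range(max_chis)])
    return table


chi_table = build_chi_atom_indices(["ALA", "SER"], TOY_CHI_ATOMS, TOY_ATOM_ORDER)
```

Because undefined entries are filled with index 0, a consumer of such a table needs a separate mask to tell a padded entry apart from a genuine reference to the atom at index 0.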

make_all_atom_aatype(protein)
make_atom14_masks(protein)

Construct denser atom positions (14 dimensions instead of 37).

make_atom14_masks_np(batch)
make_atom14_positions(protein)

Constructs denser atom positions (14 dimensions instead of 37).

make_fixed_size(protein, shape_schema, msa_cluster_size, extra_msa_size, num_res=0, num_templates=0)

Guess which dimensions are the MSA and sequence dimensions and make them a fixed size.

make_hhblits_profile(protein)

Compute the HHblits MSA profile if not already present.

make_masked_msa(protein, config, replace_fraction, seed)

Create data for BERT on raw MSA.

make_msa_feat(protein)

Create and concatenate MSA features.

make_msa_mask(protein)

Mask features are all ones, but will later be zero-padded.

make_one_hot(x, num_classes)
make_pseudo_beta(protein, prefix='')

Create pseudo-beta (alpha for glycine) position and mask.
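The selection rule in this description (beta carbon for most residues, alpha carbon for glycine) can be sketched as follows. The helper toy_pseudo_beta and its input format, a list of residue names paired with per-atom coordinates, are assumptions for illustration; the documented function works on a protein feature dictionary instead.

```python
def toy_pseudo_beta(residues):
    """Pick CB (CA for glycine) per residue and report whether it was present.

    residues: list of (residue_name, {atom_name: (x, y, z)}) pairs.
    Hypothetical helper for illustration only.
    """
    positions, mask = [], []
    for name, atoms in residues:
        atom = "CA" if name == "GLY" else "CB"
        if atom in atoms:
            positions.append(atoms[atom])
            mask.append(1.0)
        else:
            # Missing atom: placeholder position, masked out.
            positions.append((0.0, 0.0, 0.0))
            mask.append(0.0)
    return positions, mask


residues = [
    ("ALA", {"CA": (0.0, 0.0, 0.0), "CB": (1.0, 0.0, 0.0)}),
    ("GLY", {"CA": (2.0, 0.0, 0.0)}),
    ("SER", {"CA": (3.0, 0.0, 0.0)}),  # CB missing in this toy input
]
pb_positions, pb_mask = toy_pseudo_beta(residues)
```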

make_seq_mask(protein)
make_template_mask(protein)
nearest_neighbor_clusters(protein, gap_agreement_weight=0.0)
pseudo_beta_fn(aatype, all_atom_positions, all_atom_mask)

Create pseudo beta features.

random_crop_to_size(protein, crop_size, max_templates, shape_schema, subsample_templates=False, seed=None)

Crop randomly to crop_size, or keep as is if shorter than that.
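The crop-or-keep behaviour can be illustrated on a plain list. This sketch assumes a contiguous window with a uniformly sampled start; the helper toy_random_crop is hypothetical, and the documented function additionally handles templates and a shape schema, which are omitted here.

```python
import random


def toy_random_crop(sequence, crop_size, seed=None):
    """Return a random contiguous window of length crop_size.

    Sequences no longer than crop_size are returned unchanged.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    n = len(sequence)
    if n <= crop_size:
        return list(sequence)
    # randint is inclusive on both ends, so the window always fits.
    start = rng.randint(0, n - crop_size)
    return list(sequence[start:start + crop_size])


cropped = toy_random_crop(list(range(10)), crop_size=4, seed=0)
unchanged = toy_random_crop([1, 2, 3], crop_size=4, seed=0)
```

Passing a seed makes the crop reproducible, which is presumably why the documented function also accepts one.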

randomly_replace_msa_with_unknown(protein, replace_proportion)

Replace a portion of the MSA with ‘X’.

sample_msa(protein, max_seq, keep_extra, seed=None)

Sample the MSA randomly; the remaining sequences are stored as extra_*.
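A rough sketch of the split between sampled and extra sequences, using a list of strings in place of MSA tensors. Keeping the first row (the query) in the sample is an assumption of this example, as are the helper name toy_sample_msa and the single extra_msa key; the documented function works on a feature dictionary.

```python
import random


def toy_sample_msa(msa, max_seq, keep_extra, seed=None):
    """Split MSA rows into a random sample and the leftover rows.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Assumption: row 0 is the query sequence and is always kept first.
    rest = list(range(1, len(msa)))
    rng.shuffle(rest)
    order = ([0] + rest) if msa else []
    selected, leftover = order[:max_seq], order[max_seq:]
    out = {"msa": [msa[i] for i in selected]}
    if keep_extra:
        out["extra_msa"] = [msa[i] for i in leftover]
    return out


toy_msa = ["QUERY", "SEQ_A", "SEQ_B", "SEQ_C", "SEQ_D"]
sampled = toy_sample_msa(toy_msa, max_seq=3, keep_extra=True, seed=0)
sampled_only = toy_sample_msa(toy_msa, max_seq=3, keep_extra=False, seed=0)
```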

sample_msa_distillation(protein, max_seq)
select_feat(protein, feature_list)
shaped_categorical(probs, epsilon=1e-10)
squeeze_features(protein)

Remove singleton and repeated dimensions in protein features.

summarize_clusters(protein)

Produce profile and deletion_matrix_mean within each cluster.

unsorted_segment_sum(data, segment_ids, num_segments)

Computes the sum along segments of a tensor. Similar to tf.unsorted_segment_sum, but only supports 1-D indices.

Parameters:
  • data – A tensor whose segments are to be summed.

  • segment_ids – The 1-D segment indices tensor.

  • num_segments – The number of segments.

Returns:

A tensor of the same data type as the data argument.
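The semantics can be illustrated with nested lists: each row of data is added to the output row named by its segment id, and segments that receive no rows stay at zero. The helper toy_unsorted_segment_sum is a dependency-free assumption for illustration; the documented function operates on tensors.

```python
def toy_unsorted_segment_sum(data, segment_ids, num_segments):
    """Sum the rows of data that share a segment id.

    Hypothetical pure-Python helper for illustration only.
    """
    if len(data) != len(segment_ids):
        raise ValueError("data and segment_ids must have the same length")
    width = len(data[0]) if data else 0
    out = [[0.0] * width for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        for j, value in enumerate(row):
            out[seg][j] += value
    return out


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
segment_ids = [1, 0, 1]  # ids need not be sorted
summed = toy_unsorted_segment_sum(data, segment_ids, num_segments=3)
```

Rows 0 and 2 both carry segment id 1, so they are summed into output row 1; segment 2 receives no rows and remains zero.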