openfold.data.msa_pairing

Pairing logic for multimer data pipeline.

Functions

block_diag(*arrs[, pad_value])

Like scipy.linalg.block_diag but with an optional padding value.

create_paired_features(chains)

Returns the original chains with paired NUM_SEQ features.

deduplicate_unpaired_sequences(np_chains)

Removes unpaired sequences which duplicate a paired sequence.

merge_chain_features(np_chains_list, ...)

Merges features for multiple chains into a single FeatureDict.

pad_features(feature, feature_name)

Add a 'padding' row at the end of the features list.

pair_sequences(examples)

Returns indices for paired MSA sequences across chains.

reorder_paired_rows(all_paired_msa_rows_dict)

Creates a list of indices of paired MSA rows across chains.

block_diag(*arrs, pad_value=0.0)

Like scipy.linalg.block_diag but with an optional padding value.

Parameters:
  • arrs (ndarray)

  • pad_value (float)

Return type:

ndarray
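
The behaviour described above can be illustrated with a minimal NumPy-only sketch. It is not the library's implementation: the name block_diag_sketch is hypothetical, and the sketch assumes every input is at most 2-D and that off-diagonal cells should simply take pad_value.

```python
import numpy as np

def block_diag_sketch(*arrs, pad_value=0.0):
    """Hypothetical stand-in: place arrays on the diagonal, fill the rest with pad_value."""
    arrs = [np.atleast_2d(a) for a in arrs]
    n_rows = sum(a.shape[0] for a in arrs)
    n_cols = sum(a.shape[1] for a in arrs)
    # Start from a matrix filled entirely with the padding value.
    out = np.full((n_rows, n_cols), pad_value, dtype=np.result_type(*arrs))
    row = col = 0
    for a in arrs:
        # Copy each block into its diagonal slot, then move the offsets.
        out[row:row + a.shape[0], col:col + a.shape[1]] = a
        row += a.shape[0]
        col += a.shape[1]
    return out

a = np.ones((2, 2))
b = 2.0 * np.ones((1, 3))
out = block_diag_sketch(a, b, pad_value=-1.0)
```

Here out has shape (3, 5): the two blocks sit on the diagonal and every other cell holds -1.0 instead of the zero that scipy.linalg.block_diag would produce.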

create_paired_features(chains)

Returns the original chains with paired NUM_SEQ features.

Parameters:

chains (Iterable[Mapping[str, ndarray]]) – A list of feature dictionaries for each chain.

Returns:

A list of feature dictionaries with sequence features including only rows to be paired.

Return type:

List[Mapping[str, ndarray]]

deduplicate_unpaired_sequences(np_chains)

Removes unpaired sequences which duplicate a paired sequence.

Parameters:

np_chains (List[Mapping[str, ndarray]])

Return type:

List[Mapping[str, ndarray]]
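
As a rough illustration of the deduplication idea, the sketch below drops every unpaired MSA row that also appears among the paired rows of the same chain. The function name and the feature keys "msa" (unpaired) and "msa_all_seq" (paired) are assumptions made for the example and are not confirmed by this page; the real function may also update other per-row features.

```python
import numpy as np

def deduplicate_unpaired_sketch(chains, paired_key="msa_all_seq", unpaired_key="msa"):
    """Hypothetical sketch: remove unpaired rows that duplicate a paired row."""
    result = []
    for chain in chains:
        # Hashable view of the paired sequences for fast membership tests.
        paired = {tuple(row) for row in chain[paired_key]}
        keep = [i for i, row in enumerate(chain[unpaired_key])
                if tuple(row) not in paired]
        new_chain = dict(chain)  # shallow copy, the input mapping is left untouched
        new_chain[unpaired_key] = chain[unpaired_key][keep]
        result.append(new_chain)
    return result

chain = {
    "msa_all_seq": np.array([[1, 2, 3], [4, 5, 6]]),
    "msa": np.array([[1, 2, 3], [7, 8, 9], [4, 5, 6], [0, 0, 0]]),
}
deduped = deduplicate_unpaired_sketch([chain])
```

After the call, deduped[0]["msa"] keeps only the rows [7, 8, 9] and [0, 0, 0], since the other two already occur in the paired set.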

merge_chain_features(np_chains_list, pair_msa_sequences, max_templates)

Merges features for multiple chains into a single FeatureDict.

Parameters:
  • np_chains_list (List[Mapping[str, ndarray]]) – List of FeatureDicts for each chain.

  • pair_msa_sequences (bool) – Whether to merge paired MSAs.

  • max_templates (int) – The maximum number of templates to include.

Returns:

Single FeatureDict for entire complex.

Return type:

Mapping[str, ndarray]

pad_features(feature, feature_name)

Add a ‘padding’ row at the end of the features list.

When the alignment is only partial, the padding row is selected as the ‘paired’ row for any chain that has no paired alignment.

Parameters:
  • feature (ndarray) – The feature to be padded.

  • feature_name (str) – The name of the feature to be padded.

Returns:

The feature with an additional padding row.

Return type:

ndarray
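
The padding step can be sketched as follows. The feature names and pad values in PAD_VALUES are hypothetical placeholders chosen for illustration, and the sketch assumes that features without a known pad value are returned unchanged; the library's actual table and behaviour may differ.

```python
import numpy as np

# Hypothetical pad values, chosen only for this example.
PAD_VALUES = {"msa_all_seq": 21, "msa_mask_all_seq": 1, "deletion_matrix_all_seq": 0}

def pad_features_sketch(feature, feature_name):
    """Hypothetical sketch: append one padding row to a (num_seq, num_res) feature."""
    if feature_name not in PAD_VALUES:
        # Assumption: features without a pad value are passed through as-is.
        return feature
    num_res = feature.shape[1]
    padding = np.full((1, num_res), PAD_VALUES[feature_name], dtype=feature.dtype)
    return np.concatenate([feature, padding], axis=0)

msa = np.zeros((3, 5), dtype=np.int32)
padded = pad_features_sketch(msa, "msa_all_seq")
```

The result has one more row than the input, so a row index pointing at the last position can serve as the placeholder for a chain with no paired sequence.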

pair_sequences(examples)

Returns indices for paired MSA sequences across chains.

Parameters:

examples (List[Mapping[str, ndarray]])

Return type:

Dict[int, ndarray]

reorder_paired_rows(all_paired_msa_rows_dict)

Creates a list of indices of paired MSA rows across chains.

Parameters:

all_paired_msa_rows_dict (Dict[int, ndarray]) – A mapping from the number of paired chains to the paired indices.

Returns:

A list of lists, each containing the indices of paired MSA rows across chains. The paired-index lists are ordered by:

  1. the number of chains in the paired alignment, i.e., all-chain pairings come first.

  2. e-values

Return type:

ndarray
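
The ordering described above can be sketched as below. This is an illustration under stated assumptions, not the library's code: the function name is hypothetical, unpaired chains are assumed to be marked with -1, and, because e-values are not available in the input, the sketch assumes that lower row indices correspond to better hits and uses the absolute product of the row indices as a proxy for the second sort key.

```python
import numpy as np

def reorder_paired_rows_sketch(all_paired_msa_rows_dict):
    """Hypothetical sketch: order paired rows by pairing size, then by an index-based proxy."""
    ordered = []
    # Larger pairings (more chains) come first.
    for num_pairings in sorted(all_paired_msa_rows_dict, reverse=True):
        rows = np.asarray(all_paired_msa_rows_dict[num_pairings])
        # Assumed proxy for e-value: absolute product of the row indices.
        proxy = np.abs(np.prod(rows, axis=1))
        ordered.extend(rows[np.argsort(proxy)])
    return np.array(ordered)

paired = {
    2: np.array([[4, -1, 2], [1, 1, -1]]),  # -1 marks a chain without a partner (assumed)
    3: np.array([[2, 1, 3], [0, 0, 0]]),
}
reordered = reorder_paired_rows_sketch(paired)
```

With this input, the two three-chain rows come first, followed by the two-chain rows, each group sorted by the proxy.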