openfold.data.msa_pairing

Pairing logic for multimer data pipeline.

Functions

`block_diag`(*arrs[, pad_value])	Like scipy.linalg.block_diag but with an optional padding value.
`create_paired_features`(chains)	Returns the original chains with paired NUM_SEQ features.
`deduplicate_unpaired_sequences`(np_chains)	Removes unpaired sequences which duplicate a paired sequence.
`merge_chain_features`(np_chains_list, ...)	Merges features for multiple chains into a single FeatureDict.
`pad_features`(feature, feature_name)	Add a 'padding' row at the end of the features list.
`pair_sequences`(examples)	Returns indices for paired MSA sequences across chains.
`reorder_paired_rows`(all_paired_msa_rows_dict)	Creates a list of indices of paired MSA rows across chains.

block_diag(*arrs, pad_value=0.0)

Like scipy.linalg.block_diag but with an optional padding value.

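Parameters:

arrs (ndarray) – The arrays to place along the diagonal.
pad_value (float) – The value used to fill the off-diagonal entries.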

Return type:

ndarray
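The behaviour described above can be illustrated with a minimal NumPy-only sketch. The helper name `block_diag_padded` is hypothetical and this is not the module's own implementation; it only shows what "block diagonal with a padding value" means for small 2-D inputs.

```python
import numpy as np


def block_diag_padded(*arrs, pad_value=0.0):
    """Place 2-D arrays along the diagonal and fill the rest with pad_value.

    Hypothetical helper written for illustration only.
    """
    arrs = [np.atleast_2d(a) for a in arrs]
    total_rows = sum(a.shape[0] for a in arrs)
    total_cols = sum(a.shape[1] for a in arrs)
    # Start from a matrix filled with the padding value.
    out = np.full((total_rows, total_cols), pad_value, dtype=np.result_type(*arrs))
    row, col = 0, 0
    for a in arrs:
        # Copy each block into its slot on the diagonal.
        out[row:row + a.shape[0], col:col + a.shape[1]] = a
        row += a.shape[0]
        col += a.shape[1]
    return out


a = np.array([[1, 2], [3, 4]])
b = np.array([[5]])
merged = block_diag_padded(a, b, pad_value=-1)
print(merged)
# [[ 1  2 -1]
#  [ 3  4 -1]
#  [-1 -1  5]]
```

With `pad_value=0` the result matches what `scipy.linalg.block_diag` would produce for the same inputs.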

create_paired_features(chains)

Returns the original chains with paired NUM_SEQ features.

Parameters:

chains (Iterable[Mapping[str, ndarray]]) – A list of feature dictionaries for each chain.

Returns:

A list of feature dictionaries with sequence features including only rows to be paired.

Return type:

List[Mapping[str, ndarray]]
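As a rough illustration of "including only rows to be paired", the sketch below selects rows from each chain's sequence feature with an index table. The helper `select_paired_rows`, the `paired_rows` layout, and the feature key `msa_all_seq` are all assumptions made for this example, not the library's documented interface.

```python
import numpy as np


def select_paired_rows(chains, paired_rows):
    """Keep only the rows of each chain that take part in a pairing.

    paired_rows[i, c] is assumed to be the row of chain c used in paired
    row i (hypothetical layout, for illustration only).
    """
    out = []
    for chain_idx, chain in enumerate(chains):
        rows = paired_rows[:, chain_idx]
        new_chain = dict(chain)  # shallow copy, leave the input untouched
        new_chain["msa_all_seq"] = chain["msa_all_seq"][rows]
        out.append(new_chain)
    return out


chains = [
    {"msa_all_seq": np.array([[0, 1], [2, 3], [4, 5]])},
    {"msa_all_seq": np.array([[9, 9, 9, 9], [8, 8, 8, 8], [7, 7, 7, 7]])},
]
paired_rows = np.array([[0, 0], [2, 1]])
paired_chains = select_paired_rows(chains, paired_rows)
```

After selection, every chain has the same number of rows, so row i of one chain lines up with row i of every other chain.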

deduplicate_unpaired_sequences(np_chains)

Removes unpaired sequences which duplicate a paired sequence.
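The idea can be sketched for a single chain as follows. The helper name `drop_duplicated_unpaired` and the feature keys `msa` (unpaired rows) and `msa_all_seq` (paired rows) are assumptions for this example; the real function may touch additional features.

```python
import numpy as np


def drop_duplicated_unpaired(chain):
    """Drop unpaired MSA rows that already appear among the paired rows.

    Hypothetical helper; the keys 'msa' and 'msa_all_seq' are assumed.
    """
    paired = {tuple(row) for row in chain["msa_all_seq"]}
    keep = [i for i, row in enumerate(chain["msa"]) if tuple(row) not in paired]
    out = dict(chain)
    out["msa"] = chain["msa"][np.array(keep, dtype=int)]
    return out


chain = {
    "msa_all_seq": np.array([[0, 1, 2], [3, 4, 5]]),
    "msa": np.array([[0, 1, 2], [6, 7, 8], [3, 4, 5], [9, 9, 9]]),
}
deduped = drop_duplicated_unpaired(chain)
print(deduped["msa"])
# [[6 7 8]
#  [9 9 9]]
```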

merge_chain_features(np_chains_list, pair_msa_sequences, max_templates)

Merges features for multiple chains into a single FeatureDict.

Parameters:

np_chains_list (List[Mapping[str, ndarray]]) – List of FeatureDicts for each chain.
pair_msa_sequences (bool) – Whether to merge paired MSAs.
max_templates (int) – The maximum number of templates to include.

Returns:

Single FeatureDict for entire complex.

Return type:

Mapping[str, ndarray]

pad_features(feature, feature_name)

Add a ‘padding’ row at the end of the features list.

When the alignment is only partially paired, the padding row is selected as the ‘paired’ row for each chain that has no paired alignment.

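Parameters:

feature (ndarray) – The feature to be padded.
feature_name (str) – The name of the feature to be padded.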

Returns:

The feature with an additional padding row.

Return type:

ndarray
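A minimal sketch of the padding step, assuming a 2-D feature of shape (num_seq, num_res). The helper `pad_with_row`, the `PAD_VALUES` table, and the specific pad values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical pad values per feature name, for illustration only.
PAD_VALUES = {
    "msa_all_seq": 21,
    "msa_mask_all_seq": 1,
    "deletion_matrix_all_seq": 0,
}


def pad_with_row(feature, feature_name):
    """Append one constant row to a (num_seq, num_res) feature."""
    if feature_name not in PAD_VALUES:
        return feature  # features without a pad value are left unchanged
    num_res = feature.shape[1]
    padding = np.full((1, num_res), PAD_VALUES[feature_name], dtype=feature.dtype)
    return np.concatenate([feature, padding], axis=0)


msa = np.array([[3, 7, 1], [3, 21, 1]], dtype=np.int32)
padded = pad_with_row(msa, "msa_all_seq")

# With NumPy indexing, a row index of -1 now picks the padding row.
padding_row = padded[-1]
```

Appending the row at the end means a placeholder index of -1 can stand for "no paired sequence in this chain" without special-casing the lookup; whether the library uses exactly this convention is an assumption here.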

pair_sequences(examples)

Returns indices for paired MSA sequences across chains.

reorder_paired_rows(all_paired_msa_rows_dict)

Creates a list of indices of paired MSA rows across chains.

Parameters:

all_paired_msa_rows_dict (Dict[int, ndarray]) – a mapping from the number of paired chains to the paired indices.

Returns:

A list of lists, each containing indices of paired MSA rows across chains. The paired-index lists are ordered by:

1. the number of chains in the paired alignment, i.e., all-chain pairings come first;
2. e-values.

Return type:

ndarray
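The two-level ordering can be sketched as below. The helper `reorder_rows_sketch` is hypothetical. It assumes that each chain's MSA rows are already sorted by e-value, so a smaller product of row indices serves as a stand-in for a better e-value, and that -1 marks a chain with no partner in a partial pairing. Both are assumptions for this example, not documented behaviour.

```python
import numpy as np


def reorder_rows_sketch(paired_rows_by_count):
    """Order paired rows by chain count (descending), then by a rank proxy."""
    ordered = []
    # Pairings that cover more chains come first.
    for num_chains in sorted(paired_rows_by_count, reverse=True):
        rows = np.asarray(paired_rows_by_count[num_chains])
        # Assumed proxy for e-value: absolute product of the row indices.
        score = np.abs(np.prod(rows, axis=1))
        ordered.extend(rows[np.argsort(score, kind="stable")])
    return np.array(ordered)


paired = {
    2: np.array([[4, 5, -1], [1, 2, -1]]),
    3: np.array([[0, 0, 0], [3, 1, 2], [1, 1, 1]]),
}
reordered = reorder_rows_sketch(paired)
print(reordered)
# [[ 0  0  0]
#  [ 1  1  1]
#  [ 3  1  2]
#  [ 1  2 -1]
#  [ 4  5 -1]]
```

The three-chain pairings precede the two-chain pairings, and within each group the rows with the smaller index product come first.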