openfold.np.protein

Protein data type.

Classes

Protein(atom_positions, aatype, atom_mask, ...)

Protein structure representation.

Functions

add_pdb_headers(prot, pdb_str)

Add PDB headers to an existing PDB string.

from_pdb_string(pdb_str[, chain_id])

Takes a PDB string and constructs a Protein object.

from_prediction(features, result[, ...])

Assembles a protein from a prediction.

from_proteinnet_string(proteinnet_str)

get_pdb_headers(prot[, chain_id])

ideal_atom_mask(prot)

Computes an ideal atom mask.

to_modelcif(prot)

Converts a Protein instance to a ModelCIF string.

to_pdb(prot)

Converts a Protein instance to a PDB string.

class Protein(atom_positions, aatype, atom_mask, residue_index, b_factors, chain_index=None, remark=None, parents=None, parents_chain_index=None)

Protein structure representation.

Parameters:
aatype: ndarray
atom_mask: ndarray
atom_positions: ndarray
b_factors: ndarray
chain_index: ndarray | None = None
parents: Sequence[str] | None = None
parents_chain_index: Sequence[int] | None = None
remark: str | None = None
residue_index: ndarray
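
The field list above gives types but no shapes. The sketch below builds zero-filled arrays in the layout that AlphaFold-style code commonly uses. The 37-atom-type axis, the dtypes, and the helper names `dummy_protein_fields` and `make_dummy_protein` are assumptions made for illustration, not something this page specifies.

```python
import numpy as np

# Assumption: the "atom37" layout with 37 atom types per residue.
NUM_ATOM_TYPES = 37


def dummy_protein_fields(num_res):
    """Build zero-filled arrays with the shapes a Protein is assumed to hold."""
    return dict(
        # Cartesian coordinates per residue and atom type.
        atom_positions=np.zeros((num_res, NUM_ATOM_TYPES, 3), dtype=np.float32),
        # One integer residue-type code per residue.
        aatype=np.zeros((num_res,), dtype=np.int64),
        # 1.0 where an atom is present, 0.0 where it is not.
        atom_mask=np.zeros((num_res, NUM_ATOM_TYPES), dtype=np.float32),
        # 1-based residue numbers (an arbitrary choice for this sketch).
        residue_index=np.arange(1, num_res + 1),
        # One B-factor per residue and atom type.
        b_factors=np.zeros((num_res, NUM_ATOM_TYPES), dtype=np.float32),
    )


def make_dummy_protein(num_res=4):
    """Wrap the dummy arrays in a Protein; requires OpenFold to be installed."""
    # Imported here so the helper above stays usable without OpenFold.
    from openfold.np import protein

    return protein.Protein(**dummy_protein_fields(num_res))
```

The optional fields (`chain_index`, `remark`, `parents`, `parents_chain_index`) are left at their `None` defaults here.
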

add_pdb_headers(prot, pdb_str)

Add PDB headers to an existing PDB string. Useful during multi-chain recycling.

Parameters:
  • prot (Protein)

  • pdb_str (str)

Return type:

str

from_pdb_string(pdb_str, chain_id=None)

Takes a PDB string and constructs a Protein object.

WARNING: All non-standard residue types will be converted into UNK. All non-standard atoms will be ignored.

Parameters:
  • pdb_str (str) – The contents of the PDB file.

  • chain_id (str | None) – If None, the whole PDB file is parsed. If chain_id is specified (e.g. "A"), only that chain is parsed.

Returns:

A new Protein parsed from the PDB contents.

Return type:

Protein
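
A minimal usage sketch for the call described above, assuming OpenFold is installed. The helper name `load_protein` and the file-reading step are illustrative additions; only `from_pdb_string(pdb_str, chain_id=None)` comes from this page.

```python
from pathlib import Path


def load_protein(pdb_path, chain_id=None):
    """Parse a PDB file into a Protein, optionally keeping a single chain."""
    # Imported inside the function so this module loads even without OpenFold.
    from openfold.np import protein

    pdb_str = Path(pdb_path).read_text()
    # chain_id=None parses every chain; chain_id="A" keeps only chain A.
    return protein.from_pdb_string(pdb_str, chain_id=chain_id)
```

Calling `load_protein("model.pdb", chain_id="A")` would then return a Protein for chain A only, with non-standard residues mapped to UNK as the warning above describes.
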

from_prediction(features, result, b_factors=None, remove_leading_feature_dimension=True, remark=None, parents=None, parents_chain_index=None)

Assembles a protein from a prediction.

Parameters:
  • features (Mapping[str, ndarray]) – Dictionary holding model inputs.

  • result (Mapping[str, Any]) – Dictionary holding model outputs.

  • b_factors (ndarray | None) – (Optional) B-factors to use for the protein.

  • remove_leading_feature_dimension (bool) – Whether to remove the leading dimension of each value in features.

  • chain_index – (Optional) Chain indices for multi-chain predictions

  • remark (str | None) – (Optional) Remark about the prediction

  • parents (Sequence[str] | None) – (Optional) List of template names

  • parents_chain_index (Sequence[int] | None)

Returns:

A protein instance.

Return type:

Protein
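
The optional b_factors argument is a per-atom array, while confidence scores are often available per residue. The sketch below shows one way to broadcast per-residue scores to per-atom values before calling from_prediction. The helper names, the use of confidence scores as B-factors, and the 37-atom-type axis are assumptions for illustration; this page does not prescribe them.

```python
import numpy as np


def per_residue_to_b_factors(scores, num_atom_types=37):
    """Repeat each per-residue score across the atom-type axis."""
    scores = np.asarray(scores, dtype=np.float32)
    # Shape goes from (num_res,) to (num_res, num_atom_types).
    return np.repeat(scores[:, None], num_atom_types, axis=-1)


def prediction_to_protein(features, result, scores=None, remark=None):
    """Assemble a Protein from model inputs and outputs; requires OpenFold."""
    # Imported here so the helper above stays usable without OpenFold.
    from openfold.np import protein

    b_factors = None if scores is None else per_residue_to_b_factors(scores)
    return protein.from_prediction(
        features, result, b_factors=b_factors, remark=remark
    )
```

When `scores` is omitted, `b_factors=None` is passed through and from_prediction applies its own default.
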

from_proteinnet_string(proteinnet_str)

Parameters:

proteinnet_str (str)

Return type:

Protein

get_pdb_headers(prot, chain_id=0)

Parameters:
  • prot (Protein)

  • chain_id (int)

Return type:

Sequence[str]

ideal_atom_mask(prot)

Computes an ideal atom mask.

Protein.atom_mask is typically defined according to the atoms that are reported in the PDB. This function instead computes a mask according to the heavy atoms that should be present in the given sequence of amino acids.

Parameters:

prot (Protein) – Protein whose fields are numpy.ndarray objects.

Returns:

An ideal atom mask.

Return type:

ndarray
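
To illustrate the difference between a reported mask and an ideal one, here is a toy, pure-Python sketch. It is not OpenFold's implementation: the six-entry atom vocabulary, the three-residue table, and the function names are simplified stand-ins chosen for this example.

```python
# Hypothetical reduced atom vocabulary; real tables cover many more atom types.
ATOM_TYPES = ["N", "CA", "C", "CB", "O", "OG"]

# Heavy atoms expected for three residue types.
EXPECTED_ATOMS = {
    "GLY": {"N", "CA", "C", "O"},
    "ALA": {"N", "CA", "C", "CB", "O"},
    "SER": {"N", "CA", "C", "CB", "O", "OG"},
}


def toy_ideal_atom_mask(sequence):
    """Return one 0/1 row per residue marking the atoms that should exist."""
    return [
        [1 if atom in EXPECTED_ATOMS[res] else 0 for atom in ATOM_TYPES]
        for res in sequence
    ]


def missing_atoms(sequence, reported_mask):
    """List (residue position, atom name) pairs expected but not reported."""
    missing = []
    ideal = toy_ideal_atom_mask(sequence)
    for pos, (ideal_row, seen_row) in enumerate(zip(ideal, reported_mask)):
        for atom, want, have in zip(ATOM_TYPES, ideal_row, seen_row):
            if want and not have:
                missing.append((pos, atom))
    return missing


# A serine whose side-chain oxygen was not reported in the structure file.
print(missing_atoms(["GLY", "SER"], [[1, 1, 1, 0, 1, 0], [1, 1, 1, 1, 1, 0]]))
# → [(1, 'OG')]
```

With the real function, the same comparison would be made between `ideal_atom_mask(prot)` and `prot.atom_mask`.
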

to_modelcif(prot)

Converts a Protein instance to a ModelCIF string. Chains with identical modelled coordinates are treated as the same polymer entity. If chains differ in their modelled regions, however, no attempt is made to identify them as a single polymer entity.

Parameters:

prot (Protein) – The protein to convert to ModelCIF.

Returns:

ModelCIF string.

Return type:

str

to_pdb(prot)

Converts a Protein instance to a PDB string.

Parameters:

prot (Protein) – The protein to convert to PDB.

Returns:

PDB string.

Return type:

str
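
Both to_modelcif and to_pdb return a string, so writing a structure to disk is left to the caller. A minimal sketch, assuming OpenFold is installed; the helper name `save_structure` and the choice of format by file suffix are illustrative, not part of the library.

```python
from pathlib import Path


def save_structure(prot, out_path):
    """Write a Protein to disk as ModelCIF (.cif) or PDB (any other suffix)."""
    # Imported inside the function so this module loads even without OpenFold.
    from openfold.np import protein

    out_path = Path(out_path)
    if out_path.suffix.lower() == ".cif":
        text = protein.to_modelcif(prot)
    else:
        # Any suffix other than .cif falls back to the PDB format.
        text = protein.to_pdb(prot)
    out_path.write_text(text)
    return out_path
```
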