openfold.data.mmcif_parsing¶

Parses the mmCIF file format.

Classes

`AtomSite`(residue_name, author_chain_id, ...)
`MmcifObject`(file_id, header, structure, ...)	Representation of a parsed mmCIF file.
`Monomer`(id, num)
`ParsingResult`(mmcif_object, errors)	Returned by the parse function.
`ResidueAtPosition`(position, name, ...)
`ResiduePosition`(chain_id, residue_number, ...)

Functions

`get_atom_coords`(mmcif_object, chain_id[, ...])
`get_release_date`(parsed_info)	Returns the oldest revision date.
`mmcif_loop_to_dict`(prefix, index, parsed_info)	Extracts loop associated with a prefix from mmCIF data as a dictionary.
`mmcif_loop_to_list`(prefix, parsed_info)	Extracts loop associated with a prefix from mmCIF data as a list.
`parse`(*, file_id, mmcif_string[, ...])	Entry point, parses an mmcif_string.

Exceptions

ParseError

An error indicating that an mmCIF file could not be parsed.

exception ParseError¶

Bases: Exception

An error indicating that an mmCIF file could not be parsed.

class AtomSite(residue_name: str, author_chain_id: str, mmcif_chain_id: str, author_seq_num: str, mmcif_seq_num: int, insertion_code: str, hetatm_atom: str, model_num: int)¶

Parameters:

residue_name (str)
author_chain_id (str)
mmcif_chain_id (str)
author_seq_num (str)
mmcif_seq_num (int)
insertion_code (str)
hetatm_atom (str)
model_num (int)

author_chain_id: str¶

author_seq_num: str¶

hetatm_atom: str¶

insertion_code: str¶

mmcif_chain_id: str¶

mmcif_seq_num: int¶

model_num: int¶

residue_name: str¶

class MmcifObject(file_id, header, structure, chain_to_seqres, seqres_to_structure, raw_string)¶

Representation of a parsed mmCIF file.

Contains:

file_id: A meaningful name, e.g. a pdb_id. Should be unique amongst all files being processed.

header: Biopython header.

structure: Biopython structure.

chain_to_seqres: Dict mapping chain_id to 1 letter amino acid sequence. E.g. {'A': 'ABCDEFG'}

seqres_to_structure: Dict; for each chain_id contains a mapping between SEQRES index and a ResidueAtPosition. E.g. {'A': {0: ResidueAtPosition, 1: ResidueAtPosition, ...}}

raw_string: The raw string used to construct the MmcifObject.

Parameters:

file_id (str)
header (Mapping[str, Any])
structure (Structure)
chain_to_seqres (Mapping[str, str])
seqres_to_structure (Mapping[str, Mapping[int, ResidueAtPosition]])
raw_string (Any)

chain_to_seqres: Mapping[str, str]¶

file_id: str¶

header: Mapping[str, Any]¶

raw_string: Any¶

seqres_to_structure: Mapping[str, Mapping[int, ResidueAtPosition]]¶

structure: Structure¶

class Monomer(id: str, num: int)¶

Parameters:

id (str)
num (int)

id: str¶

num: int¶

class ParsingResult(mmcif_object, errors)¶

Returned by the parse function.

Contains:

mmcif_object: A MmcifObject, may be None if no chain could be successfully parsed.

errors: A dict mapping (file_id, chain_id) to any exception generated.

Parameters:

mmcif_object (MmcifObject | None)
errors (Mapping[Tuple[str, str], Any])

errors: Mapping[Tuple[str, str], Any]¶

mmcif_object: MmcifObject | None¶

class ResidueAtPosition(position: ResiduePosition | None, name: str, is_missing: bool, hetflag: str)¶

Parameters:

position (ResiduePosition | None)
name (str)
is_missing (bool)
hetflag (str)

hetflag: str¶

is_missing: bool¶

name: str¶

position: ResiduePosition | None¶

class ResiduePosition(chain_id: str, residue_number: int, insertion_code: str)¶

Parameters:

chain_id (str)
residue_number (int)
insertion_code (str)

chain_id: str¶

insertion_code: str¶

residue_number: int¶
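The nested shape of seqres_to_structure is easier to see with a small example. The sketch below does not import OpenFold: the two dataclasses are stand-ins that only mirror the fields documented above, `missing_seqres_indices` is a hypothetical helper, and the data is hand-written. It assumes that is_missing marks a SEQRES entry with no resolved residue in the structure, in which case position is None.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ResiduePosition:  # stand-in mirroring the documented fields
    chain_id: str
    residue_number: int
    insertion_code: str


@dataclass(frozen=True)
class ResidueAtPosition:  # stand-in mirroring the documented fields
    position: Optional[ResiduePosition]
    name: str
    is_missing: bool
    hetflag: str


def missing_seqres_indices(
    seqres_to_structure: Mapping[str, Mapping[int, ResidueAtPosition]],
    chain_id: str,
) -> List[int]:
    """Hypothetical helper: SEQRES indices in `chain_id` flagged as missing."""
    chain = seqres_to_structure[chain_id]
    return sorted(i for i, res in chain.items() if res.is_missing)


# Hand-written illustration data for a three-residue chain "A".
seqres_to_structure = {
    "A": {
        0: ResidueAtPosition(ResiduePosition("A", 1, " "), "MET", False, " "),
        1: ResidueAtPosition(None, "LYS", True, " "),
        2: ResidueAtPosition(ResiduePosition("A", 3, " "), "VAL", False, " "),
    }
}
missing = missing_seqres_indices(seqres_to_structure, "A")
```

With a real MmcifObject, the same helper could be called on its seqres_to_structure attribute, provided the assumption about is_missing holds.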

get_atom_coords(mmcif_object, chain_id, _zero_center_positions=False)¶

Parameters:

mmcif_object (MmcifObject)
chain_id (str)
_zero_center_positions (bool)

Return type:

Tuple[ndarray, ndarray]

get_release_date(parsed_info)¶

Returns the oldest revision date.

Parameters:

parsed_info (Mapping[str, Sequence[str]])

Return type:

str
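As a rough illustration of "oldest revision date", the standalone sketch below picks the minimum of a list of date strings. It is not the library's code: `oldest_revision_date` is a hypothetical stand-in, the dates are made up, and the lookup key is an assumption (it is the standard mmCIF item for revision dates, but this page does not say which key the function reads).

```python
from typing import Mapping, Sequence

# Assumed key: the standard mmCIF item that records each revision's date.
REVISION_DATE_KEY = "_pdbx_audit_revision_history.revision_date"


def oldest_revision_date(parsed_info: Mapping[str, Sequence[str]]) -> str:
    """Hypothetical stand-in: returns the earliest YYYY-MM-DD revision date."""
    # ISO 8601 dates sort chronologically when compared as plain strings.
    return min(parsed_info[REVISION_DATE_KEY])


# Made-up revision history, deliberately out of order.
parsed_info = {REVISION_DATE_KEY: ["2011-07-13", "2004-03-02", "2017-10-25"]}
release_date = oldest_revision_date(parsed_info)
```

String comparison is sufficient here only because the dates are zero-padded and ordered year-month-day.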

mmcif_loop_to_dict(prefix, index, parsed_info)¶

Extracts loop associated with a prefix from mmCIF data as a dictionary.

Parameters:

prefix (str) – Prefix shared by each of the data items in the loop. e.g. ‘_entity_poly_seq.’, where the data items are _entity_poly_seq.num, _entity_poly_seq.mon_id. Should include the trailing period.
index (str) – Which item of loop data should serve as the key.
parsed_info (Mapping[str, Sequence[str]]) – A dict of parsed mmCIF data, e.g. _mmcif_dict from a Biopython parser.

Returns:

A dict of dicts; each inner dict represents one entry from an mmCIF loop, keyed by that entry's value for the index item.

Return type:

Mapping[str, Mapping[str, str]]
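To make the return shape concrete, here is a minimal standalone sketch of the behaviour described above. It does not import OpenFold: `loop_to_dict` is a hypothetical stand-in, the `parsed_info` contents are hand-written, and it assumes that index is given as a full data item name (for example '_chem_comp.id') rather than a bare column name.

```python
from typing import Dict, Mapping, Sequence


def loop_to_dict(
    prefix: str,
    index: str,
    parsed_info: Mapping[str, Sequence[str]],
) -> Dict[str, Dict[str, str]]:
    """Hypothetical stand-in: groups loop columns into rows keyed by `index`."""
    # Keep only the columns that belong to this loop.
    columns = {k: v for k, v in parsed_info.items() if k.startswith(prefix)}
    if not columns:
        return {}
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError(f"Columns under {prefix!r} have differing lengths")
    num_rows = lengths.pop()
    # Build one dict per row, then key each row by its value in the index column.
    rows = [
        {name: values[i] for name, values in columns.items()}
        for i in range(num_rows)
    ]
    return {row[index]: row for row in rows}


# Hand-written data in the column-wise layout a Biopython parser produces.
parsed_info = {
    "_chem_comp.id": ["ALA", "GLY"],
    "_chem_comp.type": ["L-peptide linking", "peptide linking"],
    "_entry.id": ["XXXX"],
}
chem_comps = loop_to_dict("_chem_comp.", "_chem_comp.id", parsed_info)
```

If two rows share the same index value, the later row overwrites the earlier one in this sketch, so the index item should be unique within the loop.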

mmcif_loop_to_list(prefix, parsed_info)¶

Extracts loop associated with a prefix from mmCIF data as a list.

Reference for loop_ in mmCIF: http://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html

Parameters:

prefix (str) – Prefix shared by each of the data items in the loop. e.g. ‘_entity_poly_seq.’, where the data items are _entity_poly_seq.num, _entity_poly_seq.mon_id. Should include the trailing period.
parsed_info (Mapping[str, Sequence[str]]) – A dict of parsed mmCIF data, e.g. _mmcif_dict from a Biopython parser.

Returns:

A list of dicts; each dict represents one entry from an mmCIF loop.

Return type:

Sequence[Mapping[str, str]]
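The conversion from column-wise loop data to one dict per entry can be sketched without OpenFold as follows. `loop_to_list` is a hypothetical stand-in written for illustration, not the library's implementation, and the `parsed_info` values are hand-written.

```python
from typing import Dict, List, Mapping, Sequence


def loop_to_list(
    prefix: str, parsed_info: Mapping[str, Sequence[str]]
) -> List[Dict[str, str]]:
    """Hypothetical stand-in: turns column-wise loop data into one dict per row."""
    names = [key for key in parsed_info if key.startswith(prefix)]
    columns = [parsed_info[name] for name in names]
    if len({len(col) for col in columns}) > 1:
        raise ValueError(f"Columns under {prefix!r} have differing lengths")
    # zip(*columns) walks the columns in lockstep, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*columns)]


# Hand-written data; "_entry.id" is outside the loop and must be ignored.
parsed_info = {
    "_entity_poly_seq.entity_id": ["1", "1", "1"],
    "_entity_poly_seq.num": ["1", "2", "3"],
    "_entity_poly_seq.mon_id": ["MET", "LYS", "VAL"],
    "_entry.id": ["XXXX"],
}
rows = loop_to_list("_entity_poly_seq.", parsed_info)
```

Each resulting dict keeps the full data item names as keys, so a field is read as rows[0]['_entity_poly_seq.mon_id'].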

parse(*, file_id, mmcif_string, catch_all_errors=True)¶

Entry point, parses an mmcif_string.

Parameters:

file_id (str) – A string identifier for this file. Should be unique within the collection of files being processed.
mmcif_string (str) – Contents of an mmCIF file.
catch_all_errors (bool) – If True, all exceptions are caught and error messages are returned as part of the ParsingResult. If False, exceptions are allowed to propagate.

Returns:

A ParsingResult.

Return type:

ParsingResult
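Because parse reports failures through the ParsingResult when catch_all_errors is True, callers need to check mmcif_object before using it. The sketch below shows one way to do that. `parse_or_raise` is a hypothetical helper, and `fake_parse` is a stand-in with the same keyword-only signature so the example runs without OpenFold installed; its error key and message are invented for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable


def parse_or_raise(
    parse_fn: Callable[..., Any], file_id: str, mmcif_string: str
) -> Any:
    """Hypothetical helper: returns the MmcifObject or raises with the errors."""
    # parse is keyword-only, so both arguments are passed by name.
    result = parse_fn(file_id=file_id, mmcif_string=mmcif_string)
    if result.mmcif_object is None:
        # errors maps (file_id, chain_id) to whatever exception was caught.
        details = "; ".join(f"{key}: {err}" for key, err in result.errors.items())
        raise ValueError(f"Could not parse {file_id}: {details}")
    return result.mmcif_object


# Stand-in for the real parser; only the signature and result shape match.
def fake_parse(*, file_id: str, mmcif_string: str, catch_all_errors: bool = True):
    if not mmcif_string.startswith("data_"):
        return SimpleNamespace(
            mmcif_object=None,
            errors={(file_id, ""): "No data block found."},
        )
    return SimpleNamespace(mmcif_object=f"<MmcifObject {file_id}>", errors={})


ok = parse_or_raise(fake_parse, "demo", "data_demo\n")
try:
    parse_or_raise(fake_parse, "bad", "not an mmCIF file")
    failure_message = ""
except ValueError as exc:
    failure_message = str(exc)
```

In real use, the first argument would be openfold.data.mmcif_parsing.parse and mmcif_string would hold the contents of an mmCIF file read from disk.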