openfold.data.mmcif_parsing

Parses the mmCIF file format.

Classes

AtomSite(residue_name, author_chain_id, ...)

MmcifObject(file_id, header, structure, ...)

Representation of a parsed mmCIF file.

Monomer(id, num)

ParsingResult(mmcif_object, errors)

Returned by the parse function.

ResidueAtPosition(position, name, ...)

ResiduePosition(chain_id, residue_number, ...)

Functions

get_atom_coords(mmcif_object, chain_id[, ...])

get_release_date(parsed_info)

Returns the oldest revision date.

mmcif_loop_to_dict(prefix, index, parsed_info)

Extracts loop associated with a prefix from mmCIF data as a dictionary.

mmcif_loop_to_list(prefix, parsed_info)

Extracts loop associated with a prefix from mmCIF data as a list.

parse(*, file_id, mmcif_string[, ...])

Entry point, parses an mmcif_string.

Exceptions

ParseError

An error indicating that an mmCIF file could not be parsed.

exception ParseError

Bases: Exception

An error indicating that an mmCIF file could not be parsed.

class AtomSite(residue_name: str, author_chain_id: str, mmcif_chain_id: str, author_seq_num: str, mmcif_seq_num: int, insertion_code: str, hetatm_atom: str, model_num: int)
Parameters:
  • residue_name (str)

  • author_chain_id (str)

  • mmcif_chain_id (str)

  • author_seq_num (str)

  • mmcif_seq_num (int)

  • insertion_code (str)

  • hetatm_atom (str)

  • model_num (int)

author_chain_id: str
author_seq_num: str
hetatm_atom: str
insertion_code: str
mmcif_chain_id: str
mmcif_seq_num: int
model_num: int
residue_name: str
class MmcifObject(file_id, header, structure, chain_to_seqres, seqres_to_structure, raw_string)

Representation of a parsed mmCIF file.

Contains:
  • file_id: A meaningful name, e.g. a pdb_id. Should be unique amongst all files being processed.

  • header: Biopython header.

  • structure: Biopython structure.

  • chain_to_seqres: Dict mapping chain_id to 1 letter amino acid sequence, e.g. {'A': 'ABCDEFG'}.

  • seqres_to_structure: Dict; for each chain_id contains a mapping between SEQRES index and a ResidueAtPosition, e.g. {'A': {0: ResidueAtPosition, 1: ResidueAtPosition, ...}}.

  • raw_string: The raw string used to construct the MmcifObject.

Parameters:
chain_to_seqres: Mapping[str, str]
file_id: str
header: Mapping[str, Any]
raw_string: Any
seqres_to_structure: Mapping[str, Mapping[int, ResidueAtPosition]]
structure: Structure
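
To make the shape of chain_to_seqres and seqres_to_structure concrete, here is a minimal sketch that walks one chain and collects the SEQRES indices flagged as missing. The two dataclasses are stand-ins that only mirror the field names documented on this page, missing_seqres_indices is a hypothetical helper, and the example data (including the blank insertion_code and hetflag values) is hand-built rather than taken from a real file.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


# Stand-ins mirroring the documented fields; real code would use the
# classes from openfold.data.mmcif_parsing instead.
@dataclass(frozen=True)
class ResiduePosition:
    chain_id: str
    residue_number: int
    insertion_code: str


@dataclass(frozen=True)
class ResidueAtPosition:
    position: Optional[ResiduePosition]
    name: str
    is_missing: bool
    hetflag: str


def missing_seqres_indices(
    seqres_to_structure: Mapping[str, Mapping[int, ResidueAtPosition]],
    chain_id: str,
) -> List[int]:
    """Hypothetical helper: SEQRES indices marked is_missing for one chain."""
    chain_map = seqres_to_structure[chain_id]
    return sorted(i for i, res in chain_map.items() if res.is_missing)


# Hand-built data in the documented shape (values are assumptions).
chain_to_seqres = {"A": "MKV"}
seqres_to_structure = {
    "A": {
        0: ResidueAtPosition(ResiduePosition("A", 1, " "), "MET", False, " "),
        1: ResidueAtPosition(None, "LYS", True, " "),
        2: ResidueAtPosition(ResiduePosition("A", 3, " "), "VAL", False, " "),
    }
}

missing = missing_seqres_indices(seqres_to_structure, "A")
```

In this sketch a missing residue carries position=None, which is one plausible reading of the ResiduePosition | None annotation on ResidueAtPosition.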
class Monomer(id: str, num: int)
Parameters:
id: str
num: int
class ParsingResult(mmcif_object, errors)

Returned by the parse function.

Contains:
  • mmcif_object: An MmcifObject, or None if no chain could be successfully parsed.

  • errors: A dict mapping (file_id, chain_id) to any exception generated.

Parameters:
errors: Mapping[Tuple[str, str], Any]
mmcif_object: MmcifObject | None
class ResidueAtPosition(position: ResiduePosition | None, name: str, is_missing: bool, hetflag: str)
Parameters:
hetflag: str
is_missing: bool
name: str
position: ResiduePosition | None
class ResiduePosition(chain_id: str, residue_number: int, insertion_code: str)
Parameters:
  • chain_id (str)

  • residue_number (int)

  • insertion_code (str)

chain_id: str
insertion_code: str
residue_number: int
get_atom_coords(mmcif_object, chain_id, _zero_center_positions=False)
Parameters:
  • mmcif_object (MmcifObject)

  • chain_id (str)

  • _zero_center_positions (bool)

Return type:

Tuple[ndarray, ndarray]

get_release_date(parsed_info)

Returns the oldest revision date.

Parameters:

parsed_info (Mapping[str, Sequence[str]])

Return type:

str
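
As an illustration of "oldest revision date", the following sketch picks the minimum from a list of date strings. It assumes the dates are ISO formatted (YYYY-MM-DD), so that string order matches chronological order, and it assumes the revision dates live under the _pdbx_audit_revision_history.revision_date item; this page does not name the item that get_release_date reads. oldest_revision_date is a hypothetical stand-in, not the library function.

```python
from typing import Mapping, Sequence

# Assumed item name; not confirmed by the documentation on this page.
REVISION_DATE_KEY = "_pdbx_audit_revision_history.revision_date"


def oldest_revision_date(parsed_info: Mapping[str, Sequence[str]]) -> str:
    """Sketch: earliest date, relying on ISO YYYY-MM-DD string ordering."""
    return min(parsed_info[REVISION_DATE_KEY])


# Hand-written example data.
example_info = {REVISION_DATE_KEY: ["2011-08-24", "2009-03-10", "2017-11-01"]}
oldest = oldest_revision_date(example_info)
```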

mmcif_loop_to_dict(prefix, index, parsed_info)

Extracts loop associated with a prefix from mmCIF data as a dictionary.

Parameters:
  • prefix (str) – Prefix shared by each of the data items in the loop, e.g. '_entity_poly_seq.', where the data items are _entity_poly_seq.num and _entity_poly_seq.mon_id. Should include the trailing period.

  • index (str) – Which item of loop data should serve as the key.

  • parsed_info (Mapping[str, Sequence[str]]) – A dict of parsed mmCIF data, e.g. _mmcif_dict from a Biopython parser.

Returns:

A dict of dicts; each inner dict represents one entry from an mmCIF loop and is keyed by its value in the index column.

Return type:

Mapping[str, Mapping[str, str]]
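
The behaviour described above can be sketched in a few lines of plain Python. This is a stand-alone illustration, not the library's code: loop_to_dict_sketch is a hypothetical name, the input is a hand-written dict of column lists, and the sketch assumes that index is given as a full item name (prefix included) and that the inner dicts keep full item names as keys.

```python
from typing import Dict, Mapping, Sequence


def loop_to_dict_sketch(
    prefix: str,
    index: str,
    parsed_info: Mapping[str, Sequence[str]],
) -> Dict[str, Dict[str, str]]:
    """Sketch: turn the columns sharing `prefix` into rows keyed by `index`."""
    # Columns of the loop are the items whose names start with the prefix.
    cols = [key for key in parsed_info if key.startswith(prefix)]
    # Transpose the columns into rows, one dict per loop entry.
    rows = zip(*(parsed_info[key] for key in cols))
    entries = [dict(zip(cols, row)) for row in rows]
    # Re-key the entries by the chosen index column.
    return {entry[index]: entry for entry in entries}


# Hand-written example data in the Mapping[str, Sequence[str]] shape.
parsed_info = {
    "_entity_poly_seq.num": ["1", "2", "3"],
    "_entity_poly_seq.mon_id": ["MET", "LYS", "VAL"],
    "_entry.id": ["EXAMPLE"],
}

by_num = loop_to_dict_sketch(
    "_entity_poly_seq.", "_entity_poly_seq.num", parsed_info
)
```

Because the result is keyed by the index column, rows that share an index value overwrite one another in this sketch, so the index should be a column with unique values.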

mmcif_loop_to_list(prefix, parsed_info)

Extracts loop associated with a prefix from mmCIF data as a list.

Reference for loop_ in mmCIF:

http://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html

Parameters:
  • prefix (str) – Prefix shared by each of the data items in the loop, e.g. '_entity_poly_seq.', where the data items are _entity_poly_seq.num and _entity_poly_seq.mon_id. Should include the trailing period.

  • parsed_info (Mapping[str, Sequence[str]]) – A dict of parsed mmCIF data, e.g. _mmcif_dict from a Biopython parser.

Returns:

A list of dicts; each dict represents one entry from an mmCIF loop.

Return type:

Sequence[Mapping[str, str]]
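
A loop in parsed mmCIF data is stored column-wise: each item name maps to a list of values. The sketch below shows one way to transpose such columns into a list of per-entry dicts, as described above. It is an illustration under assumptions, not the library's implementation: loop_to_list_sketch is a hypothetical name, the data is hand-written, and keeping full item names as dict keys is a choice made here.

```python
from typing import Dict, List, Mapping, Sequence


def loop_to_list_sketch(
    prefix: str,
    parsed_info: Mapping[str, Sequence[str]],
) -> List[Dict[str, str]]:
    """Sketch: collect the columns sharing `prefix` and return them as rows."""
    cols = [key for key in parsed_info if key.startswith(prefix)]
    data = [parsed_info[key] for key in cols]
    # Every column of one loop should have the same number of values.
    if any(len(column) != len(data[0]) for column in data):
        raise ValueError(f"Columns under {prefix!r} differ in length.")
    # zip(*data) transposes columns into rows.
    return [dict(zip(cols, row)) for row in zip(*data)]


# Hand-written example data in the Mapping[str, Sequence[str]] shape.
parsed_info = {
    "_entity_poly_seq.num": ["1", "2"],
    "_entity_poly_seq.mon_id": ["MET", "LYS"],
    "_entry.id": ["EXAMPLE"],
}

rows = loop_to_list_sketch("_entity_poly_seq.", parsed_info)
no_rows = loop_to_list_sketch("_missing_prefix.", parsed_info)
```

A prefix that matches no items yields an empty list in this sketch.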

parse(*, file_id, mmcif_string, catch_all_errors=True)

Entry point, parses an mmcif_string.

Parameters:
  • file_id (str) – A string identifier for this file. Should be unique within the collection of files being processed.

  • mmcif_string (str) – Contents of an mmCIF file.

  • catch_all_errors (bool) – If True, all exceptions are caught and error messages are returned as part of the ParsingResult. If False, exceptions are allowed to propagate.

Returns:

A ParsingResult.

Return type:

ParsingResult
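
Since mmcif_object may be None, callers generally need to check it before use. The sketch below shows one way to branch on a ParsingResult. describe_result is a hypothetical helper, and the SimpleNamespace objects are stand-ins that carry only the documented field names, so the example runs without the library; the error text is invented for illustration.

```python
from types import SimpleNamespace


def describe_result(result) -> str:
    """Hypothetical helper: one-line summary of a ParsingResult-like object."""
    if result.mmcif_object is None:
        # errors maps (file_id, chain_id) to whatever was raised.
        reasons = "; ".join(
            f"{file_id}/{chain_id}: {err}"
            for (file_id, chain_id), err in result.errors.items()
        )
        return f"parse failed ({reasons})"
    chains = ", ".join(sorted(result.mmcif_object.chain_to_seqres))
    return f"parsed {result.mmcif_object.file_id} with chains {chains}"


# In real use the result would come from a keyword-only call such as:
#   result = mmcif_parsing.parse(file_id="example", mmcif_string=text)
# The objects below are hand-built stand-ins with the documented fields.
ok = SimpleNamespace(
    mmcif_object=SimpleNamespace(
        file_id="example", chain_to_seqres={"B": "MKV", "A": "GS"}
    ),
    errors={},
)
failed = SimpleNamespace(
    mmcif_object=None,
    errors={("example", "A"): "bad chain"},
)

ok_summary = describe_result(ok)
failed_summary = describe_result(failed)
```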