openfold.data.data_pipeline
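Classes
| AlignmentRunner | Runs alignment tools and saves the results. |
| DataPipeline | Assembles input features. |
| DataPipelineMultimer | Runs the alignment tools and assembles the input features. |
Functions
| add_assembly_features | Add features to distinguish between chains. |
| convert_monomer_features | Reshapes and modifies monomer features for multimer models. |
| int_id_to_str_id | Encodes a number as a string, using reverse spreadsheet style naming. |
| make_dummy_msa_feats | |
| make_mmcif_features | |
| make_msa_features | Constructs a feature dict of MSA features. |
| make_pdb_features | |
| make_protein_features | |
| make_sequence_features | Construct a feature dict of sequence features. |
| make_sequence_features_with_custom_template | Processes a single FASTA file using features derived from a single template rather than an alignment. |
| make_template_features | |
| pad_msa | |
| run_msa_tool | Runs an MSA tool, checking if output already exists first. |
| unify_template_features | |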
- class AlignmentRunner(jackhmmer_binary_path=None, hhblits_binary_path=None, uniref90_database_path=None, mgnify_database_path=None, bfd_database_path=None, uniref30_database_path=None, uniclust30_database_path=None, uniprot_database_path=None, template_searcher=None, use_small_bfd=None, no_cpus=None, uniref_max_hits=10000, mgnify_max_hits=5000, uniprot_max_hits=50000)
Runs alignment tools and saves the results.
- Parameters:
jackhmmer_binary_path (str | None)
hhblits_binary_path (str | None)
uniref90_database_path (str | None)
mgnify_database_path (str | None)
bfd_database_path (str | None)
uniref30_database_path (str | None)
uniclust30_database_path (str | None)
uniprot_database_path (str | None)
use_small_bfd (bool | None)
no_cpus (int | None)
uniref_max_hits (int)
mgnify_max_hits (int)
uniprot_max_hits (int)
- class DataPipeline(template_featurizer)
Assembles input features.
- Parameters:
template_featurizer (TemplateHitFeaturizer | None)
- process_core(core_path, alignment_dir, alignment_index=None, seqemb_mode=False)
Assembles features for a protein in a ProteinNet .core file.
- process_fasta(fasta_path, alignment_dir, alignment_index=None, seqemb_mode=False)
Assembles features for a single sequence in a FASTA file.
- process_mmcif(mmcif, alignment_dir, chain_id=None, alignment_index=None, seqemb_mode=False)
Assembles features for a specific chain in an mmCIF object.
If chain_id is None, the object is assumed to contain only one chain; a ValueError is raised if that assumption does not hold.
- Parameters:
mmcif (MmcifObject)
alignment_dir (str)
chain_id (str | None)
alignment_index (Any | None)
seqemb_mode (bool)
- Return type:
- process_multiseq_fasta(fasta_path, super_alignment_dir, ri_gap=200)
Assembles features for a multi-sequence FASTA. Uses Minkyung Baek’s hack from Twitter (a.k.a. AlphaFold-Gap).
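The gap trick mentioned above can be illustrated with a small sketch: the chains are concatenated into one pseudo-chain, and the residue index jumps by an extra `ri_gap` at every chain boundary so that the chains appear far apart in sequence. This is only an illustration of the idea, not OpenFold's implementation. The helper name `gapped_residue_index` and its list-based interface are hypothetical; only the `ri_gap=200` default comes from the signature above.

```python
def gapped_residue_index(chain_lengths, ri_gap=200):
    """Hypothetical helper: residue indices for concatenated chains,
    with an extra offset of `ri_gap` inserted at each chain boundary."""
    indices = []
    offset = 0
    for length in chain_lengths:
        # Number this chain's residues consecutively, starting at `offset`.
        indices.extend(range(offset, offset + length))
        # Skip `ri_gap` positions before the next chain starts.
        offset += length + ri_gap
    return indices


# Two chains of length 3 and 2.
print(gapped_residue_index([3, 2], ri_gap=200))  # [0, 1, 2, 203, 204]
```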
- process_pdb(pdb_path, alignment_dir, is_distillation=True, chain_id=None, _structure_index=None, alignment_index=None, seqemb_mode=False)
Assembles features for a protein in a PDB file.
- class DataPipelineMultimer(monomer_data_pipeline)
Runs the alignment tools and assembles the input features.
- Parameters:
monomer_data_pipeline (DataPipeline)
- get_mmcif_features(mmcif_object, chain_id)
- Parameters:
mmcif_object (MmcifObject)
chain_id (str)
- Return type:
- process_fasta(fasta_path, alignment_dir, alignment_index=None)
Creates multimer input features for the sequences in a FASTA file.
- process_mmcif(mmcif, alignment_dir, alignment_index=None)
- Parameters:
mmcif (MmcifObject)
alignment_dir (str)
alignment_index (Any | None)
- Return type:
- add_assembly_features(all_chain_features)
Add features to distinguish between chains.
- Parameters:
all_chain_features (MutableMapping[str, MutableMapping[str, ndarray]]) – A dictionary which maps chain_id to a dictionary of features for each chain.
- Returns:
A dictionary which maps strings of the form <seq_id>_<sym_id> to the corresponding chain features. For example, two chains from a homodimer would have keys A_1 and A_2, and two chains from a heterodimer would have keys A_1 and B_1.
- Return type:
all_chain_features
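The `<seq_id>_<sym_id>` naming scheme can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper `assembly_keys` is hypothetical, it takes plain sequences rather than feature dicts, it returns the key for each input chain rather than the re-keyed feature mapping, and it assumes at most 26 distinct sequences.

```python
from collections import defaultdict
import string


def assembly_keys(chain_sequences):
    """Hypothetical helper: map each chain id to a '<seq_id>_<sym_id>' key.

    Chains with identical sequences share a seq_id letter and are told
    apart by a running sym_id counter.
    """
    entity_of = {}             # sequence -> entity letter (seq_id)
    copies = defaultdict(int)  # sequence -> copies seen so far (sym_id)
    keys = {}
    for chain_id, seq in chain_sequences.items():
        if seq not in entity_of:
            entity_of[seq] = string.ascii_uppercase[len(entity_of)]
        copies[seq] += 1
        keys[chain_id] = f"{entity_of[seq]}_{copies[seq]}"
    return keys


homodimer = assembly_keys({"chain1": "MKV", "chain2": "MKV"})
heterodimer = assembly_keys({"chain1": "MKV", "chain2": "GSH"})
print(homodimer)    # {'chain1': 'A_1', 'chain2': 'A_2'}
print(heterodimer)  # {'chain1': 'A_1', 'chain2': 'B_1'}
```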
- convert_monomer_features(monomer_features, chain_id)
Reshapes and modifies monomer features for multimer models.
- Parameters:
monomer_features (MutableMapping[str, ndarray])
chain_id (str)
- Return type:
- int_id_to_str_id(num)
Encodes a number as a string, using reverse spreadsheet style naming.
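One way to read "reverse spreadsheet style" is sketched below: numbers map to column-style letter names (A, B, ..., Z, then two-letter names), but with the least significant letter written first. The helper name `reverse_spreadsheet_id` is hypothetical, and the 1-based convention (1 maps to A) is an assumption that the text above does not state.

```python
def reverse_spreadsheet_id(num):
    """Hypothetical helper: encode a positive integer as letters,
    least significant letter first (assumes 1-based input)."""
    if num <= 0:
        raise ValueError(f"Only positive integers allowed, got {num}.")
    num -= 1  # shift to 0-based so that 1 -> 'A'
    letters = []
    while num >= 0:
        letters.append(chr(num % 26 + ord("A")))
        num = num // 26 - 1
    return "".join(letters)


for n in (1, 2, 26, 27, 28):
    print(n, reverse_spreadsheet_id(n))
# 1 A
# 2 B
# 26 Z
# 27 AA
# 28 BA
```

A conventional spreadsheet would name column 28 `AB`; the reversed form gives `BA`.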
- make_dummy_msa_feats(input_sequence)
- Return type:
- make_mmcif_features(mmcif_object, chain_id)
- Parameters:
mmcif_object (MmcifObject)
chain_id (str)
- Return type:
- make_msa_features(msas)
Constructs a feature dict of MSA features.
- Parameters:
- Return type:
- make_pdb_features(protein_object, description, is_distillation=True, confidence_threshold=50.0)
- make_protein_features(protein_object, description, _is_distillation=False)
- make_sequence_features(sequence, description, num_res)
Construct a feature dict of sequence features.
- make_sequence_features_with_custom_template(sequence, mmcif_path, pdb_id, chain_id, kalign_binary_path)
Processes a single FASTA file using features derived from a single template rather than an alignment.
- make_template_features(input_sequence, hits, template_featurizer)
- pad_msa(np_example, min_num_seq)
- run_msa_tool(msa_runner, fasta_path, msa_out_path, msa_format, max_sto_sequences=None)
Runs an MSA tool, checking if output already exists first.
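The check-before-run pattern described here can be sketched generically. The example below is not the library's implementation: `run_cached` and `fake_tool` are hypothetical names, the tool is modelled as a plain callable that returns text, and the real function's `msa_format` and `max_sto_sequences` handling is left out.

```python
import os
import tempfile


def run_cached(run_fn, fasta_path, out_path):
    """Hypothetical helper: reuse the result at `out_path` if it exists;
    otherwise call `run_fn(fasta_path)` and store its text output there."""
    if os.path.exists(out_path):
        with open(out_path) as f:
            return f.read()
    result = run_fn(fasta_path)
    with open(out_path, "w") as f:
        f.write(result)
    return result


calls = []


def fake_tool(path):
    # Stand-in for an expensive alignment tool; records each invocation.
    calls.append(path)
    return ">query\nMKV\n"


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "hits.a3m")
    first = run_cached(fake_tool, "query.fasta", out)   # runs the tool
    second = run_cached(fake_tool, "query.fasta", out)  # served from disk

print(first == second, len(calls))  # True 1
```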
- unify_template_features(template_feature_list)
- Parameters:
template_feature_list (Sequence[MutableMapping[str, ndarray]])
- Return type: