openfold.data.parsers
Functions for parsing various file formats: FASTA, A3M, Stockholm, HHR, and HMMER tblout output.
Classes
| Msa | Class representing a parsed MSA file. |
| TemplateHit | Class representing a template hit. |
Functions
| convert_stockholm_to_a3m | Converts an MSA in Stockholm format to the A3M format. |
| deduplicate_stockholm_msa | Removes duplicate sequences (ignoring insertions with respect to the query). |
| parse_a3m | Parses sequences and the deletion matrix from an A3M-format alignment. |
| parse_e_values_from_tblout | Parses a target-to-E-value mapping from a Jackhmmer tblout string. |
| parse_fasta | Parses a FASTA string and returns lists of amino-acid sequences and their descriptions. |
| parse_hhr | Parses the content of an entire HHR file. |
| parse_hmmsearch_a3m | Parses an A3M string produced by hmmsearch. |
| parse_hmmsearch_sto | Gets parsed template hits from the raw Stockholm output of hmmsearch. |
| parse_stockholm | Parses sequences and the deletion matrix from a Stockholm-format alignment. |
| remove_empty_columns_from_stockholm_msa | Removes empty (dashes-only) columns from a Stockholm MSA. |
| | Reads and truncates a Stockholm file while preventing excessive RAM usage. |
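Truncating a Stockholm file without excessive RAM usage comes down to streaming it line by line and keeping only the rows that belong to the first few sequence names. The sketch below illustrates that idea under stated assumptions: truncate_stockholm_lines is a hypothetical helper, not part of openfold.data.parsers, and as a simplification it keeps every '#' annotation line unchanged.

```python
from typing import Iterable, List


def truncate_stockholm_lines(lines: Iterable[str], max_sequences: int) -> List[str]:
    """Hypothetical helper: keep only the first `max_sequences` sequences.

    Lines are consumed one at a time, so an open file object can be passed
    without reading the whole alignment into memory first.
    """
    seen_names = set()
    kept = []
    for line in lines:
        stripped = line.strip()
        # Blank lines, '#' annotation lines and the '//' terminator are kept
        # as-is (a simplification: per-sequence '#=GS'/'#=GR' lines are not filtered).
        if not stripped or stripped.startswith("#") or stripped == "//":
            kept.append(line)
            continue
        name = stripped.split(maxsplit=1)[0]
        # Register a new sequence name only while still under the limit.
        if name not in seen_names and len(seen_names) < max_sequences:
            seen_names.add(name)
        # Wrapped alignments repeat names in later blocks; keep those rows too.
        if name in seen_names:
            kept.append(line)
    return kept
```

Because the helper accepts any iterable of lines, a file handle can be passed directly, for example `with open(path) as f: kept = truncate_stockholm_lines(f, 1000)`.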
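- class Msa(sequences, deletion_matrix, descriptions)
Class representing a parsed MSA file.
- Parameters:
  - sequences – The sequences in the alignment.
  - deletion_matrix – The deletion matrix for the alignment as a list of lists. The element at deletion_matrix[i][j] is the number of residues deleted from the aligned sequence i at residue position j.
  - descriptions – The sequence descriptions, in the same order as the sequences.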
- class TemplateHit(index, name, aligned_cols, sum_probs, query, hit_sequence, indices_query, indices_hit)
Class representing a template hit.
- Parameters:
- convert_stockholm_to_a3m(stockholm_format, max_sequences=None, remove_first_row_gaps=True)
Converts an MSA in Stockholm format to the A3M format.
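In A3M, columns where the query has a gap are insert columns: residues there are written in lower case and gaps there are dropped, while match columns keep upper-case residues and '-' deletions. A minimal sketch of that per-row rule follows, assuming '-' is the only gap character. The helper to_a3m_row is hypothetical and is not presented as the library's implementation.

```python
def to_a3m_row(query_aligned: str, seq_aligned: str) -> str:
    """Hypothetical helper: rewrite one aligned Stockholm row in A3M style."""
    if len(query_aligned) != len(seq_aligned):
        raise ValueError("aligned rows must have equal length")
    out = []
    for q, s in zip(query_aligned, seq_aligned):
        if q != "-":
            # Match column: upper-case residue, or '-' for a deletion.
            out.append(s.upper())
        elif s != "-":
            # Insert column (gap in the query): the residue becomes lower case.
            out.append(s.lower())
        # Gap in both the query and this row: the column is dropped.
    return "".join(out)
```

Applied to the query row itself, the rule simply removes the query's gaps, so every A3M row ends up with the same number of upper-case-or-'-' positions as the ungapped query.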
- deduplicate_stockholm_msa(stockholm_msa)
Removes duplicate sequences (ignoring insertions with respect to the query).
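"Ignoring insertions with respect to the query" means two rows count as duplicates when they agree on every column where the query has a residue, whatever they contain in the query's gap columns. The sketch below applies that rule to (name, aligned sequence) pairs rather than to a Stockholm string; deduplicate_rows is a hypothetical helper, and keeping the first occurrence is an assumption.

```python
from typing import List, Tuple


def deduplicate_rows(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Hypothetical helper: drop rows equal to an earlier row on query columns.

    `rows` holds (name, aligned_sequence) pairs; the first row is the query.
    """
    if not rows:
        return []
    query_aligned = rows[0][1]
    # Columns where the query has a residue; insert columns are ignored.
    match_cols = [i for i, c in enumerate(query_aligned) if c != "-"]
    seen = set()
    kept = []
    for name, seq in rows:
        key = "".join(seq[i] for i in match_cols)
        if key not in seen:  # keep only the first row with this key
            seen.add(key)
            kept.append((name, seq))
    return kept
```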
- parse_a3m(a3m_string)
Parses sequences and the deletion matrix from an A3M-format alignment.
- Parameters:
a3m_string (str) – The string contents of an A3M file. The first sequence in the file should be the query sequence.
- Returns:
A tuple of:
  - A list of sequences that have been aligned to the query. These might contain duplicates.
  - The deletion matrix for the alignment as a list of lists. The element at deletion_matrix[i][j] is the number of residues deleted from the aligned sequence i at residue position j.
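In A3M, lower-case letters are residues inserted relative to the query, so one way to obtain the deletion matrix is to count the lower-case letters that precede each upper-case residue or '-' and then strip the lower-case letters to get the aligned sequence. The sketch below assumes the input is already split into one string per sequence (header lines removed); a3m_rows_to_alignment is a hypothetical helper, not the library function.

```python
from typing import List, Tuple


def a3m_rows_to_alignment(rows: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: aligned sequences plus deletion matrix from A3M rows."""
    sequences: List[str] = []
    deletion_matrix: List[List[int]] = []
    for row in rows:
        deletions: List[int] = []
        count = 0
        for ch in row:
            if ch.islower():
                # Insertion relative to the query: accumulate it.
                count += 1
            else:
                # Match residue or '-': record insertions seen since the last one.
                deletions.append(count)
                count = 0
        # The aligned sequence keeps only upper-case residues and gaps.
        sequences.append("".join(ch for ch in row if not ch.islower()))
        deletion_matrix.append(deletions)
    return sequences, deletion_matrix
```

With rows "ACD" and "AkkC-", the second sequence becomes "AC-" and its deletion row is [0, 2, 0]: two inserted residues sit just before the C at position 1.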
- parse_e_values_from_tblout(tblout)
Parses a target-to-E-value mapping from a Jackhmmer tblout string.
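HMMER's per-target table (tblout) is whitespace-delimited, with '#' comment lines, the target name in the first column, and the full-sequence E-value in the fifth. A minimal sketch of building the mapping under that layout assumption follows; e_values_from_tblout is a hypothetical helper.

```python
from typing import Dict


def e_values_from_tblout(tblout: str) -> Dict[str, float]:
    """Hypothetical helper: map target name -> full-sequence E-value."""
    e_values: Dict[str, float] = {}
    for line in tblout.splitlines():
        # Skip blank lines and '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # Column 1: target name; column 5: full-sequence E-value.
        e_values[fields[0]] = float(fields[4])
    return e_values
```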
- parse_fasta(fasta_string)
Parses a FASTA string and returns lists of amino-acid sequences and their descriptions.
- Parameters:
fasta_string (str) – The string contents of a FASTA file.
- Returns:
A tuple of two lists:
  - A list of sequences.
  - A list of sequence descriptions taken from the header (>) lines, in the same order as the sequences.
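The two returned lists stay aligned because each header line opens a new, initially empty sequence that the following lines extend. A minimal sketch of that pattern is below; read_fasta is a hypothetical helper, and ignoring any sequence lines before the first header is an assumption.

```python
from typing import List, Tuple


def read_fasta(fasta_string: str) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: (sequences, descriptions) from a FASTA string."""
    sequences: List[str] = []
    descriptions: List[str] = []
    for line in fasta_string.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: record its description and start a new sequence.
            descriptions.append(line[1:])
            sequences.append("")
        elif sequences:
            # Sequence lines may be wrapped; append to the current entry.
            sequences[-1] += line
    return sequences, descriptions
```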
- parse_hhr(hhr_string)
Parses the content of an entire HHR file.
- Parameters:
hhr_string (str) – The string contents of an HHR file.
- Return type:
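Assuming the usual HH-suite .hhr layout, in which each pairwise-alignment section starts with a line such as "No 1", a parser can first split the file into per-hit blocks and then read fields like Aligned_cols or Sum_probs from each block. The sketch below covers only that first splitting step; split_hhr_hit_blocks is a hypothetical helper.

```python
from typing import List


def split_hhr_hit_blocks(hhr_string: str) -> List[List[str]]:
    """Hypothetical helper: split HHR text into per-hit lists of lines."""
    lines = hhr_string.splitlines()
    # Each hit section begins with "No <n>"; "No_of_seqs" does not match.
    starts = [i for i, line in enumerate(lines) if line.startswith("No ")]
    blocks = []
    for k, start in enumerate(starts):
        # A block runs until the next "No " line or the end of the file.
        end = starts[k + 1] if k + 1 < len(starts) else len(lines)
        blocks.append(lines[start:end])
    return blocks
```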
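- parse_hmmsearch_a3m(query_sequence, a3m_string, skip_first=True)
Parses an A3M string produced by hmmsearch.
- Parameters:
  - query_sequence – The query sequence.
  - a3m_string – The A3M string produced by hmmsearch.
  - skip_first – Whether to skip the first sequence in the A3M string (default True).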
- Returns:
A sequence of TemplateHit results.
- Return type:
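- parse_hmmsearch_sto(output_string, input_sequence)
Gets parsed template hits from the raw Stockholm output of hmmsearch.
- Parameters:
  - output_string – The raw Stockholm-format output of hmmsearch.
  - input_sequence – The query sequence that was searched.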
- Return type:
- parse_stockholm(stockholm_string)
Parses sequences and the deletion matrix from a Stockholm-format alignment.
- Parameters:
stockholm_string (str) – The string contents of a Stockholm file. The first sequence in the file should be the query sequence.
- Returns:
A tuple of:
  - A list of sequences that have been aligned to the query. These might contain duplicates.
  - The deletion matrix for the alignment as a list of lists. The element at deletion_matrix[i][j] is the number of residues deleted from the aligned sequence i at residue position j.
  - The names of the targets matched, including the jackhmmer subsequence suffix.
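For Stockholm input, the same deletion counts can be derived from the query's gap columns: a residue in a column where the query has '-' is counted as a deletion before the next query residue, and only the query's non-gap columns are kept in the returned sequences. The sketch below assumes '-' is the only gap character and that wrapped blocks repeat sequence names; parse_stockholm_sketch is hypothetical, not the library's code.

```python
from typing import Dict, List, Tuple


def parse_stockholm_sketch(sto: str) -> Tuple[List[str], List[List[int]], List[str]]:
    """Hypothetical helper: (sequences, deletion_matrix, names) from Stockholm text."""
    name_to_seq: Dict[str, str] = {}
    for line in sto.splitlines():
        line = line.strip()
        # Skip blank lines, '#' markup and the '//' terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        name, aligned = line.split(maxsplit=1)
        # Wrapped alignments: concatenate later blocks onto the same name.
        name_to_seq[name] = name_to_seq.get(name, "") + aligned
    names = list(name_to_seq)  # dicts preserve insertion order
    rows = [name_to_seq[n] for n in names]
    query = rows[0]
    keep_cols = [i for i, c in enumerate(query) if c != "-"]
    sequences, deletion_matrix = [], []
    for row in rows:
        deletions, count = [], 0
        for q, s in zip(query, row):
            if q == "-":
                if s != "-":
                    count += 1  # residue in a query gap column
            else:
                deletions.append(count)
                count = 0
        sequences.append("".join(row[i] for i in keep_cols))
        deletion_matrix.append(deletions)
    return sequences, deletion_matrix, names
```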
- remove_empty_columns_from_stockholm_msa(stockholm_msa)
Removes empty columns (dashes-only) from a Stockholm MSA.
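A column is empty when every sequence has '-' in it, so removing such columns amounts to keeping the indices where at least one row has a residue. The sketch below works on a list of equal-length aligned rows rather than on a full Stockholm string; drop_all_gap_columns is a hypothetical helper.

```python
from typing import List


def drop_all_gap_columns(rows: List[str]) -> List[str]:
    """Hypothetical helper: remove columns that are '-' in every row."""
    if not rows:
        return []
    width = len(rows[0])
    # Keep a column if any row has something other than a gap there.
    keep = [i for i in range(width) if any(row[i] != "-" for row in rows)]
    return ["".join(row[i] for i in keep) for row in rows]
```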