[augur ancestral] create annotation block
For a detailed write-up of the bug which motivated this commit, see
nextstrain#881.

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by `augur translate` which
was problematic for workflows which didn't perform translation.

No changes are needed to `augur export v2`, which may now process
multiple "annotations" blocks. `NodeData.deep_add_or_update` recurses
into the dicts in annotation blocks and overwrites any non-dict values
which already exist. This poses a potential problem: two node-data
JSONs which (e.g.) define different `annotations['nuc']` coordinates
will not raise any error, and the output coordinates depend on the
order in which the node-data JSONs were provided to `augur export v2`.
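
The order dependence described above can be illustrated with a minimal
sketch. This is not the actual `NodeData.deep_add_or_update` code; the
`deep_merge` and `merge_in_order` helpers and the `10000` coordinate are
hypothetical, and the sketch only assumes the behavior stated in this
message (recurse into dicts, overwrite existing non-dict values).

```python
import copy


def deep_merge(target, source):
    """Recursively merge `source` into `target` in place (hypothetical helper).

    Dicts are merged key by key; any other value in `source` silently
    overwrites the value already present in `target`.
    """
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_merge(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def merge_in_order(node_data_list):
    """Merge node-data dicts in the order given, as if read one by one."""
    merged = {}
    for node_data in node_data_list:
        deep_merge(merged, node_data)
    return merged


# Two toy node-data dicts that disagree on the 'nuc' end coordinate.
# The value 10000 is invented purely for illustration.
node_data_a = {"annotations": {"nuc": {"start": 1, "end": 10769, "strand": "+"}}}
node_data_b = {"annotations": {"nuc": {"start": 1, "end": 10000, "strand": "+"}}}

merged_ab = merge_in_order([node_data_a, node_data_b])
merged_ba = merge_in_order([node_data_b, node_data_a])

# No error is raised in either case; the last input provided wins.
print(merged_ab["annotations"]["nuc"]["end"])
print(merged_ba["annotations"]["nuc"]["end"])
```

In this sketch the two merge orders yield different `end` values without
any warning, which is the silent conflict noted above.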

Closes nextstrain#881.
jameshadfield authored and victorlin committed Jun 30, 2022
1 parent 93dc8d1 commit af35e77
Showing 3 changed files with 19 additions and 0 deletions.
2 changes: 2 additions & 0 deletions augur/ancestral.py
@@ -207,6 +207,8 @@ def run(args):
    if anc_seqs.get("mask") is not None:
        anc_seqs["mask"] = "".join(['1' if x else '0' for x in anc_seqs["mask"]])

    anc_seqs['annotations'] = {'nuc': {'start': 1, 'end': len(anc_seqs['reference']['nuc']), 'strand': '+'}}

    out_name = get_json_name(args, '.'.join(args.alignment.split('.')[:-1]) + '_mutations.json')
    write_json(anc_seqs, out_name)
    print("ancestral mutations written to", out_name, file=sys.stdout)
7 changes: 7 additions & 0 deletions tests/builds/zika/results/nt_muts.json
@@ -1,4 +1,11 @@
{
  "annotations": {
    "nuc": {
      "end": 10769,
      "start": 1,
      "strand": "+"
    }
  },
  "generated_by": {
    "program": "augur",
    "version": "7.0.2"
10 changes: 10 additions & 0 deletions tests/functional/ancestral.t
@@ -15,6 +15,16 @@ The default is to infer ambiguous bases, so there should not be N bases in the i
  $ grep N "$TMP/ancestral_sequences.fasta"
  >NODE_0000000

Check that the reference length was correctly exported as the nuc annotation
  $ grep -A 6 'annotations' "$TMP/ancestral_mutations.json"
    "annotations": {
      "nuc": {
        "end": 10769,
        "start": 1,
        "strand": "+"
      }
    },

Infer ancestral sequences for the given tree and alignment, explicitly requesting that ambiguous bases are inferred.
There should not be N bases in the inferred output sequences.
