Entropy panel unavailable if mutations aren't translated #881

jameshadfield · 2022-04-03T23:46:52Z

Current Behaviour

For the entropy panel to be displayed in auspice we have three requirements:

1. JSON.panels includes "entropy"
2. JSON.meta.genome_annotations exists and includes at least a {nuc: {start: INT, end: INT}} object.
3. Some mutations to be annotated on branches in the tree.
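
As a rough illustration of the three requirements, here is a minimal Python sketch that checks a dataset JSON loaded as a dict. The function name `entropy_panel_checks` is hypothetical and this is not how auspice itself decides. It assumes that panels may sit under `meta` or at the top level, that the tree is a nested dict with `children`, and that mutations live under `branch_attrs.mutations` on each node.

```python
def entropy_panel_checks(dataset):
    """Report which of the three entropy-panel requirements a dataset dict meets."""
    meta = dataset.get("meta", {})
    # Assumption: look for panels under "meta" first, then at the top level.
    panels = meta.get("panels", dataset.get("panels", []))
    nuc = meta.get("genome_annotations", {}).get("nuc", {})

    def any_mutations(root):
        # Iterative walk so that deep trees do not hit the recursion limit.
        stack = [root]
        while stack:
            node = stack.pop()
            mutations = node.get("branch_attrs", {}).get("mutations", {})
            if any(mutations.values()):
                return True
            stack.extend(node.get("children", []))
        return False

    return {
        "panel_requested": "entropy" in panels,
        "nuc_annotation": isinstance(nuc.get("start"), int)
        and isinstance(nuc.get("end"), int),
        "mutations_on_branches": any_mutations(dataset.get("tree", {})),
    }


# Made-up dataset mimicking a workflow that skipped `augur translate`:
# mutations are present, but there is no genome annotation.
dataset = {
    "meta": {"panels": ["tree", "entropy"], "genome_annotations": {}},
    "tree": {
        "name": "root",
        "children": [
            {"name": "A", "branch_attrs": {"mutations": {"nuc": ["A10T"]}}},
        ],
    },
}
result = entropy_panel_checks(dataset)
print(result)
```

For this made-up dataset only the `nuc_annotation` check fails, which corresponds to requirement 2 not being met.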

These typically come from an augur workflow with steps:

(i) augur ancestral (does not create a nodeDataJSON.annotations object)
(ii) augur translate (creates a nodeDataJSON.annotations object)
(iii) augur export (takes care of requirement 1 unless you opt-out, and converts the nodeDataJSON.annotations object to JSON.meta.genome_annotations).

However, if you choose not to translate mutations (step ii), then no annotations object is available for export, so requirement 2 is not met and the entropy panel is not displayed.

Expected behavior

A pipeline using steps (i, iii) should be valid. In other words, translating mutations should be optional.

Possible solution

The solution is not as simple as just adding an annotations block to the node-data JSON produced by augur ancestral, as augur export v2 assumes that there will only be one of these blocks.

The most consistent approach would be:

augur ancestral creates a nodeDataJSON.annotations object with nuc information
augur translate creates a nodeDataJSON.annotations object with per-gene information
augur export v2 accepts multiple annotations blocks and combines them, accepting identical duplicate elements and exiting if there are conflicts.
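
The combining behaviour proposed for the export step could look roughly like the following Python sketch. `merge_annotations` is a hypothetical helper written for illustration, not existing augur code, and the coordinates are made up. It raises a `ValueError` on conflict, which a command-line tool could turn into an exit.

```python
def merge_annotations(blocks):
    """Combine annotations dicts, accepting identical duplicates only."""
    merged = {}
    for block in blocks:
        for name, feature in block.items():
            if name in merged and merged[name] != feature:
                raise ValueError(
                    f"Conflicting annotations for {name!r}: "
                    f"{merged[name]} vs {feature}"
                )
            merged[name] = feature
    return merged


# Hypothetical blocks, as they might come from the two node-data JSONs.
from_ancestral = {"nuc": {"start": 1, "end": 1000, "strand": "+"}}
from_translate = {
    "nuc": {"start": 1, "end": 1000, "strand": "+"},
    "geneA": {"start": 10, "end": 400, "strand": "+"},
}
combined = merge_annotations([from_ancestral, from_translate])
print(sorted(combined))
```

Here the duplicated `nuc` entry is accepted because both copies are identical; changing either `end` value would raise instead.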


corneliusroemer · 2022-05-25T13:50:30Z

I got extremely confused by this bug. I encountered it as the following error in a workflow that uses augur translate and just has a .gff as input, not a .gb with nuc annotation.

The error is:

Validating schema of 'auspice/monkeypox_global.json'...
        ERROR: 'nuc' is a required property. Trace: properties - meta - properties - genome_annotations - required
Validation of 'auspice/monkeypox_global.json' failed.

------------------------
Validation of auspice/monkeypox_global.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly. 
------------------------

This is thrown by augur export. I couldn't figure out how to get it to just read nuc from the .gff, which I assumed was possible. I couldn't believe it isn't, but that's the way it seems to be 😬

This is a very serious bug: it makes it impossible to use the same genemap for Nextclade and for the Nextclade reference build of monkeypox, which is not good, because this is how discrepancies arise.

corneliusroemer · 2022-05-25T13:52:52Z

If the problem I described is not a real bug, or is not identical to this one, feel free to readjust the pain score. But based on my use case, in addition to the original one from @jameshadfield, I've upgraded this to "Crash, blocking" and "some users". It's crashing my workflow, there's no sane workaround, and it cost me a lot of time. I can also imagine this happening to others if they use a .gff (which is theoretically supported) rather than a .gb as input for translate.

corneliusroemer · 2022-05-25T14:01:01Z

The workaround for me is to:

1. Edit this line in augur/augur/utils.py (line 150 in 5916362):

limit_info = dict( gff_type = ['gene'] )

to

limit_info = dict( gff_type = ['gene', 'source', 'nuc'] )

2. Use the following nuc annotation in the genemap.gff:

MT903344.1	Genbank	source	1	197233	.	+	.	locus_tag=nuc
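
To make the intent of that GFF line concrete, here is a small Python sketch that reads one tab-separated GFF line and turns a `source` feature tagged `locus_tag=nuc` into a nuc annotation. `nuc_annotation_from_gff_line` is a hypothetical helper using plain string parsing. It is not augur's actual GFF loader, and the output shape is an assumption based on the {nuc: {start, end}} object described earlier in this issue.

```python
def nuc_annotation_from_gff_line(line):
    """Parse one GFF line; return a nuc annotation dict, or None if it does not apply."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None  # not a well-formed 9-column GFF feature line
    _seqid, _source, feature_type, start, end, _score, strand, _phase, attributes = fields
    # Attributes are semicolon-separated key=value pairs.
    attrs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
    if feature_type != "source" or attrs.get("locus_tag") != "nuc":
        return None
    return {"nuc": {"start": int(start), "end": int(end), "strand": strand}}


line = "MT903344.1\tGenbank\tsource\t1\t197233\t.\t+\t.\tlocus_tag=nuc"
annotation = nuc_annotation_from_gff_line(line)
print(annotation)
```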

I'll put in a PR, hope that doesn't break anything. Should I open this as a separate issue @jameshadfield ?

For a detailed write-up of the bug which motivated this commit, see #881. By storing the (nucleotide) genome annotation in the node-data produced by `augur ancestral`, we make this information available for export. Previously this information was only exported by `augur translate`, which was problematic for workflows which didn't perform translation. No changes are needed to `augur export v2` (which may now process multiple "annotations" blocks) due to the behavior of `NodeData.deep_add_or_update`, which recurses into dicts in annotation blocks and, when confronted with non-dict values which already exist, overwrites them. This poses a potential problem: two node-data JSONs which (e.g.) define different `annotations['nuc']` coordinates will not raise any error, and the output coordinates depend on the order in which the node-data JSONs were provided to `augur export v2`. Closes #881.
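
The order dependence described in that commit message can be illustrated with a minimal recursive merge. This is a sketch of the general "recurse into dicts, overwrite everything else" technique, not the actual `NodeData.deep_add_or_update` code. The name `deep_update` and the coordinates are made up for the example.

```python
import copy


def deep_update(target, new):
    """Recursively merge `new` into `target`; existing non-dict values are overwritten."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


# Two hypothetical node-data JSONs that disagree on the nuc end coordinate.
a = {"annotations": {"nuc": {"start": 1, "end": 1000}}}
b = {"annotations": {"nuc": {"start": 1, "end": 2000}}}

a_then_b = deep_update(copy.deepcopy(a), b)
b_then_a = deep_update(copy.deepcopy(b), a)
print(a_then_b["annotations"]["nuc"]["end"], b_then_a["annotations"]["nuc"]["end"])
```

Whichever input is merged last wins and no error is raised, which is the silent conflict the commit message warns about.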

jameshadfield · 2022-06-01T06:13:34Z

#961 will close the original bug identified in this issue, but it won't address the gff + augur translate bug - let's use #953 to track that one.


jameshadfield added the bug Something isn't working label Apr 3, 2022

j23414 added this to Nextstrain planning (archived) Apr 4, 2022

j23414 moved this to New in Nextstrain planning (archived) Apr 4, 2022

victorlin moved this from New to Backlog in Nextstrain planning (archived) Apr 13, 2022

This was referenced May 25, 2022

fix: allows nuc annotation to be pulled in through .gff #950

Closed

Validate annotations produced from ancestral + translate #951

Open

BUG: augur translate not faithful to gff3 standard, can't annotate nucs, which are required. #953

Closed

jameshadfield mentioned this issue Jun 1, 2022

genome annotation improvements #961

Merged

jameshadfield closed this as completed in #961 Jun 22, 2022

Repository owner moved this from Backlog to Done in Nextstrain planning (archived) Jun 22, 2022
