Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entropy panel unavailable if mutations aren't translated #881

Closed
jameshadfield opened this issue Apr 3, 2022 · 4 comments · Fixed by #961
Closed

Entropy panel unavailable if mutations aren't translated #881

jameshadfield opened this issue Apr 3, 2022 · 4 comments · Fixed by #961
Labels
bug Something isn't working

Comments

@jameshadfield
Copy link
Member

Current Behaviour

For the entropy panel to be displayed in auspice we have three requirements:

  1. JSON.panels includes "entropy"
  2. JSON.meta.genome_annotations exists and includes at least a {nuc: {start: INT, end: INT}} object.
  3. Some mutations to be annotated on branches in the tree.

These typically come from an augur workflow with steps:

(i) augur ancestral (does not create a nodeDataJSON.annotations object)
(ii) augur translate (creates a nodeDataJSON.annotations object)
(iii) augur export (takes care of requirement 1 unless you opt-out, and converts the nodeDataJSON.annotations object to JSON.meta.genome_annotations).

However if you choose not to translate mutations (step ii) then no annotations object is available for export, and thus requirement 2 is not met and the entropy panel is not displayed.

Expected behavior

A pipeline using steps (i, iii) should be valid. In other words, translating mutations should be optional.

Possible solution

The solution is not as simple as just adding an annotations block to the node-data JSON produced by augur ancestral, as augur export v2 assumes that there will only be one of these blocks.

The most consistent would be for

  • augur ancestral creates a nodeDataJSON.annotations object with nuc information
  • augur translate creates a nodeDataJSON.annotations object with per-gene information
  • augur export v2 accepts multiple annotations blocks and combines them, accepting identical duplicate elements and exiting if there are conflicts.
@jameshadfield jameshadfield added the bug Something isn't working label Apr 3, 2022
@victorlin victorlin moved this from New to Backlog in Nextstrain planning (archived) Apr 13, 2022
@corneliusroemer
Copy link
Member

I got extremely confused by this bug. I encountered it as the following error in a workflow that uses augur translate and just has a .gff as input, not a .gb with nuc annotation.

The error is:

Validating schema of 'auspice/monkeypox_global.json'...
        ERROR: 'nuc' is a required property. Trace: properties - meta - properties - genome_annotations - required
Validation of 'auspice/monkeypox_global.json' failed.

------------------------
Validation of auspice/monkeypox_global.json failed. Please check this in a local instance of `auspice`, as it is not expected to display correctly. 
------------------------

Thrown by augur export. I couldn't figure out how to get it to just read nuc from the .gff, which I assumed was possible. Couldn't believe it isn't. But that's the way it seems to be 😬

This is a very serious bug, it really makes using the same genemap for Nextclade and Nextclade reference build of monkeypox impossible which is not good, discrepancies arise like this.

@corneliusroemer
Copy link
Member

Unless the problem described by me is not a real bug, or not identical, feel free to readjust pain score. But based on my use case in addition to the original from @jameshadfield I've upgraded this Crash, blocking and some users. It's crashing my workflow, there's no sane workaround and it cost me a lot of time. I also can imagine this happening to others if they don't use a .gb as input for translate but .gff which is theoretically supported.

@corneliusroemer
Copy link
Member

The workaround for me is to:

  1. edit the line
    limit_info = dict( gff_type = ['gene'] )
    to
limit_info = dict( gff_type = ['gene', 'source', 'nuc'] )
  1. Use the following nuc annotation in the genemap.gff:
MT903344.1	Genbank	source	1	197233	.	+	.	locus_tag=nuc

I'll put in a PR, hope that doesn't break anything. Should I open this as a separate issue @jameshadfield ?

jameshadfield added a commit that referenced this issue Jun 1, 2022
For a detailed write-up of the bug which motivated this commit, see
#881.

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by `augur translate` which
was problematic for workflows which didn't perform translation.

No changes are needed to `augur export v2` (which may now process
multiple "annotations" blocks) due to the behavior of
`NodeData.deep_add_or_update` which will recurse into dicts in
annotation blocks and when confronted with non-dict values which already
exist overwrite them. This poses a potential problem where two node-data
JSONs which (e.g.) define different `annotations['nuc']` coordinates
will not raise any error and the output coodinates are dependent on
the order the node-data JSONs were provided to `augur export v2`.

Closes #881.
@jameshadfield
Copy link
Member Author

jameshadfield commented Jun 1, 2022

#961 will close the original bug identified in this PR, but it won't address the gff + augur translate bug - let's use #953 to track that one.

jameshadfield added a commit that referenced this issue Jun 21, 2022
For a detailed write-up of the bug which motivated this commit, see
#881.

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by `augur translate` which
was problematic for workflows which didn't perform translation.

No changes are needed to `augur export v2` (which may now process
multiple "annotations" blocks) due to the behavior of
`NodeData.deep_add_or_update` which will recurse into dicts in
annotation blocks and when confronted with non-dict values which already
exist overwrite them. This poses a potential problem where two node-data
JSONs which (e.g.) define different `annotations['nuc']` coordinates
will not raise any error and the output coodinates are dependent on
the order the node-data JSONs were provided to `augur export v2`.

Closes #881.
Repository owner moved this from Backlog to Done in Nextstrain planning (archived) Jun 22, 2022
victorlin pushed a commit to victorlin/augur that referenced this issue Jun 30, 2022
For a detailed write-up of the bug which motivated this commit, see
nextstrain#881.

By storing the (nucleotide) genome annotation in the node-data produced
from augur-ancestral we make this information available for export.
Previously this information was only exported by `augur translate` which
was problematic for workflows which didn't perform translation.

No changes are needed to `augur export v2` (which may now process
multiple "annotations" blocks) due to the behavior of
`NodeData.deep_add_or_update` which will recurse into dicts in
annotation blocks and when confronted with non-dict values which already
exist overwrite them. This poses a potential problem where two node-data
JSONs which (e.g.) define different `annotations['nuc']` coordinates
will not raise any error and the output coodinates are dependent on
the order the node-data JSONs were provided to `augur export v2`.

Closes nextstrain#881.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants