Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference sequence not removed when calling augur align #454

Closed
trvrb opened this issue Mar 15, 2020 · 5 comments
Closed

Reference sequence not removed when calling augur align #454

trvrb opened this issue Mar 15, 2020 · 5 comments
Labels
bug Something isn't working needs triage Needs triage by a Nextstrain team member

Comments

@trvrb
Copy link
Member

trvrb commented Mar 15, 2020

This should be able to be replicated by running snakemake -p results/aligned.fasta in https://github.com/nextstrain/ncov.

This calls augur align with --remove-reference. However, reference MN908947 still appears in output.

This is also replicated by running the same set of commands from https://github.com/nextstrain/zika, wherein PF13/251013_18 still appears in results/aligned.fasta.

@trvrb trvrb added the bug Something isn't working label Mar 15, 2020
@trvrb trvrb changed the title Reference sequence is not removed when calling augur align --remove-reference Reference sequence not removed when calling augur align Mar 15, 2020
@huddlej huddlej added good first issue A relatively isolated issue appropriate for first-time contributors please take this issue Extra attention is needed labels Mar 15, 2020
@jameshadfield jameshadfield removed the please take this issue Extra attention is needed label Mar 17, 2020
@godotgildor
Copy link

I'm having trouble reproducing this using the Zika example.

If I run:
snakemake -p results/aligned.fasta --cores
in the zika repository, looking in the results/aligned.fasta I do not see an entry for PF13/251013-18.

If I edit the Snakefile to remove the --remove-reference and then run
snakemake -p results/aligned.fasta --cores
the resulting results/aligned.fasta includes the entry:
>KX369547.1 Zika virus strain PF13/251013-18, complete genome

Did I miss something?

@huddlej
Copy link
Contributor

huddlej commented Mar 20, 2020

@godotgildor The Zika issue only occurs with the full Zika build and not with the tutorial build. See the complete description of this issue for more details.

huddlej pushed a commit that referenced this issue Mar 24, 2020
Fixes a previously missed issue related to #454 where the reference was not
removed when no gaps were found relative to the reference.
@tolot27
Copy link
Contributor

tolot27 commented Mar 24, 2020

@godotgildor Please can you rerun your test with the most recent version? The issue should be fixed now.

@huddlej huddlej added the needs triage Needs triage by a Nextstrain team member label Mar 25, 2020
@huddlej huddlej removed the good first issue A relatively isolated issue appropriate for first-time contributors label Mar 25, 2020
@huddlej
Copy link
Contributor

huddlej commented Mar 25, 2020

The solution to Trevor's original issue is partially data-dependent. A fix that would help from the augur side, though, is to change the name field used for reference sequences from id to name. This would be a breaking change in the sense that any build out in the world that relies on the GenBank record's id instead of name would break. However, this change would also avoid the kind of issue Trevor described and also allow users to provide FASTA references instead of GenBank files in the future.

I'm labeling this as an issue to discuss within the Nextstrain team for now. It doesn't need any further development until then.

@trvrb
Copy link
Member Author

trvrb commented Jan 22, 2021

I'm going to close this immediate issue as the real issue is as @huddlej says one of naming of Genbank reference sequences. I don't think high priority.

@trvrb trvrb closed this as completed Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Needs triage by a Nextstrain team member
Projects
None yet
Development

No branches or pull requests

5 participants