
augur/clades.py: change clade assignment from mutations to sequences … #288

Merged
merged 4 commits into from
May 25, 2019

Conversation

rneher
Member

@rneher rneher commented May 23, 2019

…to avoid unassigned nodes when the reference sequence with respect to which mutations are defined is not part of the build. This still needs to be checked with vcf (cc @emmahodcroft). There are also now-obsolete parts that could be deleted (such as get_node_alleles).

rneher added 2 commits May 23, 2019 17:25
…to avoid unassigned nodes when reference sequence with respect to which mutations are defined is not part of the build
…imilar patterns between fasta and vcf workflows. still needs testing
@trvrb
Member

trvrb commented May 23, 2019

Thanks for this @rneher. I had thought that it wasn't safe to assume that the root node would be populated with a full nucleotide sequence and aa_sequences. But I see you can also input these separately as --reference. I have to admit that I'm now a bit confused about all the different ways we're passing around root sequence vs reference sequence vs JSON vs FASTA. I can try to gain clarity.

@rneher
Member Author

rneher commented May 23, 2019

What I had pushed earlier worked for flu, but would have broken vcf workflows. The latest commit is not going to work, but is my attempt to unify them. I'll have to test and fix this tomorrow (somehow the ssh access to the cluster doesn't work).

@rneher
Member Author

rneher commented May 23, 2019

My current thinking was to implement something that

  • without a reference/root sequence, works (or not) as before
  • if the nt-muts/aa-muts JSONs have sequences, uses those to disambiguate states at positions that are not diverse
  • if a reference is passed in for vcf, uses that instead. This is not yet implemented.

This can all be achieved by re-interpreting the mutation dictionaries as sequence look-ups that fall back to the reference if no mutation has been observed at a particular position.
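
A minimal sketch of that fall-back idea, for illustration only. This is not the code merged in this PR: `SequenceLookup`, `matches_clade`, the 0-based positions, and the `(position, state)` format for clade definitions are all hypothetical names and conventions chosen for the example.

```python
class SequenceLookup:
    """Hypothetical helper: a mutation dictionary read as a sequence.

    Positions with an observed mutation return the mutated state;
    every other position falls back to the reference, if one is given.
    """

    def __init__(self, mutations, reference=None):
        self.mutations = dict(mutations)  # position (0-based, assumed) -> state
        self.reference = reference        # full sequence string, or None

    def __getitem__(self, pos):
        if pos in self.mutations:
            return self.mutations[pos]
        if self.reference is not None:
            return self.reference[pos]
        # No mutation observed and no reference: the state is unknown.
        raise KeyError(pos)


def matches_clade(lookup, clade_alleles):
    """Return True if every defining (position, state) pair is present."""
    for pos, state in clade_alleles:
        try:
            if lookup[pos] != state:
                return False
        except KeyError:
            return False  # cannot confirm the allele without a reference
    return True


reference = "ACGTACGT"
clade = [(0, "A"), (2, "T")]  # hypothetical clade definition

with_ref = SequenceLookup({2: "T"}, reference)
without_ref = SequenceLookup({2: "T"})
```

With the reference available, position 0 resolves to `A` even though no mutation was ever observed there, so `matches_clade(with_ref, clade)` succeeds. Without it, the same node cannot be assigned, which corresponds to the first bullet (behaviour as before).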

…workflows were the optionally pull in the sequence of the root.
@rneher
Member Author

rneher commented May 24, 2019

this should work now.

@rneher rneher merged commit 2f4b86e into master May 25, 2019
@rneher rneher deleted the fix_clade_assignment branch May 25, 2019 20:04