Too many publications associated with a single author breaks augur export #571
Comments
@cmloreth and I have seen this recently too
How about a simple appending of
Thanks, @trvrb! Your suggestion sounds good. I'll try this out. For my own context, the reason why the numeric disambiguation is not preferred is partially because the authors are listed in auspice with the number of strains displayed in the tree like so:
If we used numeric disambiguation, these authors would be listed like:
which is much more confusing.
Adds a function to convert a zero-indexed article count for a given author to an alphabetical disambiguation suffix. For example, two articles by Bedford et al. get disambiguated as "Bedford et al. A" and "Bedford et al. B". The new function is slightly overkill in that it supports an unlimited number of articles per author. We could easily unroll the first two loops into redundant code that only works for a fixed number of articles (702), but the idea here is that we never have to touch this function again. Fixes #571
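For illustration, here is a minimal sketch of such a conversion, assuming spreadsheet-style (bijective base-26) suffixes: A to Z, then AA to ZZ, then AAA, and so on. The function name `alphabetical_suffix` is hypothetical and the loop structure is not necessarily the one used in the actual commit.

```python
import string


def alphabetical_suffix(index: int) -> str:
    """Convert a zero-indexed article count to a suffix: 0 -> A, 25 -> Z, 26 -> AA.

    Hypothetical helper for illustration; not augur's actual implementation.
    """
    if index < 0:
        raise ValueError("index must be non-negative")

    letters = string.ascii_uppercase
    suffix = ""
    n = index + 1  # switch to one-based counting for bijective base 26
    while n > 0:
        # Peel off the least significant "digit" and prepend its letter.
        n, remainder = divmod(n - 1, len(letters))
        suffix = letters[remainder] + suffix
    return suffix


def disambiguate(author: str, index: int) -> str:
    """Append the suffix to an author label, e.g. 'Bedford et al. A'."""
    return f"{author} {alphabetical_suffix(index)}"
```

Under this scheme, one- and two-letter suffixes together cover 26 + 26 * 26 = 702 articles, which matches the fixed limit mentioned above; index 702 simply rolls over to a three-letter suffix.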
Current Behavior
Running augur export v2 with recent ncov data can produce a case where an author has more than 52 associated publications. In this case, the index of the author tuple (see traceback below) exceeds the number of letters provided in the alphabet string.
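The failure mode can be reproduced in isolation. The sketch below assumes the alphabet string holds 52 letters (uppercase followed by lowercase), which is inferred from the limit described above; `ALPHABET` and `naive_suffix` are hypothetical names, not identifiers from the augur source.

```python
import string

# Assumed 52-letter alphabet; the real string in augur may differ.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase


def naive_suffix(index: int) -> str:
    """Look up a disambiguation letter by direct indexing (hypothetical)."""
    return ALPHABET[index]


# The first 52 publications (indices 0..51) get a letter.
last_valid = naive_suffix(51)

# The 53rd publication (index 52) runs past the end of the string.
try:
    naive_suffix(52)
    failed = False
except IndexError:
    failed = True
```

Because the limit depends only on how many publications one author has in the input, the error appears or disappears with the data rather than with the code.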
Expected behavior
The above error should not happen.
How to reproduce
Download the attached data in ncov_issue.zip and run the following command with augur version 8.0.0 (the latest).
Possible solution
The solution should ideally be future-proof enough to not break in a data-dependent manner.
Context
The current author disambiguation approach was introduced in f0e4b1c. The previous implementation appears to have used the numeric approach I suggest above, so maybe the GISAID ID or URL approach is better?