Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexibly disambiguate multiple publications by the same author #581

Merged
merged 1 commit into from
Jun 27, 2020

Conversation

huddlej
Copy link
Contributor

@huddlej huddlej commented Jun 27, 2020

Adds a function to convert a zero-indexed article count for a given author to a
alphabetical disambiguation suffix. For example, two articles by Bedford et
al. get disambiguated as "Bedford et al. A" and "Bedford et al. B".

The new function is slightly overkill in that it supports an infinite number of
articles per author. We could easily unroll the first two loops into redundant
code that only works for a fixed number of articles (702), but the idea here is
that we never have to touch this function again.

One disadvantage of the alphabetical code is that articles with "AA" get sorted
next to "A" and before "B". A real world example is shown below where the author
had 85 (!) different publications.

image

Fixes #571

Adds a function to convert a zero-indexed article count for a given author to a
alphabetical disambiguation suffix. For example, two articles by Bedford et
al. get disambiguated as "Bedford et al. A" and "Bedford et al. B".

The new function is slightly overkill in that it supports an infinite number of
articles per author. We could easily unroll the first two loops into redundant
code that only works for a fixed number of articles (702), but the idea here is
that we never have to touch this function again.

Fixes #571
@huddlej huddlej requested a review from trvrb June 27, 2020 02:54
@codecov
Copy link

codecov bot commented Jun 27, 2020

Codecov Report

Merging #581 into master will increase coverage by 0.16%.
The diff coverage is 86.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #581      +/-   ##
==========================================
+ Coverage   25.10%   25.27%   +0.16%     
==========================================
  Files          33       33              
  Lines        5178     5191      +13     
  Branches     1299     1300       +1     
==========================================
+ Hits         1300     1312      +12     
- Misses       3830     3831       +1     
  Partials       48       48              
Impacted Files Coverage Δ
augur/export_v2.py 10.00% <86.66%> (+1.95%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08fb068...8d66284. Read the comment docs.

@rneher rneher self-requested a review June 27, 2020 15:07
Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @huddlej. This looks good to me. I don't think we'll run into this A, AA, AB, B, C issue all too often. Having this generic solution is good IMHO.

@huddlej
Copy link
Contributor Author

huddlej commented Jun 27, 2020

Great, thank you, @rneher! Merging now.

@huddlej huddlej merged commit 402796b into master Jun 27, 2020
@huddlej huddlej deleted the fix-author-disambiguation branch June 27, 2020 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too many publications associated with a single author breaks augur export
2 participants