Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix weighted frequencies when weight keys are unrepresented #420

Merged
merged 1 commit into from
Dec 11, 2019

Conversation

trvrb
Copy link
Member

@trvrb trvrb commented Dec 11, 2019

This is a rebase of @huddlej's PR #418 onto the new v6 master branch.

Fixes a bug in weighted KDE estimation of tree frequencies when one or more weight attributes is not represented by any tips on the tree. For example, when a tree is missing any samples from Africa but the frequencies are being weighted by region, the total of the estimated frequencies will not sum to 1.

This PR adds unit tests to confirm the original issue and adds logic to KDE frequency estimation to correct the issue. This PR also updates the Travis CI config to source the canonical Nextstrain environment.

Closes #268.

@trvrb
Copy link
Member Author

trvrb commented Dec 11, 2019

This works well for me. Thanks @huddlej!

@trvrb trvrb merged commit a456cba into master Dec 11, 2019
@trvrb trvrb deleted the fix-weighted-frequencies-rebase branch December 11, 2019 05:50
huddlej added a commit to nextstrain/seasonal-flu that referenced this pull request Dec 16, 2019
Resolves an issue with KDE frequency weighting where the regions used to weight
frequencies were represented differently in the metadata and the weights
JSON. For example, the metadata contains prettified regions like "North America"
while the weights JSON referred to this region as "north america". This issue
was revealed after introducing code to augur frequencies that eliminated all
unrepresented weight attributes from consideration. When no weights were
represented by the given tips (because of the casing difference shown above),
weighted frequency estimation failed with a ValueError.

[1] nextstrain/augur#420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weighted KDE frequencies do not sum to 1 when representative weight attributes are missing
1 participant