Count insertion/deletion events once in pairwise distances #698
@benjaminotter No worries if you're too busy to check this out, but I thought you might be interested in this Augur module that calculates pairwise distances between sequences. If you find this interesting, you may want to take a shot at issue #693 which adds a similar kind of functionality to |
Codecov Report
@@            Coverage Diff             @@
##           master     #698      +/-   ##
==========================================
+ Coverage   31.28%   31.51%   +0.23%
==========================================
  Files          41       41
  Lines        5655     5674      +19
  Branches     1367     1373       +6
==========================================
+ Hits         1769     1788      +19
  Misses       3812     3812
  Partials       74       74
@benjaminotter Would you be up for trying to resolve the merge conflicts in this PR? The code this PR modifies overlaps with the code you added to support ignored characters. Tests already exist, so when you get the conflict resolved, you can do the usual |
@huddlej, the latest commit resolves the merge conflicts. I used the GitHub interface to do so, which merged |
Counts insertion/deletion (indel) events between sequences as single events instead of counting each gap character independently. Adds logic to support user-defined weights for gaps between specific characters, aggregating weights across all potential mismatches in a sequence-specific distance map, and aggregating weights across all sites in an indel event.
Adds a test, and the code needed to pass it, for the edge case where the default weight is greater than any of the sequence-specific weights and the event is an indel.
e392e09 to 00fe015
Thank you for making these changes and testing them out, @benjaminotter! I'm sorry I forgot to describe the process we use to resolve these types of conflicts. We generally avoid merge commits from Your edits here were very helpful in confirming the changes that needed to be made, though, and I've used them to perform a rebase onto master and reconfirm that tests passed locally. I'll merge this once it completes the CI. If you are interested in learning more about how we use git rebase, the git book has a nice chapter on rewriting history and we can also have a quick video chat where I can demo an example of how I use rebase. I didn't get any training in this area of software development during my computer science degree, but it's been super helpful to learn and empowers you to do all kinds of quick git surgery that you'd normally avoid.
Description of proposed changes
Counts insertion/deletion (indel) events between sequences as single events instead of counting each gap character independently. Adds logic to support user-defined weights for gaps between specific characters, aggregating weights across all potential mismatches in a sequence-specific distance map, and aggregating weights across all sites in an indel event.
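To illustrate the idea of counting an indel once, here is a minimal sketch and not the code from this PR. The function name `count_events` is hypothetical, it uses a weight of 1 for every event, and it does not model the user-defined weights or ignored characters mentioned above. It also assumes that a site where both sequences have a gap closes any open indel run, which may differ from the behavior implemented here.

```python
def count_events(seq_a, seq_b, gap="-"):
    """Count differences between two aligned sequences.

    Each contiguous run of sites where exactly one sequence has a gap is
    counted as a single indel event; each substitution counts once.
    (Hypothetical helper for illustration; unit weights only.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    events = 0
    open_gap_in = None  # which sequence holds the currently open gap run

    for a, b in zip(seq_a, seq_b):
        if a == gap and b != gap:
            state = "a"
        elif b == gap and a != gap:
            state = "b"
        else:
            # Either both sites are gaps (no difference) or neither is.
            state = None
            if a != b:
                events += 1  # ordinary substitution

        # A new indel event starts only when the gap state changes.
        if state is not None and state != open_gap_in:
            events += 1
        open_gap_in = state

    return events


# Two adjacent gap characters form one event, not two.
print(count_events("ACGT--A", "ACGTTTA"))
# Two separate gap runs plus one substitution.
print(count_events("A-GT--A", "ACGTTTC"))
```

A per-site count would report 2 for the first pair, while the event-based count reports 1, which is the distinction this change is about.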
Related issue(s)
Fixes #692
Testing
Adds doctests for the new expected behavior.