Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing groups for included strains #1000

Merged
merged 1 commit into from
Jul 6, 2022
Merged

Conversation

huddlej
Copy link
Contributor

@huddlej huddlej commented Jul 6, 2022

Description of proposed changes

Fixes a subtle bug when grouping with subsampled max sequences and
force-included strains that could potentially be considered for
subsampling. In the first pass through the metadata, we remove
force-included strains from consideration for subsampling. However, we
do not similarly remove these strains in the second pass through the
metadata. If these strains have not been filtered earlier, we consider
them for subsampling in the second pass and look for a queue with their
group-by attributes. Since queue creation depends on the group
attributes we find in the first pass, it is possible for the group
attributes of the force-included strains to not have a queue.

This commit adds a functional test to recreate the bug (originally
discovered in the ncov workflow) and updates the logic in the second
pass to remove force-included strains prior to building subsampling
groups.

Related issues

Testing

  • Adds a functional test for the issue
  • Test by CI

@codecov
Copy link

codecov bot commented Jul 6, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.24%. Comparing base (0d06558) to head (bee9375).
Report is 1316 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1000   +/-   ##
=======================================
  Coverage   59.24%   59.24%           
=======================================
  Files          50       50           
  Lines        6266     6267    +1     
  Branches     1588     1588           
=======================================
+ Hits         3712     3713    +1     
  Misses       2292     2292           
  Partials      262      262           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@huddlej huddlej requested a review from victorlin July 6, 2022 18:19
Fixes a subtle bug when grouping with subsampled max sequences and
force-included strains that could potentially be considered for
subsampling. In the first pass through the metadata, we remove
force-included strains from consideration for subsampling. However, we
do not similarly remove these strains in the second pass through the
metadata. If these strains have not been filtered earlier, we consider
them for subsampling in the second pass and look for a queue with their
group-by attributes. Since queue creation depends on the group
attributes we find in the first pass, it is possible for the group
attributes of the force-included strains to not have a queue.

This commit adds a functional test to recreate the bug (originally
discovered in the ncov workflow) and updates the logic in the second
pass to remove force-included strains prior to building subsampling
groups.
@huddlej huddlej force-pushed the fix-missing-group branch from 22b1685 to bee9375 Compare July 6, 2022 20:03
@huddlej huddlej added this to the Patch release 16.0.3 milestone Jul 6, 2022
@huddlej huddlej added the bug Something isn't working label Jul 6, 2022
@huddlej huddlej merged commit f85df58 into master Jul 6, 2022
@huddlej huddlej deleted the fix-missing-group branch July 6, 2022 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

2 participants