Fix missing groups for included strains #1000
Merged
Description of proposed changes
Fixes a subtle bug that occurs when grouping with a maximum number of
subsampled sequences while some force-included strains would otherwise
be candidates for subsampling. In the first pass through the metadata,
we remove force-included strains from consideration for subsampling.
However, we do not remove these strains in the second pass through the
metadata. If these strains have not been filtered earlier, we consider
them for subsampling in the second pass and look for a queue matching
their group-by attributes. Since queues are created only for the group
attributes found in the first pass, the group attributes of a
force-included strain may not have a queue.
This commit adds a functional test to recreate the bug (originally
discovered in the ncov workflow) and updates the logic in the second
pass to remove force-included strains prior to building subsampling
groups.
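The two-pass behavior described above can be sketched roughly as
follows. This is a minimal illustration, not the project's actual code:
the function names (`group_key`, `first_pass_groups`, `make_queues`,
`second_pass`), the record layout, and the `skip_force_included` flag
are all hypothetical and exist only to show why a force-included strain
can land in a group that has no queue.

```python
def group_key(record, group_by):
    """Build a hashable group key from the group-by columns (hypothetical helper)."""
    return tuple(record[column] for column in group_by)


def first_pass_groups(records, group_by, force_included):
    """First pass: collect groups, ignoring force-included strains."""
    return {
        group_key(record, group_by)
        for record in records
        if record["strain"] not in force_included
    }


def make_queues(groups):
    """Create one empty queue per group seen in the first pass."""
    return {group: [] for group in groups}


def second_pass(records, group_by, force_included, queues, skip_force_included):
    """Second pass: add each strain to the queue for its group.

    With skip_force_included=False (the buggy behavior in this sketch),
    a force-included strain whose group was never seen in the first pass
    triggers a KeyError on the queue lookup.
    """
    for record in records:
        if skip_force_included and record["strain"] in force_included:
            continue  # the fix: drop force-included strains before grouping
        queues[group_key(record, group_by)].append(record["strain"])
    return queues


# Assumed toy metadata: strain C is force-included and is the only
# strain in its region, so its group has no queue after the first pass.
records = [
    {"strain": "A", "region": "Europe"},
    {"strain": "B", "region": "Europe"},
    {"strain": "C", "region": "Asia"},
]
force_included = {"C"}
group_by = ["region"]
groups = first_pass_groups(records, group_by, force_included)

try:
    second_pass(records, group_by, force_included, make_queues(groups), False)
except KeyError as error:
    print("missing queue for group:", error)

fixed = second_pass(records, group_by, force_included, make_queues(groups), True)
print(fixed)
```

In this sketch the fix amounts to applying the same exclusion in both
passes, so the second pass only looks up queues for groups the first
pass could have created.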
Related issues
Testing