Remove max_count/max_lineage 'voting' logic from usher_parsing #521
Finally getting around to something I've been meaning to do since #492: removing the logic that overrides usher's tie-breaker with the plurality of lineage placements when a sequence has multiple placements in different lineages. For example, usher might find 3 equally parsimonious placements (EPPs), one in BA.5 and two in BA.5.2. Initially I thought that would mean it's more likely that the sequence fits in BA.5.2, but with increasing amplicon dropout problems over time, sometimes it simply means that the sequence happens to have Ns in places that allow it to be placed in different parts of BA.5.2 even if it doesn't necessarily have the BA.5.2-defining mutation. The more uncertain the placement is, the more speculative the "voting" is, and the better usher's tie-breaker (which I think favors the branch with more descendants, usually the more basal branch) seems to do.
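To make the behavior change concrete, here is a minimal sketch of the difference, not the actual usher_parsing code. The function names, the `use_voting` flag, and the representation of placements as a plain list of lineage names are all assumptions made for illustration; usher's own choice is passed in as an input rather than recomputed.

```python
from collections import Counter


def plurality_lineage(placements):
    """Illustrative 'voting': return the lineage with the most EPPs.

    `placements` is a hypothetical list holding one lineage name per
    equally parsimonious placement.
    """
    counts = Counter(placements)
    lineage, _ = counts.most_common(1)[0]
    return lineage


def assign_lineage(usher_choice, placements, use_voting=False):
    """Pick the lineage to report for one sequence.

    With use_voting=True (the behavior being removed), the plurality of
    placements overrides usher's choice. With use_voting=False, usher's
    own tie-breaker result is reported unchanged.
    """
    if use_voting and placements:
        return plurality_lineage(placements)
    return usher_choice


# The example from the description: one EPP in BA.5, two in BA.5.2,
# and usher's tie-breaker is assumed to have picked the basal BA.5.
placements = ["BA.5", "BA.5.2", "BA.5.2"]
old_call = assign_lineage("BA.5", placements, use_voting=True)
new_call = assign_lineage("BA.5", placements)
print(old_call, new_call)
```

In this sketch the old path reports BA.5.2 because it has two of the three placements, while the new path keeps usher's BA.5.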
I tested this on GISAID seqs with IDs in the range EPI_ISL_15340000-15349999 and it behaved as expected, leaving most assignments unchanged but no longer assigning the lineage with the most EPPs in several cases.
@rmcolq feel free to review the changes or not depending on time / interest. I will merge it in a couple days if I don't hear otherwise.
After this is merged, may I tag a pre-release?
If the next pangolin-data release does not include the pangoLEARN *.joblib files then it will require pangolin v4.3, so I think it would be better to release pangolin v4.3 at least a day before the next pangolin-data release (which is still probably at least a week away). I don't anticipate any problems from using pangolin v4.3 with the current release of pangolin-data.