Description
It appears that the static stride scheduler in WeightedRoundRobinLoadBalancer is flaky or not thread-safe, for a reason different from the bug described in #10366 (and fixed in #10370).
In rare cases, WeightedRoundRobinLoadBalancerTest.pickFromOtherThread requires two pass-throughs when multiple threads are picking, although a pick should only ever need at most one pass-through. More importantly, the test also times out in rare cases.
The problem appears to lie in the test rather than in the scheduler itself. The assert statement that tracks the number of iterations seems to be the cause, since removing it eliminates every instance of the timeout. My guess is that the scheduler does not actually require two pass-throughs even with multiple threads, because the sequence is incremented atomically, and that something in the counting logic is not thread-safe.
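To illustrate the suspicion, here is a rough sketch of how iterations could be counted without sharing mutable state between threads. It is not the grpc-java code: `SketchScheduler`, `PickCountSketch`, `runPicks`, and the weights are all hypothetical names and values made up for this example. Each `pick()` call counts its own iterations in a local variable and only publishes the result through an atomic maximum.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a stride scheduler; NOT the grpc-java class.
final class SketchScheduler {
  private final int[] weights;   // assumption: at least one weight equals maxWeight
  private final int maxWeight;
  private final AtomicLong sequence = new AtomicLong();
  private final AtomicInteger maxIterations = new AtomicInteger();

  SketchScheduler(int[] weights, int maxWeight) {
    this.weights = weights.clone();
    this.maxWeight = maxWeight;
  }

  int pick() {
    int iterations = 0; // local to this call, so no other thread can touch it
    while (true) {
      iterations++;
      long seq = sequence.getAndIncrement(); // each sequence number is claimed once
      int index = (int) (seq % weights.length);
      long generation = seq / weights.length;
      long offset = (long) maxWeight / 2 * index;
      // Skip this backend for the current generation if its weight says so.
      if ((weights[index] * generation + offset) % maxWeight < maxWeight - weights[index]) {
        continue;
      }
      // Publish the per-call count with an atomic max instead of a plain shared field.
      maxIterations.accumulateAndGet(iterations, Math::max);
      return index;
    }
  }

  int maxIterations() {
    return maxIterations.get();
  }
}

final class PickCountSketch {
  // Runs picks from several threads; returns {total picks, max iterations per pick}.
  static int[] runPicks(int threadCount, int picksPerThread) {
    // All weights equal maxWeight here, so every pick should succeed on its first try.
    SketchScheduler scheduler = new SketchScheduler(new int[] {10, 10, 10}, 10);
    AtomicInteger totalPicks = new AtomicInteger();
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < picksPerThread; j++) {
          scheduler.pick();
          totalPicks.incrementAndGet();
        }
      });
      threads[i].start();
    }
    try {
      for (Thread thread : threads) {
        thread.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for pick threads", e);
    }
    return new int[] {totalPicks.get(), scheduler.maxIterations()};
  }
}
```

If the test's counter were instead a plain field incremented from several threads, concurrent picks could interleave their updates and report more iterations than any single pick actually performed, which would be consistent with the occasional "two pass-throughs" result described above.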