Description
Running WeightedRoundRobinLoadBalancerTest.pickByWeight_avgWeight_zeroCpuUtilization_withEps_customErrorUtilizationPenalty on my laptop once took 14.969s. That's surprising for this type of test; I'd have expected a max of ~1s. It seems the static stride isn't quite right and we have to loop over all the entries many times. The worst case for static stride should be looping through all addresses once. If we are doing more than that, something is wrong.
It appears the test was the first to run, so it will be slower than other tests because of class loading and the like. But I wouldn't expect it to be 15 seconds slow.
I did a quick hack to see if it was iterating too many times, and it was iterating way too many times.
diff --git a/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java b/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java
index d5d8c4d9e..792aa5450 100644
--- a/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java
+++ b/xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java
@@ -433,7 +433,9 @@ final class WeightedRoundRobinLoadBalancer extends RoundRobinLoadBalancer {
* an offset that varies per backend index is also included to the calculation.
*/
int pick() {
+ int i = 0;
while (true) {
+ i++;
long sequence = this.nextSequence();
int backendIndex = (int) (sequence % this.sizeDivisor);
long generation = sequence / this.sizeDivisor;
@@ -442,6 +444,8 @@ final class WeightedRoundRobinLoadBalancer extends RoundRobinLoadBalancer {
if ((weight * generation + offset) % K_MAX_WEIGHT < K_MAX_WEIGHT - weight) {
continue;
}
+ if (i > 2*scaledWeights.length)
+ throw new RuntimeException(String.format("%d > 2*%d\n", i, scaledWeights.length));
return backendIndex;
}
}
io.grpc.xds.WeightedRoundRobinLoadBalancerTest > pickByWeight_avgWeight_zeroCpuUtilization_withEps_customErrorUtilizationPenalty FAILED
java.lang.RuntimeException: 98300 > 2*3
I wonder if we have an off-by-one caused by rounding, or some such.
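To make the iteration count easier to reason about outside the test, here is a minimal standalone sketch of the pick loop from the diff above. It is not the real `StaticStrideScheduler`: the class name `StaticStrideSketch`, the `lastIterations` field, the plain `long` sequence starting at 0, the value `K_MAX_WEIGHT = 0xFFFF`, the offset formula `K_MAX_WEIGHT / 2 * backendIndex`, and using the array length in place of `sizeDivisor` are all assumptions made for illustration. Only the skip condition is copied from the diff.

```java
// Hypothetical standalone model of the pick loop; not the gRPC implementation.
final class StaticStrideSketch {
    // Assumed value of the maximum scaled weight.
    static final int K_MAX_WEIGHT = 0xFFFF;

    private final int[] scaledWeights;
    // The real code appears to use an AtomicInteger; a plain counter is enough here.
    private long sequence = 0;
    // Number of loop iterations the most recent pick() needed.
    long lastIterations = 0;

    StaticStrideSketch(int[] scaledWeights) {
        this.scaledWeights = scaledWeights.clone();
    }

    int pick() {
        long iterations = 0;
        while (true) {
            iterations++;
            long seq = sequence++;
            // Assumption: sizeDivisor equals the number of backends.
            int backendIndex = (int) (seq % scaledWeights.length);
            long generation = seq / scaledWeights.length;
            int weight = scaledWeights[backendIndex];
            // Assumed per-backend offset so backends do not all fire in the same generation.
            long offset = (long) K_MAX_WEIGHT / 2 * backendIndex;
            // Skip condition as shown in the diff.
            if ((weight * generation + offset) % K_MAX_WEIGHT < K_MAX_WEIGHT - weight) {
                continue;
            }
            lastIterations = iterations;
            return backendIndex;
        }
    }

    public static void main(String[] args) {
        // All backends at the maximum scaled weight versus all at the minimum.
        int[][] cases = {{K_MAX_WEIGHT, K_MAX_WEIGHT, K_MAX_WEIGHT}, {1, 1, 1}};
        for (int[] weights : cases) {
            StaticStrideSketch scheduler = new StaticStrideSketch(weights);
            for (int n = 0; n < 2; n++) {
                int picked = scheduler.pick();
                System.out.printf("weight=%d pick=%d iterations=%d%n",
                    weights[0], picked, scheduler.lastIterations);
            }
        }
    }
}
```

Under these assumptions, picks with every scaled weight at `K_MAX_WEIGHT` always return on the first iteration, while three backends with scaled weight 1 need 3 iterations for the first pick and 98300 for the second. That second number equals the count in the exception above, which would be consistent with all three scaled weights ending up tiny relative to `K_MAX_WEIGHT`, though the sketch alone doesn't prove the test reached that exact state.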
CC @YifeiZhuang
I noticed because TSAN timed out after ~4 minutes. TSAN will be slow, but not 4 minutes slow.
Starting full thread dump ...
"main" Id=1 RUNNABLE
at [email protected]/jdk.internal.misc.Unsafe.getIntVolatile(Native Method)
at [email protected]/jdk.internal.misc.Unsafe.getAndAddInt(Unsafe.java:2343)
at [email protected]/java.util.concurrent.atomic.AtomicInteger.getAndIncrement(AtomicInteger.java:182)
at app//io.grpc.xds.WeightedRoundRobinLoadBalancer$StaticStrideScheduler.nextSequence(WeightedRoundRobinLoadBalancer.java:397)
at app//io.grpc.xds.WeightedRoundRobinLoadBalancer$StaticStrideScheduler.pick(WeightedRoundRobinLoadBalancer.java:437)
at app//io.grpc.xds.WeightedRoundRobinLoadBalancer$WeightedRoundRobinPicker.pickSubchannel(WeightedRoundRobinLoadBalancer.java:279)
at app//io.grpc.xds.WeightedRoundRobinLoadBalancerTest.pickByWeight(WeightedRoundRobinLoadBalancerTest.java:328)
at app//io.grpc.xds.WeightedRoundRobinLoadBalancerTest.pickByWeight_avgWeight_zeroCpuUtilization_withEps_customErrorUtilizationPenalty(WeightedRoundRobinLoadBalancerTest.java:463)