implemented and tested static stride scheduler for weighted round robin load balancing policy #10272

tonyjongyoonan · 2023-06-09T22:01:09Z

The core of the client-side Weighted Round Robin load balancer policy is a stride scheduler, originally implemented by an EDFScheduler. However, the mutex lock required by the EDFScheduler has been a frequent source of thread contention at high request rates and a blocker for other cost-saving efforts.

The Static Stride Scheduler generates a sequence of picks that is practically equivalent to that of the current EDFScheduler. It removes the need for a priority queue (and thus a lock) and improves latency at high request rates.
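For readers who have not seen the design doc, here is a minimal sketch of how a lock-free stride scheduler of this kind can work. It is not the code in this PR: the class name StrideSketch, the acceptance test in pick(), and the rounding are assumptions made for illustration, and the sketch assumes at least one positive weight.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a lock-free stride scheduler; not the PR's implementation. */
final class StrideSketch {
  private static final int K_MAX_WEIGHT = 0xFFFF;
  private final int[] scaledWeights;
  // The only mutable state is this counter, so no lock is needed.
  private final AtomicInteger sequence = new AtomicInteger();

  StrideSketch(double[] weights) {
    double max = 0;
    for (double w : weights) {
      max = Math.max(max, w);
    }
    if (max <= 0) {
      throw new IllegalArgumentException("need at least one positive weight");
    }
    scaledWeights = new int[weights.length];
    for (int i = 0; i < weights.length; i++) {
      // Scale so that the largest weight maps to K_MAX_WEIGHT; clamp to at least 1.
      scaledWeights[i] = Math.max(1, (int) Math.round(weights[i] / max * K_MAX_WEIGHT));
    }
  }

  int pick() {
    while (true) {
      long seq = Integer.toUnsignedLong(sequence.getAndIncrement());
      int index = (int) (seq % scaledWeights.length);
      long generation = seq / scaledWeights.length;
      long weight = scaledWeights[index];
      // A backend with scaled weight w is accepted in about w / K_MAX_WEIGHT of the
      // generations. The max-weight backend is always accepted, so the loop terminates.
      if ((weight * generation) % K_MAX_WEIGHT < K_MAX_WEIGHT - weight) {
        continue;
      }
      return index;
    }
  }

  public static void main(String[] args) {
    StrideSketch scheduler = new StrideSketch(new double[] {2.0, 1.0});
    int[] counts = new int[2];
    for (int i = 0; i < 3000; i++) {
      counts[scheduler.pick()]++;
    }
    System.out.println(counts[0] + " picks vs " + counts[1] + " picks");
  }
}
```

With weights 2 and 1, the first backend should receive roughly twice as many picks as the second, and every pick costs only an atomic increment plus some integer arithmetic.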

Weighted Round Robin LB Policy

go/static-stride-scheduler

linux-foundation-easycla · 2023-06-09T22:01:13Z

The committers listed above are authorized under a signed CLA.

✅ login: tonyjongyoonan / name: Tony An (b5cf7b0, 44a5158, 32973d4, 4bde79d, d99d51a, 13a9fd8, 3d9e625, 2b054d3, 4addda3, 4a820a7, dc7960a, acd3425, ba1a0b7, 9f4a60d, e11e542, c76cbe5, 903d2ac, d5a0629, 5982115, 88a8e48, f9cae20, dba0778, 46a463e, 142b499, 5523337, 365ba8d, 442bea8, 2a0e489, 1731862, 5e5127f, f0421d2, ec46fe3, 8ddf284, 347b46f, 4a4762e, f526bf6, 21ceb85, 03de3f9, e2eb7f9, 4072907, f749e52, be21c23)

ejona86 · 2023-06-09T22:13:54Z

You can continue pushing to your branch and update the PR as you go.


ejona86 · 2023-06-12T15:40:25Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

+  static final class StaticStrideScheduler {
+    private Vector<Long> scaledWeights;
+    private int sizeDivisor;
+    private long sequence;


This should be an AtomicInteger. If you haven't seen https://github.com/grpc/proposal/blob/master/A58-client-side-weighted-round-robin-lb-policy.md#earliest-deadline-first-edf-scheduler , there's a few things there. We don't have to do the identical approach, but we'd avoid doing things differently when unnecessary.

lambda () -> uint32 next_seq_fn is "a function you call to get an int32." The equivalent of that in Java is AtomicInteger.getAndIncrement(). One important part of that function in the design is that it is shared across scheduler instances. If you re-create the scheduler for the same input data, you get the same results, so the code doesn't have to worry about poor weighting if it is re-created frequently. Since we didn't have that sort of state previously, the old scheduler had to randomize itself.
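To illustrate the shared-sequence point, here is a rough sketch in which the counter outlives the object that consumes it. The names SharedSequenceSketch and Picker are hypothetical, and plain round robin stands in for the weighted logic to keep the example short.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a sequence counter shared across re-created pickers. */
final class SharedSequenceSketch {
  /** Stand-in for a scheduler; uses plain round robin instead of weights. */
  static final class Picker {
    private final int size;
    private final AtomicInteger sequence;

    Picker(int size, AtomicInteger sequence) {
      this.size = size;
      this.sequence = sequence;
    }

    int pick() {
      // getAndIncrement() plays the role of next_seq_fn from the gRFC.
      long seq = Integer.toUnsignedLong(sequence.getAndIncrement());
      return (int) (seq % size);
    }
  }

  public static void main(String[] args) {
    AtomicInteger shared = new AtomicInteger();
    Picker first = new Picker(3, shared);
    System.out.println(first.pick());
    System.out.println(first.pick());
    // Re-creating the picker does not restart the rotation,
    // because the counter lives outside the picker.
    Picker second = new Picker(3, shared);
    System.out.println(second.pick());
  }
}
```

Because the new picker continues from the old position, frequent re-creation does not keep favoring the first backends, which is what the randomization in the old scheduler was compensating for.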


YifeiZhuang · 2023-06-15T19:12:48Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

+    }
+
+    private int nextSequence() {
+      return this.sequence.getAndUpdate(seq -> ((seq + 1) % UINT32_MAX));


After the seq reaches Integer.MAX_VALUE, it rolls over and this becomes -1. The seq should be non-negative.

The staticScheduler lives in the picker, and there might be multiple instances of picker in the system, so ideally the seq should be global among all the pickers. Alternatively, we need to randomize the weight during the static scheduler construction time.

Should be fixed.

Since you are storing 32 bits, instead of AtomicLong, use AtomicInteger. Then, you just need to cast up to long as you read it out. That can be done with Integer.toUnsignedLong() or seq & UINT32_MAX. Masking instead of modulus makes it easier to avoid negative results. Then, instead of getAndUpdate(), just use getAndIncrement(). So this becomes:

private final AtomicInteger sequence;

private long nextSequence() {
  return Integer.toUnsignedLong(sequence.getAndIncrement());
}

Note that I am paying attention to & UINT32_MAX being different from % UINT32_MAX. But we don't care about the difference here.

I was under the impression that AtomicInteger was signed, which would mean that it cannot support the max value for an unsigned 32 bit integer. Is this not the case?

It still has 32 bits, and addition (the increment) is the same for signed and unsigned numbers. So the bits are what we'd hope they would be, but Java just won't treat them right in math. When casting to a larger integer size, the only difference between signed and unsigned is whether you sign extend: whether you fill the new bits on the left with 0s or 1s. For unsigned, you always use 0s. For signed, you copy the sign bit to the upper bits. So if you want unsigned conversion but only have signed, you do a signed conversion and then force the top bits to be 0s with the bitwise AND.

We could use AtomicLong and mask off the upper bits, but if we are masking the bits anyway then AtomicInteger is just as good.
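A small self-contained demonstration of the points above, for anyone who wants to try it. The class and method names (UnsignedSequenceSketch, widenSigned, widenUnsigned) are made up for this example; UINT32_MAX is assumed to be 0xFFFFFFFFL.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of signed vs. unsigned widening of a 32-bit counter. */
final class UnsignedSequenceSketch {
  static final long UINT32_MAX = 0xFFFF_FFFFL;

  /** Plain widening: copies the sign bit into the upper 32 bits. */
  static long widenSigned(int bits) {
    return bits;
  }

  /** Widening plus mask: forces the upper 32 bits to zero. */
  static long widenUnsigned(int bits) {
    return bits & UINT32_MAX;
  }

  public static void main(String[] args) {
    AtomicInteger sequence = new AtomicInteger(Integer.MAX_VALUE);
    sequence.getAndIncrement(); // the stored value wraps to Integer.MIN_VALUE
    int bits = sequence.get();

    System.out.println(widenSigned(bits));            // -2147483648
    System.out.println(widenUnsigned(bits));          // 2147483648
    System.out.println(Integer.toUnsignedLong(bits)); // 2147483648

    // Masking and modulus differ only at the top value.
    System.out.println(UINT32_MAX & UINT32_MAX);      // 4294967295
    System.out.println(UINT32_MAX % UINT32_MAX);      // 0
  }
}
```

The mask and Integer.toUnsignedLong() give the same result, so either works when reading the counter out as a long.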


YifeiZhuang · 2023-06-26T18:32:39Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

+      // scales weights s.t. max(weights) == K_MAX_WEIGHT, meanWeight is scaled accordingly
+      int[] scaledWeights = new int[numChannels];
+      for (int i = 0; i < numChannels; i++) {
+        if (weights[i] < 0.0001) {


similarly this should be compared to 0
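As a rough illustration of the scaling step with the comparison against 0, something along these lines would work. This is a sketch under assumptions, not the PR's code: the name WeightScalingSketch, the choice to give non-positive weights the mean of the positive ones, and the all-non-positive fallback are illustrative.

```java
import java.util.Arrays;

/** Hypothetical sketch of scaling weights so that max(weights) maps to K_MAX_WEIGHT. */
final class WeightScalingSketch {
  static final int K_MAX_WEIGHT = 0xFFFF;

  static int[] scale(float[] weights) {
    double sum = 0;
    double max = 0;
    int positive = 0;
    for (float w : weights) {
      if (w > 0) { // compare to 0 rather than to a small epsilon
        sum += w;
        max = Math.max(max, w);
        positive++;
      }
    }
    int[] scaled = new int[weights.length];
    if (positive == 0) {
      // Assumed fallback: no usable weights, so treat every backend equally.
      Arrays.fill(scaled, K_MAX_WEIGHT);
      return scaled;
    }
    double factor = K_MAX_WEIGHT / max;
    int mean = (int) Math.round(factor * sum / positive);
    for (int i = 0; i < weights.length; i++) {
      // Non-positive weights get the scaled mean of the positive ones.
      scaled[i] = weights[i] > 0
          ? Math.max(1, (int) Math.round(weights[i] * factor))
          : mean;
    }
    return scaled;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(scale(new float[] {1f, 0f, 3f})));
  }
}
```

For the weights 1, 0 and 3, the zero entry ends up between the two positive entries instead of being starved or dominating.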


ejona86 · 2023-06-27T14:58:31Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

+    private final AtomicInteger sequence;
+    private static final int K_MAX_WEIGHT = 0xFFFF;
+
+    StaticStrideScheduler(float[] weights, Random random) {


We can do it in a later PR (by you or someone else), but we will want to replace this Random with AtomicInteger sequence. The only reason to have the random now is because we couldn't carry state between re-creations of the scheduler. But the only mutable state now is the integer, so we can follow the gRFC.


YifeiZhuang · 2023-06-29T00:03:34Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

+   * in which each object is chosen periodically with frequency proportional to its weight.
+   * <p>
+   * Specifically, each backend is given a deadline equal to the multiplicative inverse of 
+   * its weight. The place of each backend in its deadline is tracked, and each call to


This (L335-L337) describes the priority queue implementation of EDF, which no longer applies here, but it reads as though it describes the current implementation.


YifeiZhuang

LGTM.
Please get @ejona86 approval then merge.
Please consult @temawi to watch for interop test dashboard after merge.

ejona86 · 2023-07-05T16:41:51Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java

-   * For example, if items A and B are added with weights 0.5 and 0.2, successive chooses return:
+  /*
+   * The Static Stride Scheduler is an implementation of an earliest deadline first (EDF) scheduler
+   * in which each object is chosen periodically with frequency proportional to its weight.


I see why some of that stuff was removed, but the comment now has the same problem an earlier version of this code had: it name-drops EDF without actually saying how it is mapped to a scheduling problem. Let's just replace this line with "in which each object's deadline is the multiplicative inverse of the object's weight." I think that is the most important part of the mapping.
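For context on that mapping, here is a minimal sketch of a priority-queue EDF scheduler in which each object's period is 1/weight. It is an illustration, not the EDFScheduler that was replaced: the names EdfSketch and Entry and the index-based tie-break are assumptions, and it expects a non-empty array of positive weights.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch of an EDF scheduler backed by a priority queue. */
final class EdfSketch {
  private static final class Entry {
    final int index;
    final double period; // multiplicative inverse of the weight
    double deadline;

    Entry(int index, double weight) {
      this.index = index;
      this.period = 1.0 / weight;
      this.deadline = period;
    }
  }

  // Earliest deadline first; ties go to the lower index (an assumed tie-break).
  private final PriorityQueue<Entry> queue = new PriorityQueue<>((a, b) -> {
    int byDeadline = Double.compare(a.deadline, b.deadline);
    return byDeadline != 0 ? byDeadline : Integer.compare(a.index, b.index);
  });

  EdfSketch(double[] weights) {
    for (int i = 0; i < weights.length; i++) {
      queue.add(new Entry(i, weights[i]));
    }
  }

  // The queue is mutated on every pick, which is why this version needs a lock.
  synchronized int pick() {
    Entry next = queue.poll();
    next.deadline += next.period;
    queue.add(next);
    return next.index;
  }

  public static void main(String[] args) {
    EdfSketch scheduler = new EdfSketch(new double[] {0.5, 0.2});
    StringBuilder picks = new StringBuilder();
    for (int i = 0; i < 7; i++) {
      picks.append(scheduler.pick() == 0 ? 'A' : 'B');
    }
    System.out.println(picks); // AABAAAB
  }
}
```

With weights 0.5 and 0.2, A has a period of 2 and B a period of 5, so A is chosen five times for every two choices of B.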

ejona86 · 2023-07-05T16:44:37Z

xds/src/main/java/io/grpc/xds/WeightedRoundRobinLoadBalancer.java


-    /**
-     * Picks the next WRR object.
+    long getSequence() {


Annotate with com.google.common.annotations.VisibleForTesting

implemented static stride scheduler class/algorithm (currently unused)

b5cf7b0

go/static-stride-scheduler

tonyjongyoonan requested a review from YifeiZhuang June 9, 2023 22:01

tonyjongyoonan closed this Jun 9, 2023

ejona86 reviewed Jun 12, 2023


tonyjongyoonan reopened this Jun 12, 2023

tonyjongyoonan added 8 commits June 12, 2023 13:25

added imports, fixed small errors

44a5158

changed to array, final class vars, atomic integer

32973d4

added static stride scheduler test cases

4bde79d

fixing test case datatypes

d99d51a

added check argument for edge case: no weights inputted

13a9fd8

replaced edf scheduler with static stride scheduler

3d9e625

added test case

2b054d3

fixed style errors, renamed schedulers

4addda3

tonyjongyoonan changed the title ~~implemented static stride scheduler class/algorithm~~ DRAFT: implemented static stride scheduler class/algorithm Jun 12, 2023

tonyjongyoonan added 12 commits June 12, 2023 16:17

quick fix

4a820a7

bug fix attempt

dc7960a

fixed verbose work in updateWeightSS(), float equality

acd3425

fixed pickChannel() by removing kOffset, fixed atomic integer

ba1a0b7

added example test cases

9f4a60d

types changed from long to int

e11e542

added sss test cases

c76cbe5

added more edge cases (negative/zero weights)

903d2ac

compile fix

d5a0629

more than 3 channels test case

5982115

fixed negative case

88a8e48

end to end test cases

f9cae20

YifeiZhuang reviewed Jun 15, 2023


fixed sequence

dba0778

YifeiZhuang reviewed Jun 22, 2023


tonyjongyoonan linked an issue Jun 23, 2023 that may be closed by this pull request

Implement static stride in weighted round robin #10180

Closed

tonyjongyoonan assigned tonyjongyoonan and unassigned tonyjongyoonan Jun 23, 2023

tonyjongyoonan added the enhancement label Jun 23, 2023

tonyjongyoonan changed the title ~~implemented and tested static stride scheduler class/algorithm~~ implemented and tested static stride scheduler for weighted round robin load balancing policy Jun 23, 2023

tonyjongyoonan added 3 commits June 23, 2023 10:49

fixed sequence, changed long --> int, added comments

ec46fe3

fixed sequence, changed types, added comments

8ddf284

fixing weird commit 1

347b46f

tonyjongyoonan force-pushed the static-stride-scheduler branch from a232101 to 347b46f Compare June 23, 2023 21:20

tonyjongyoonan and others added 3 commits June 23, 2023 14:26

more syntax changes

4a4762e

Delete settings.json

f526bf6

modified test

21ceb85

tonyjongyoonan added the kokoro:force-run label Jun 23, 2023

grpc-kokoro removed the kokoro:force-run label Jun 23, 2023

YifeiZhuang reviewed Jun 26, 2023


fixed feedback

03de3f9

ejona86 reviewed Jun 27, 2023


changed weights to short from int

e2eb7f9

YifeiZhuang reviewed Jun 29, 2023


fixed message and short logic

4072907

YifeiZhuang approved these changes Jul 1, 2023


tonyjongyoonan requested a review from ejona86 July 1, 2023 06:59

removed obselete comment

f749e52

ejona86 approved these changes Jul 5, 2023


changed annotations

be21c23

tonyjongyoonan merged commit 0b53dd7 into grpc:master Jul 6, 2023

tonyjongyoonan deleted the static-stride-scheduler branch July 6, 2023 17:03

tonyjongyoonan mentioned this pull request Aug 7, 2023

xds: replace random with atomic sequence in WRR #10458

Merged

github-actions bot locked as resolved and limited conversation to collaborators Nov 3, 2023


4 participants
