Add round robin policy for the cc > 3 #8263
knative-prow-robot merged 2 commits into knative:master
Conversation
Add a round robin policy for the CC > 3. This is the simple one, i.e. not the least-loaded variant. I can iterate on that, but the preliminary results look quite good as is.
Benchmark results:
Somehow choice2 is the slowest. 🤷 Might need a better RNG.
/cc @mattmoor |
	revBreaker = newInfiniteBreaker(logger)
	lbp = randomChoice2Policy
} else {
case containerConcurrency <= 3:
Do we have benchmarks that warrant this switch? I.e. have we tried round robin for the lower CC values?
Well, the benchmark above shows that full-first is better even at the code level. Given that pods might be shared (if we don't divide evenly), in the case of lower CC we prefer to use the pods at the tail less, since they might be shared, causing queueing.
pkg/activator/net/lb_policy.go
Outdated
	rrp := roundRobinPolicyT{}
	return func(ctx context.Context, targets []*podTracker) (func(), *podTracker) {
		rrp.mu.Lock()
		defer rrp.mu.Unlock()
Can we get away without locking? We could potentially try using atomics, though the benchmarks don't really warrant that, I suppose.
I tried; I was not happy with the provable semantics for parallel requests.
Fair enough, we can iterate if necessary 🤷
The problem is that we start moving indices either independently or in an interleaving fashion. In theory, with enough requests it will still average out, but it's much harder to reason about.
This is ready

I am gonna run the tests with cc=100 as well
The following is the coverage report on the affected files.
https://mako.dev/run?run_key=6272698575486976&~act=1&~ac=1 — some random spike, but otherwise looks reasonable
The following jobs failed:
Failed non-flaky tests preventing automatic retry of pull-knative-serving-autotls-tests:
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: markusthoemmes, vagababov.
/lgtm |
/retest |
As of knative/serving#8263 the activator no longer has this behaviour when in the path, and instead does nice round-robin load balancing across the replicas so this warning is no longer needed.
Add a round robin policy for the CC > 3.
This is the simple one, i.e. not the least loaded. I can iterate on that,
but the preliminary results look quite good as is.
Sample runs with the new policy:
https://mako.dev/run?run_key=5913306583793664&~act=1&scatter=1
With the old:
https://mako.dev/run?run_key=4860661639151616&~act=1&scatter=1
(there's some spurious spike)
/assign @julz @markusthoemmes
For #7664