Add round robin policy for the cc > 3 by vagababov · Pull Request #8263 · knative/serving

vagababov · 2020-06-09T06:24:10Z

Add a round robin policy for the CC > 3.
This is the simple one, i.e. not the least loaded. I can iterate over that,
but the preliminary results look quite good as is.

Sample runs with the new policy:
https://mako.dev/run?run_key=5913306583793664&~act=1&scatter=1
With the old:
https://mako.dev/run?run_key=4860661639151616&~act=1&scatter=1
(there's some spurious spike)

/assign @julz @markusthoemmes
For #7664

vagababov · 2020-06-09T06:24:20Z

BenchmarkPolicy/random-power-of-2-choice-1-trackers-sequential-4                17963541                65.1 ns/op            32 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-1-trackers-parallel-4                           9266247               129 ns/op              32 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-2-trackers-sequential-4                22165728                55.0 ns/op            16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-2-trackers-parallel-4                  13074493                87.8 ns/op            16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-3-trackers-sequential-4                11433304               105 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-3-trackers-parallel-4                   5732670               206 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-10-trackers-sequential-4               10806351               112 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-10-trackers-parallel-4                  5453574               221 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-100-trackers-sequential-4              10693460               112 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-100-trackers-parallel-4                 5213464               238 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/first-available-1-trackers-sequential-4                         199398207                6.01 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-1-trackers-parallel-4                           776109547                1.55 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-2-trackers-sequential-4                         199918666                6.01 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-2-trackers-parallel-4                           766629129                1.61 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-3-trackers-sequential-4                         199665427                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-3-trackers-parallel-4                           686974896                5.75 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-10-trackers-sequential-4                        200096355                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-10-trackers-parallel-4                          726126270                1.58 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-100-trackers-sequential-4                       198068689                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-100-trackers-parallel-4                         779410185                2.26 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-1-trackers-sequential-4                             30659104                39.0 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-1-trackers-parallel-4                               16495440                74.4 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-2-trackers-sequential-4                             30653740                38.8 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-2-trackers-parallel-4                               16868802                74.6 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-3-trackers-sequential-4                             30827364                38.8 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-3-trackers-parallel-4                               16899624                69.4 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-10-trackers-sequential-4                            30818992                38.9 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-10-trackers-parallel-4                              16370354                73.1 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-100-trackers-sequential-4                           30874807                38.9 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-100-trackers-parallel-4                             19958426                73.4 ns/op             0 B/op          0 allocs/op

Benchmark results

vagababov · 2020-06-09T06:26:22Z

Somehow choice2 is the slowest. 🤷

Might need a better RNG.
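
For readers unfamiliar with the `random-power-of-2-choice` benchmark names, here is a minimal sketch of that selection scheme. It is not the PR's code: the `tracker` type and its `load` field are hypothetical stand-ins, and the sketch only shows where the two random draws happen on each pick. Whether the random source explains the slower numbers is a guess that the thread does not confirm.

```go
package main

import (
	"fmt"
	"math/rand"
)

// tracker is a hypothetical stand-in for the PR's podTracker; load is an
// assumed field holding the number of in-flight requests.
type tracker struct {
	name string
	load int
}

// randomChoice2 draws two distinct trackers at random and returns the less
// loaded of the two. Both draws use the shared math/rand source.
func randomChoice2(targets []*tracker) *tracker {
	switch len(targets) {
	case 0:
		return nil
	case 1:
		return targets[0]
	}
	i := rand.Intn(len(targets))
	// Draw from the remaining n-1 slots and skip over i so that j != i.
	j := rand.Intn(len(targets) - 1)
	if j >= i {
		j++
	}
	if targets[j].load < targets[i].load {
		return targets[j]
	}
	return targets[i]
}

func main() {
	// With exactly two targets both are always drawn, so the idle one wins.
	targets := []*tracker{{name: "a", load: 5}, {name: "b", load: 0}}
	fmt.Println(randomChoice2(targets).name) // prints "b"
}
```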

vagababov · 2020-06-09T06:26:56Z

/cc @mattmoor

julz

super, super cool. This'll make higher CC cases with activator in path way way nicer. The new graph looks nice.

A couple of test nits, but otherwise looks good to me.

pkg/activator/net/lb_policy_test.go

pkg/activator/net/lb_policy.go

markusthoemmes

I like :)

pkg/activator/net/lb_policy.go

markusthoemmes · 2020-06-09T07:20:03Z

pkg/activator/net/throttler.go

 		revBreaker = newInfiniteBreaker(logger)
 		lbp = randomChoice2Policy
-	} else {
+	case containerConcurrency <= 3:


Do we have benchmarks that warrant this switch? I.e. have we tried round robin for the lower CC values?

Well, the benchmark above shows that the first-available ("full first") policy is faster even at the code level. Also, with lower CC, pods might be shared (if we don't divide evenly), so we prefer to use the pods at the tail less: they might be shared, which causes queueing.
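
The selection logic discussed in this thread can be sketched roughly as follows. This is an illustration, not the code in `throttler.go`: `pickPolicy` is a hypothetical helper that returns policy names as strings, the `== 0` condition of the first case is an assumption (the diff hunk above does not show it), and the choice of first-available for CC <= 3 is inferred from the reply rather than from the diff.

```go
package main

import "fmt"

// pickPolicy is a hypothetical helper that names the load-balancing policy
// for a given containerConcurrency (CC) value.
func pickPolicy(containerConcurrency int) string {
	switch {
	case containerConcurrency == 0:
		// Assumed condition: CC 0 means no concurrency limit, which the diff
		// pairs with an infinite breaker and randomChoice2Policy.
		return "randomChoice2"
	case containerConcurrency <= 3:
		// Inferred from the thread: fill the first available pods so that
		// pods at the tail are used less.
		return "firstAvailable"
	default:
		// The new case added by this PR: CC > 3.
		return "roundRobin"
	}
}

func main() {
	for _, cc := range []int{0, 1, 3, 4, 100} {
		fmt.Printf("cc=%d -> %s\n", cc, pickPolicy(cc))
	}
}
```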

markusthoemmes · 2020-06-09T07:21:51Z

pkg/activator/net/lb_policy.go

+	rrp := roundRobinPolicyT{}
+	return func(ctx context.Context, targets []*podTracker) (func(), *podTracker) {
+		rrp.mu.Lock()
+		defer rrp.mu.Unlock()


Can we get away without locking? We could try to get away using atomics potentially, though the benchmarks don't really warrant that I suppose.

I tried; I was not happy with the provable semantics for parallel requests.

Fair enough, we can iterate if necessary 🤷

The problem is that we start moving indices either independently or in an interleaving fashion. In theory, with enough requests it will still average out, but it's much harder to reason about.
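
To make the locking trade-off in this thread concrete, here is a minimal sketch of a mutex-guarded round-robin picker. It is a simplified assumption, not the PR's implementation: `podTracker` is reduced to a single hypothetical `dest` field, and `roundRobin` and `pick` are illustrative names. The real policy also takes a `context.Context` and returns a release callback alongside the tracker.

```go
package main

import (
	"fmt"
	"sync"
)

// podTracker is a simplified stand-in for the real type; dest is a
// hypothetical field used only for this illustration.
type podTracker struct {
	dest string
}

// roundRobin remembers the index of the next target to hand out.
type roundRobin struct {
	mu  sync.Mutex
	idx int
}

// pick returns the targets in rotation. Holding the mutex makes the
// bounds check, the read, and the increment a single step, so concurrent
// callers cannot interleave between them.
func (rr *roundRobin) pick(targets []*podTracker) *podTracker {
	if len(targets) == 0 {
		return nil
	}
	rr.mu.Lock()
	defer rr.mu.Unlock()
	// The target list may have shrunk since the last call; wrap around.
	if rr.idx >= len(targets) {
		rr.idx = 0
	}
	t := targets[rr.idx]
	rr.idx++
	return t
}

func main() {
	rr := &roundRobin{}
	targets := []*podTracker{{dest: "pod-a"}, {dest: "pod-b"}, {dest: "pod-c"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick(targets).dest) // pod-a, pod-b, pod-c, pod-a
	}
}
```

An atomic counter would remove the lock, but the bounds check and the increment would then be separate steps that parallel requests can interleave, which is the reasoning difficulty raised above.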

vagababov · 2020-06-09T15:56:58Z

This is ready

vagababov · 2020-06-09T15:59:15Z

I am gonna run the tests with cc=100 as well

knative-metrics-robot · 2020-06-09T16:02:26Z

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/activator/net/lb_policy.go	90.5%	94.1%	3.6
pkg/activator/net/throttler.go	91.4%	91.5%	0.1

vagababov · 2020-06-09T16:24:59Z

https://mako.dev/run?run_key=6272698575486976&~act=1&~ac=1 — some random spike, but otherwise looks reasonable

knative-test-reporter-robot · 2020-06-09T16:26:59Z

The following jobs failed:

Test name	Triggers	Retries
pull-knative-serving-autotls-tests		0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-autotls-tests:

test/e2e/autotls.TestAutoTLS
test/e2e/autotls.TestAutoTLS/HTTP01

markusthoemmes

/lgtm
/approve

knative-prow-robot · 2020-06-09T16:32:20Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markusthoemmes, vagababov

Needs approval from an approver in each of these files:

~~pkg/activator/OWNERS~~ [markusthoemmes,vagababov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

julz · 2020-06-09T16:33:53Z

/lgtm

vagababov · 2020-06-09T16:41:20Z

/retest

As of knative/serving#8263 the activator no longer has this behaviour when in the path, and instead does nice round-robin load balancing across the replicas so this warning is no longer needed.

Add round robin policy for the cc > 3

2c8b95b

Add a round robin policy for the CC > 3. This is the simple one, i.e. not the least loaded. I can iterate over that, but the preliminary results look quite good as is.

knative-prow-robot assigned julz and markusthoemmes Jun 9, 2020

googlebot added the "cla: yes" label (indicates the PR's author has signed the CLA) Jun 9, 2020

knative-prow-robot added the "size/M" label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 9, 2020

knative-prow-robot requested review from markusthoemmes and taragu June 9, 2020 06:24

knative-prow-robot added the "approved" (indicates a PR has been approved by an approver from all required OWNERS files), "area/autoscale", and "area/networking" labels Jun 9, 2020

knative-prow-robot requested a review from mattmoor June 9, 2020 06:26

julz reviewed Jun 9, 2020

pkg/activator/net/lb_policy_test.go (outdated, resolved)

pkg/activator/net/lb_policy.go (resolved)

markusthoemmes reviewed Jun 9, 2020

more tests

a472149

knative-prow-robot added the "size/L" label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the "size/M" label (denotes a PR that changes 30-99 lines, ignoring generated files) Jun 9, 2020

markusthoemmes approved these changes Jun 9, 2020

knative-prow-robot added the "lgtm" label (indicates that a PR is ready to be merged) Jun 9, 2020

knative-prow-robot merged commit f1b6395 into knative:master Jun 9, 2020

vagababov deleted the 20200608-rr-policy branch June 23, 2020 23:17

julz mentioned this pull request Jul 17, 2020

Remove warning about activator load balancing behaviour knative/docs#2683

Merged

julz mentioned this pull request Jul 23, 2020

Add julz to autoscaling approvers #8771

Merged
