Skip to content

Add round robin policy for the cc > 3#8263

Merged
knative-prow-robot merged 2 commits intoknative:masterfrom
vagababov:20200608-rr-policy
Jun 9, 2020
Merged

Add round robin policy for the cc > 3#8263
knative-prow-robot merged 2 commits intoknative:masterfrom
vagababov:20200608-rr-policy

Conversation

@vagababov
Copy link
Contributor

Add a round robin policy for the CC > 3.
This is the simply one, i.e not the least loaded. I can iterate over that,
but the preliminary results look quite good as is.

Sample runs with the new policy:
https://mako.dev/run?run_key=5913306583793664&~act=1&scatter=1
With the old:
https://mako.dev/run?run_key=4860661639151616&~act=1&scatter=1
(there's some spurious spike)

/assign @julz @markusthoemmes
For #7664

Add a round robin policy for the CC > 3.
This is the simply one, i.e not the least loaded. I can iterate over that,
but the preliminary results look quite good as is.
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jun 9, 2020
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 9, 2020
@vagababov
Copy link
Contributor Author

vagababov commented Jun 9, 2020

BenchmarkPolicy/random-power-of-2-choice-1-trackers-sequential-4                17963541                65.1 ns/op            32 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-1-trackers-parallel-4                           9266247               129 ns/op              32 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-2-trackers-sequential-4                22165728                55.0 ns/op            16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-2-trackers-parallel-4                  13074493                87.8 ns/op            16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-3-trackers-sequential-4                11433304               105 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-3-trackers-parallel-4                   5732670               206 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-10-trackers-sequential-4               10806351               112 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-10-trackers-parallel-4                  5453574               221 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-100-trackers-sequential-4              10693460               112 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/random-power-of-2-choice-100-trackers-parallel-4                 5213464               238 ns/op              16 B/op          1 allocs/op
BenchmarkPolicy/first-available-1-trackers-sequential-4                         199398207                6.01 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-1-trackers-parallel-4                           776109547                1.55 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-2-trackers-sequential-4                         199918666                6.01 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-2-trackers-parallel-4                           766629129                1.61 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-3-trackers-sequential-4                         199665427                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-3-trackers-parallel-4                           686974896                5.75 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-10-trackers-sequential-4                        200096355                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-10-trackers-parallel-4                          726126270                1.58 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-100-trackers-sequential-4                       198068689                6.02 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/first-available-100-trackers-parallel-4                         779410185                2.26 ns/op            0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-1-trackers-sequential-4                             30659104                39.0 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-1-trackers-parallel-4                               16495440                74.4 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-2-trackers-sequential-4                             30653740                38.8 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-2-trackers-parallel-4                               16868802                74.6 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-3-trackers-sequential-4                             30827364                38.8 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-3-trackers-parallel-4                               16899624                69.4 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-10-trackers-sequential-4                            30818992                38.9 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-10-trackers-parallel-4                              16370354                73.1 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-100-trackers-sequential-4                           30874807                38.9 ns/op             0 B/op          0 allocs/op
BenchmarkPolicy/round-robin-100-trackers-parallel-4                             19958426                73.4 ns/op             0 B/op          0 allocs/op


Benchmark results

@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking labels Jun 9, 2020
@vagababov
Copy link
Contributor Author

vagababov commented Jun 9, 2020

Somehow choice2 is the slowest. 🤷

Might need a better RNG.

@vagababov
Copy link
Contributor Author

/cc @mattmoor

Copy link
Contributor

@julz julz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super, super cool. This'll make higher CC cases with activator in path way way nicer. The new graph looks nice.

Couple of test nits but otherwise looks good to me,

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like :)

revBreaker = newInfiniteBreaker(logger)
lbp = randomChoice2Policy
} else {
case containerConcurrency <= 3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have benchmarks that warrant this switch? I.e. have we tried round robin for the lower CC values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the benchmark above shows that full first is better even on code level. Given that pods might be shared (if we don't divide evenly) in case of lower cc we prefer to use the pods at the tail less, since they might be shared, causing queueing.

rrp := roundRobinPolicyT{}
return func(ctx context.Context, targets []*podTracker) (func(), *podTracker) {
rrp.mu.Lock()
defer rrp.mu.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get away without locking? We could try to get away using atomics potentially, though the benchmarks don't really warrant that I suppose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, I was not happy with provable semantics for parallel requests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, we can iterate if necessary 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem, is that we start moving indices either independently or in interleaving fashion. In theory with enough requests it will still average out, but it's much harder to reason about

@vagababov
Copy link
Contributor Author

This is ready

@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 9, 2020
@vagababov
Copy link
Contributor Author

I am gonna run the tests with cc=100 as well

@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/activator/net/lb_policy.go 90.5% 94.1% 3.6
pkg/activator/net/throttler.go 91.4% 91.5% 0.1

@vagababov
Copy link
Contributor Author

https://mako.dev/run?run_key=6272698575486976&~act=1&~ac=1 — some random spike, but otherwise looks reasonable

@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-knative-serving-autotls-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-autotls-tests:

test/e2e/autotls.TestAutoTLS
test/e2e/autotls.TestAutoTLS/HTTP01

Copy link
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 9, 2020
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markusthoemmes, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@julz
Copy link
Contributor

julz commented Jun 9, 2020

/lgtm

@vagababov
Copy link
Contributor Author

/retest

@knative-prow-robot knative-prow-robot merged commit f1b6395 into knative:master Jun 9, 2020
@vagababov vagababov deleted the 20200608-rr-policy branch June 23, 2020 23:17
julz added a commit to julz/docs that referenced this pull request Jul 17, 2020
As of knative/serving#8263 the activator no longer has this behaviour when in the path, and instead does nice round-robin load balancing across the replicas so this warning is no longer needed.
knative-prow-robot pushed a commit to knative/docs that referenced this pull request Jul 17, 2020
As of knative/serving#8263 the activator no longer has this behaviour when in the path, and instead does nice round-robin load balancing across the replicas so this warning is no longer needed.
@julz julz mentioned this pull request Jul 23, 2020
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/autoscale area/networking cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants