upstream: locality weighted load balancing (#2892)
Conversation
@alyssawilk @zuercher This is ready for review, I'm particularly interested in @zuercher's thoughts on how this plays with subset LB (I haven't given that a lot of thought, that's the only missing bit AFAICT). For context, this rounds out locality balancing for the non-affinity case. I'll be following up next week with support for affinity balancers.

I'll try to take a look at this tonight or tomorrow, but I'll be out of the office all next week.
source/common/upstream/subset_lb.cc
I think not passing any LocalityWeights here will end up disabling this feature. It seems sufficient to just pass along original_host_set_.localityWeights(), though. That way the underlying load balancers have all the information to make the locality weighted load balancing decisions.
alyssawilk left a comment:
Given today's meeting slate, review load, and rotation I did a first pass on code only - I owe you a review of tests.
Looks good overall - the usual passel of nits follow :-P
source/common/upstream/eds.cc
I seem to remember you offering to make more of this structured data rather than vectors of pairs and such? It's optional (you're not making things much worse than they were) but it'd be nice!
This one I'll leave as is, since it's local to this function, so it would probably be less readable to start defining new types. I've been generally trying to avoid the contagion of vectors at the interface level.
source/common/upstream/eds.cc
Do we consider a weight change to be a change to the host or should this comment be updated?
It's a bit more nuanced in that weight changes for hosts don't cause us to recompute, but weight changes for localities do cause us to rebuild the schedule. This has to do with plumbing more than anything; we can do the same lazy weight updates at the locality level. I've updated the log and will leave some TODOs to optimize this later.
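The lazy weight update being discussed can be illustrated outside Envoy. The sketch below is hypothetical (the `Host`, `Entry`, and `pickWithRefresh` names are invented and do not match Envoy's actual code): instead of rebuilding the queue when a weight changes, the stale weight is detected and corrected at pick time.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical minimal host with a mutable load-balancing weight.
struct Host {
  double weight;
};

struct Entry {
  double deadline;
  double stored_weight;  // weight captured at insertion time; may go stale.
  std::shared_ptr<Host> host;
  bool operator>(const Entry& other) const { return deadline > other.deadline; }
};

using Queue = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Lazy weight update: rather than rebuilding the queue whenever a host's
// weight changes, detect the stale weight at pick time and reschedule the
// entry under its current weight before honoring the pick.
std::shared_ptr<Host> pickWithRefresh(Queue& q) {
  while (true) {
    Entry e = q.top();
    q.pop();
    if (e.stored_weight != e.host->weight) {
      // Stale: recompute the deadline as if the entry had been inserted
      // with the current weight, then retry the pick.
      q.push({e.deadline - 1.0 / e.stored_weight + 1.0 / e.host->weight,
              e.host->weight, e.host});
      continue;
    }
    // Fresh: advance the deadline by 1/weight and re-insert.
    q.push({e.deadline + 1.0 / e.stored_weight, e.stored_weight, e.host});
    return e.host;
  }
}
```

The design trade-off mentioned above shows up here: the refresh costs one extra pop/push per stale entry, which for a small number of localities may indeed cost about the same as just rebuilding the scheduler.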
include/envoy/upstream/upstream.h
if performing locality weighted balancing? If locality weighted balancing is configured?
This is such a nice clean change given the prior refactor - thanks!
TODO for follow-up totally reasonable but I'd expect the host list to change super frequently and the locality list to change infrequently, and the weights to change infrequently. Do we really want to rebuild this every time?
This also seems like a nice utility function to break out for unit testing.
I agree; as mentioned above, we should probably lazily rebuild as we age entries out of the scheduler during picks. I think it's better as a TODO to tune as this starts to matter (rebuilding with few localities is still really cheap). We would need to first verify that the locality indexes haven't changed (i.e. the map from host vector per locality to absolute locality), then we could just update the local map; this might cost more than just rebuilding the scheduler for a small number of localities.
The internal scheduling structures always auto-add after pick. I think it makes the code a bit easier to use but it's definitely not normal priority queue logic. Your call if you want to leave it this way for C++ consistency but I figured I'd at least mention it.
Yeah, this is actually deliberate, as we use it in host-level picking for the lazy weight updates. I'll leave it as is, since as discussed above, we'd like to do this at the locality level as well.
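For readers unfamiliar with the auto-add-after-pick behavior being discussed, here is a minimal, hypothetical EDF (earliest deadline first) scheduler sketch. The class and method names are illustrative and are not Envoy's actual scheduler API: each `pick()` pops the entry with the earliest deadline and immediately re-inserts it with the deadline advanced by `1/weight`, which is what makes repeated picks approximate weighted round-robin.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <queue>
#include <tuple>
#include <vector>

// Simplified EDF scheduler sketch (names are hypothetical).
template <class T> class EdfScheduler {
public:
  void add(double weight, std::shared_ptr<T> entry) {
    // An entry's deadline advances by 1/weight each round, so entries with
    // larger weights are picked proportionally more often.
    queue_.push({current_time_ + 1.0 / weight, order_++, weight, entry});
  }

  // Pops the entry with the earliest deadline and immediately re-inserts it
  // ("auto-add after pick"): callers never re-add entries themselves.
  std::shared_ptr<T> pick() {
    EdfEntry e = queue_.top();
    queue_.pop();
    current_time_ = e.deadline;
    add(e.weight, e.entry);
    return e.entry;
  }

private:
  struct EdfEntry {
    double deadline;
    uint64_t order;  // FIFO tie-breaker for equal deadlines.
    double weight;
    std::shared_ptr<T> entry;
    bool operator>(const EdfEntry& other) const {
      return std::tie(deadline, order) > std::tie(other.deadline, other.order);
    }
  };
  std::priority_queue<EdfEntry, std::vector<EdfEntry>, std::greater<EdfEntry>> queue_;
  double current_time_{0.0};
  uint64_t order_{0};
};
```

As noted above, the re-add-on-pick contract is not normal priority queue behavior, but it keeps call sites simple and is what enables the lazy weight updates at pick time.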
lazyreview: I'm sure we have coverage of this but do we have functional unit testing? It'd be nice to have, even if we're just querying the scheduler or some such.
We have this coverage from HostSetImplLocalityTest. I think these tests are fairly unit/functional level, but they do include the scheduler. LMK if you want me to make this method public and I can do the finer grained test.
Comments would be nice. I got confused reading this code, sanity checked against the priority weights, realized it was the same calculation and got sorted out by my comments over there ;-)
Also we should consider defining 1.4 as a kOverProvisioningFactor constant so if we make it configurable we remember to update it in both places.
Signed-off-by: Harvey Tuch <htuch@google.com>
Force-pushed 29fc19d to 118336d.
@zuercher I've left a TODO for subset LB, as I think the situation is a bit more complicated than just plumbing through the original locality weights. Here's the copy+paste of the comment:

builds look unhappy. Still worth me taking another look?

@alyssawilk yeah, please take a look, I'm sure it's something minor, will dig into this.
Signed-off-by: Harvey Tuch <htuch@google.com>
// Priority levels are considered overprovisioned with this factor. This means that we don't
// consider a priority level unhealthy until the ratio of healthy hosts multiplied by
// kOverProvisioningFactor drops below 1.0.
static constexpr double kOverProvisioningFactor = 1.4;
If we're defining these in 2 places can we have a unit test making sure they stay in sync?
Do we need them to stay in sync? I think you might conceivably want to use different setting for priority vs. locality level..
At the end of the day they're both talking about how many extra hosts you have. If we ever allow configuring per-hostset we'd have to then adapt the priority level over-provisioning to take that into account. So yes, I think they need to stay in sync.
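To make the over-provisioning semantics concrete, here is a small sketch (not Envoy's code; `effectiveHealthPercent` is an invented name) of how a single shared kOverProvisioningFactor could feed into the availability of a priority level or locality, per the quoted comment: the level stays fully available until the healthy ratio times the factor drops below 1.0.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Over-provisioning factor from the quoted comment, defined once so that
// priority-level and locality-level calculations stay in sync.
constexpr double kOverProvisioningFactor = 1.4;

// Effective availability of a priority level (or locality), in percent.
// The level is treated as fully available until healthy/total drops below
// 1 / kOverProvisioningFactor (~71%); below that, availability degrades
// linearly.
uint32_t effectiveHealthPercent(uint32_t healthy, uint32_t total) {
  if (total == 0) {
    return 0;
  }
  const double ratio = static_cast<double>(healthy) / total;
  return static_cast<uint32_t>(
      std::round(std::min(100.0, 100.0 * ratio * kOverProvisioningFactor)));
}
```

For example, 80 healthy hosts out of 100 still yields 100% availability (0.8 × 1.4 > 1.0), while 50 out of 100 yields 70%.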
test/common/upstream/eds_test.cc
EXPECT_EQ(nullptr, cluster_->prioritySet().hostSetsPerPriority()[0]->localityWeights());
}

// Validate that onConfigUpdate() propagates locality weights to the host set when locality
Signed-off-by: Harvey Tuch <htuch@google.com>
…579) See envoyproxy/envoy#2892. Signed-off-by: Harvey Tuch <htuch@google.com>
Underlying issue: #2725
Risk Level: Low (only enabled when explicitly configured).
Testing: Unit tests for EDS, LoadBalancerImpl and UpstreamImpl. The load stats integration test
provides end-to-end validation.
Docs Changes: envoyproxy/data-plane-api#565
Release Notes: envoyproxy/data-plane-api#579
Signed-off-by: Harvey Tuch <htuch@google.com>