
upstream: locality weighted load balancing. #2892

Merged

htuch merged 16 commits into envoyproxy:master from htuch:locality-weighted-lb on Mar 30, 2018
Conversation

@htuch (Member) commented Mar 23, 2018

Underlying issue: #2725

Risk Level: Low (only enabled when explicitly configured).
Testing: Unit tests for EDS, LoadBalancerImpl and UpstreamImpl. The load stats integration test
provides end-to-end validation.
Docs Changes: envoyproxy/data-plane-api#565
Release Notes: envoyproxy/data-plane-api#579

Signed-off-by: Harvey Tuch htuch@google.com

@htuch (Member, Author) commented Mar 23, 2018

@alyssawilk @zuercher This is ready for review, I'm particularly interested in @zuercher's thoughts on how this plays with subset LB (I haven't given that a lot of thought, that's the only missing bit AFAICT). For context, this rounds out locality balancing for the non-affinity case. I'll be following up next week with support for affinity balancers.

@zuercher (Member) commented:
I'll try to take a look at this tonight or tomorrow, but I'll be out of the office all next week.

Member:

I think not passing any LocalityWeights here will end up disabling this feature. It seems sufficient to just pass along original_host_set_.localityWeights(), though. That way the underlying load balancers have all the information to make the locality weighted load balancing decisions.

Risk Level: Low (only enabled when explicitly configured).
Testing: Unit tests for EDS, LoadBalancerImpl and UpstreamImpl. The load stats integration test
provides end-to-end validation.
Docs Changes: envoyproxy/data-plane-api#565
Release Notes:

Signed-off-by: Harvey Tuch <htuch@google.com>
@alyssawilk self-assigned this Mar 26, 2018
@alyssawilk (Contributor) left a review:

Given today's meeting slate, review load, and rotation I did a first pass on code only - I owe you a review of tests.

Looks good overall - the usual passel of nits follow :-P

Contributor:

I seem to remember you offering to make more of this structured data rather than vectors of pairs and such? It's optional (you're not making things much worse than they were) but it'd be nice!

Member Author:

This one I'll leave as is, since it's local to this function, so it would be probably less readable to start defining new types. I've been generally trying to avoid the contagion of vectors at the interface level.

Contributor:

Do we consider a weight change to be a change to the host or should this comment be updated?

Member Author:

It's a bit more nuanced in that weight changes for hosts don't cause us to recompute, but weight changes for localities do cause us to rebuild the schedule. This has to do with plumbing more than anything, we can do the same lazy weight updates at the locality level. I've updated the log and will leave some TODOs to optimize this later.

Contributor:

if performing locality weighted balancing? If locality weighted balancing is configured?

Contributor:

This is such a nice clean change given the prior refactor - thanks!

Contributor:

TODO for follow-up totally reasonable but I'd expect the host list to change super frequently and the locality list to change infrequently, and the weights to change infrequently. Do we really want to rebuild this every time?

This also seems like a nice utility function to break out for unit testing.

Member Author:

I agree; as mentioned above, we should probably lazily rebuild as we age entries out of the scheduler during picks. I think it's better as a TODO to tune as this starts to matter (rebuilding with few localities is still really cheap). We would need to first verify that the locality indexes haven't changed (i.e. the map from host vector per locality to absolute locality), then we could just update the local map; this might cost more than just rebuilding the scheduler for a small number of localities.

Contributor:

The internal scheduling structures always auto-add after pick. I think it makes the code a bit easier to use but it's definitely not normal priority queue logic. Your call if you want to leave it this way for C++ consistency but I figured I'd at least mention it.

Member Author:

Yeah, this is actually deliberate, as we use it in host-level picking for the lazy weight updates. I'll leave it as is, since as discussed above, we'd like to do this at the locality level as well.
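To make the "auto-add after pick" behavior discussed here concrete, a minimal sketch of an earliest-deadline-first weighted scheduler is below. This is a hypothetical illustration, not Envoy's actual EdfScheduler; the class name, integer ids, and method signatures are all invented for the example.

```cpp
#include <cassert>
#include <queue>

// Hypothetical sketch of EDF-style weighted picking where every entry is
// automatically re-added after a pick, so callers never re-insert manually.
class EdfSchedulerSketch {
public:
  // Queue the entry with a deadline one weight-interval past the current
  // virtual time; heavier entries get earlier deadlines more often.
  void add(double weight, int id) {
    queue_.push(Entry{current_time_ + 1.0 / weight, weight, id});
  }

  // Pop the entry with the earliest deadline, advance virtual time to that
  // deadline, and immediately re-add the entry (the auto-add after pick).
  int pick() {
    if (queue_.empty()) {
      return -1;
    }
    Entry top = queue_.top();
    queue_.pop();
    current_time_ = top.deadline;
    add(top.weight, top.id);
    return top.id;
  }

private:
  struct Entry {
    double deadline;
    double weight;
    int id;
    // std::priority_queue is a max-heap; invert the comparison so the
    // smallest deadline is on top.
    bool operator<(const Entry& other) const { return deadline > other.deadline; }
  };
  std::priority_queue<Entry> queue_;
  double current_time_{0.0};
};
```

With weights 2.0 and 1.0, the first entry is picked exactly twice as often over a long run, which is the property the lazy weight updates rely on.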

Contributor:

lazyreview: I'm sure we have coverage of this but do we have functional unit testing? It'd be nice to have, even if we're just querying the scheduler or some such.

Member Author:

We have this coverage from HostSetImplLocalityTest. I think these tests are fairly unit/functional level, but they do include the scheduler. LMK if you want me to make this method public and I can do the finer grained test.

Contributor:

Comments would be nice. I got confused reading this code, sanity checked against the priority weights, realized it was the same calculation and got sorted out by my comments over there ;-)

Contributor:

Also we should consider defining 1.4 as a kOverProvisioningFactor constant so that if we make it configurable we remember to update it in both places.
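As a concrete illustration of how such a factor is applied, here is a small sketch. The constant name matches the one proposed in this thread, but the `effectiveAvailability` helper and its signature are hypothetical, not Envoy's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative helper: a priority level (or locality) is treated as fully
// available until its healthy ratio, scaled by the over-provisioning factor,
// drops below 1.0.
static constexpr double kOverProvisioningFactor = 1.4;

double effectiveAvailability(uint32_t healthy_hosts, uint32_t total_hosts) {
  if (total_hosts == 0) {
    return 0.0;
  }
  const double healthy_ratio = static_cast<double>(healthy_hosts) / total_hosts;
  // Clamp at 1.0: a level with enough headroom is considered fully healthy.
  return std::min(1.0, healthy_ratio * kOverProvisioningFactor);
}
```

With the 1.4 factor, a level stays fully available until fewer than ~71% of its hosts are healthy (1/1.4), which is why the same number matters in both the priority and locality calculations.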

@htuch force-pushed the locality-weighted-lb branch from 29fc19d to 118336d on March 28, 2018 22:38
@htuch (Member, Author) commented Mar 28, 2018:

@zuercher I've left a TODO for subset LB, as I think the situation is a bit more complicated than just plumbing through the original locality weights. Here's the copy+paste of the comment:

  // We pass in an empty list of locality weights here. This effectively disables locality balancing
  // for subset LB.
  // TODO(htuch): We should consider adding locality awareness here, but we need to do some design
  // work first, and this might not even be a desirable thing to do. Consider for example a
  // situation in which you have 50/50 split across two localities X/Y which have 100 hosts each
  // without subsetting. If the subset LB results in X having only 1 host selected but Y having 100,
  // then a lot more load is being dumped on the single host in X than originally anticipated in the
  // load balancing assignment delivered via EDS. It might seem you want to further weight by subset
  // size in order for this to make sense. However, while the original X/Y weightings can be
  // respected this way, those weightings were made by a management server that was not taking into
  // consideration subsets (e.g. LRS only reports at locality level).
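The arithmetic behind that scenario can be made explicit with a tiny sketch. The `perHostShare` helper is hypothetical, invented purely to illustrate the 50/50 example in the comment.

```cpp
#include <cassert>

// Illustrative arithmetic: localities X and Y each carry a 50% weight. Each
// host's share of total traffic is its locality's share divided by the number
// of hosts actually selected in that locality.
double perHostShare(double locality_share, int hosts_in_subset) {
  return locality_share / hosts_in_subset;
}
```

With 100 hosts per locality, each host sees 0.5% of traffic; if subsetting leaves X with a single host, that host absorbs the entire 50% on its own, a 100x amplification over what the EDS-delivered weights anticipated.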

@alyssawilk (Contributor) commented:

builds look unhappy. Still worth me taking another look?

@htuch (Member, Author) commented Mar 29, 2018:

@alyssawilk yeah, please take a look, I'm sure it's something minor, will dig into this.

Signed-off-by: Harvey Tuch <htuch@google.com>
@alyssawilk (Contributor) left a review:

LGTM modulo 2 nits.

// Priority levels are considered overprovisioned with this factor. This means that we don't
// consider a priority level unhealthy until the ratio of healthy hosts multiplied by
// kOverProvisioningFactor drops below 1.0.
static constexpr double kOverProvisioningFactor = 1.4;
Contributor:

If we're defining these in 2 places can we have a unit test making sure they stay in sync?

Member Author:

Do we need them to stay in sync? I think you might conceivably want to use different settings for priority vs. locality level.

Contributor:

At the end of the day they're both talking about how many extra hosts you have. If we ever allow configuring per-hostset we'd have to then adapt the priority level over-provisioning to take that into account. So yes, I think they need to stay in sync.

Member Author:

OK, will fix.

EXPECT_EQ(nullptr, cluster_->prioritySet().hostSetsPerPriority()[0]->localityWeights());
}

// Validate that onConfigUpdate() propagatees locality weights to the host set when locality
Contributor:

propagates

@htuch merged commit 208d59e into envoyproxy:master on Mar 30, 2018
@htuch deleted the locality-weighted-lb branch on March 30, 2018 14:19
htuch added a commit to envoyproxy/data-plane-api that referenced this pull request Apr 10, 2018