Add per service locality weight setting by hzxuzhonghu · Pull Request #726 · istio/api

hzxuzhonghu · 2018-12-06T09:57:56Z

This is to support envoy locality-weighted-load-balancing

hzxuzhonghu · 2018-12-06T09:59:16Z

/assign @rshriram

rshriram

/lgtm

istio-testing · 2018-12-29T04:40:48Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu, rshriram

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [rshriram]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rshriram · 2018-12-29T04:41:00Z

@hzxuzhonghu sorry I missed this. This looks good.

costinm · 2018-12-29T05:24:36Z

Some comments on how this will be used ?
My understanding is that the weights come from the endpoints - based on load or other workload-specific info, and is just enabled in the cluster config.

Can you explain a bit what kind of envoy config will be generated, and how can this even be implemented ? It's clear we can't generate the 80% or whatever split - it's based on load info (which we may be able to get), endpoint health, etc.

Is there a doc about this ( on how it impact Istio ) ?

And please, if an API change has not be approved by the WG/TOC please add some doc
making it clear this is a proposed/experimental API and not covered by 'beta' guarantee. It'll be
pretty hard to separate otherwise.

hzxuzhonghu · 2018-12-29T06:52:04Z

@costinm https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/locality_weight

It will generate Cluster.CommonLbConfig for Envoy.

costinm · 2018-12-29T17:04:19Z

That seems to be an empty proto, that indicates to envoy to use the load info from EDS. What I'm trying to figure out is how does this DestinationRule translate into an envoy config that can provide what the API is claiming. Is it going to affect the EDS results ? But how ? And secondary - what is the use case for user to specify "80% to local zone, 20% to remote zone". Never seen this use case - all configs I've seen want as much as possible local, then fallback to zone with extra capacity in order of latency.

…

On Fri, Dec 28, 2018 at 10:52 PM Zhonghu Xu ***@***.***> wrote: @costinm <https://github.com/costinm> https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/locality_weight It will generate Cluster.CommonLbConfig <https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto#cluster-commonlbconfig> for Envoy. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#726 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAFI6rio-IkyL87W3kI_J9JuekCDkANjks5u9xEUgaJpZM4ZGK1f> .

costinm · 2018-12-29T17:05:54Z

To be clear: we really want to implement locality based LB, and many thanks for starting - I just want to understand what is the plan, and if this is the right API.

…

On Sat, Dec 29, 2018 at 9:04 AM Costin Manolache ***@***.***> wrote: That seems to be an empty proto, that indicates to envoy to use the load info from EDS. What I'm trying to figure out is how does this DestinationRule translate into an envoy config that can provide what the API is claiming. Is it going to affect the EDS results ? But how ? And secondary - what is the use case for user to specify "80% to local zone, 20% to remote zone". Never seen this use case - all configs I've seen want as much as possible local, then fallback to zone with extra capacity in order of latency. On Fri, Dec 28, 2018 at 10:52 PM Zhonghu Xu ***@***.***> wrote: > @costinm <https://github.com/costinm> > https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/locality_weight > > It will generate Cluster.CommonLbConfig > <https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto#cluster-commonlbconfig> > for Envoy. > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#726 (comment)>, or mute > the thread > <https://github.com/notifications/unsubscribe-auth/AAFI6rio-IkyL87W3kI_J9JuekCDkANjks5u9xEUgaJpZM4ZGK1f> > . >

rshriram · 2018-12-31T20:27:03Z

this requirement came from folks at Intuit and Atlassian. The ability to specify amount of load to each region or zero load to a region.

Weights are not auto adjusted based on number of endpoints per region or endpoint load. Its purely human driven with some company level policies that determine how much traffic needs to be spilled over to the remote region (0 or 100%). In other words, people wanted control over how traffic gets spilled to the remote regions.

costinm · 2018-12-31T21:50:14Z

How is it different than existing split, with labels for zone/region ? The envoy feature seems to be intended for real load balancing, not exact split. And how will this api interact with the real lb ? Design doc or discussions before api change would help, in particular for things with broad implications...

…

On Mon, Dec 31, 2018, 12:27 Shriram Rajagopalan ***@***.*** wrote: this requirement came from folks at Intuit and Atlassian. The ability to specify amount of load to each region or zero load to a region. Weights are not auto adjusted based on number of endpoints per region or endpoint load. Its purely human driven with some company level policies that determine how much traffic needs to be spilled over to the remote region (0 or 100%). In other words, people wanted control over how traffic gets spilled to the remote regions. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#726 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAFI6py0rlnrGVYUX77drDJ51dWYUOo3ks5u-nMXgaJpZM4ZGK1f> .

hzxuzhonghu · 2019-01-02T01:40:29Z

Locality weighted load balancing is configured by setting locality_weighted_lb_config in the cluster configuration and providing weights in LocalityLbEndpoints via load_balancing_weight.

The implementation is simple,

How is it different than existing split, with labels for zone/region ?

Actually, the existing split does not take effect I think. We dont set config as envoy expected.

costinm · 2019-01-02T04:10:34Z

On Tue, Jan 1, 2019 at 5:40 PM Zhonghu Xu ***@***.***> wrote: Locality weighted load balancing is configured by setting locality_weighted_lb_config in the cluster configuration and providing weights in LocalityLbEndpoints via load_balancing_weight. The implementation is simple,

Some details ? I don't see any simple option based on the envoy docs on locality_weighted_lb_config that could achieve what the API seems to promise.

How is it different than existing split, with labels for zone/region ? Actually, the existing split does not take effect I think. We dont set config as envoy expected.

I meant: the current traffic split we do for subsets ( based on labels, etc). If the workloads are labeled to reflect zone/etc - why wouldn't the destination rule split satisfy the requirement ? I assume it's because you want different splits by source - is this something we want for traffic split/DestinationRule in general ? But my primary concern is to not confuse users - we still want real locality-based load balancing, taking into account load, endpoints, etc. And the envoy option doesn't seem to match the percent based split in the API you are proposing.

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#726 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAFI6v17LKxp0-KQpr1HTRflJyVEdtRGks5u_A4NgaJpZM4ZGK1f> .

hzxuzhonghu · 2019-01-02T06:20:09Z

If the workloads are labeled to reflect zone/etc - why wouldn't
the destination rule split satisfy the requirement ? I assume it's because
you want different splits by source - is this something we
want for traffic split/DestinationRule in general ?

That's right. As you said DestinationRule can not achieve sourced routing. We may have workloads reside in multi region/zones, and access workloads in other multi region/zones. We need to control traffic based on both region/zone and the workloads number within it.

Some details ?

Envoy docs requires setting Cluster.CommonLbConfig.LocalityWeightedLbConfig and combined endpoint.LocalityLbEndpoints.load_balancing_weight . currently LocalityWeightedLbConfig is more like a flag(https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto#cluster-commonlbconfig-localityweightedlbconfig), and load_balancing_weight has already been set in previous prs.

costinm · 2019-01-02T16:41:00Z

On Tue, Jan 1, 2019 at 10:20 PM Zhonghu Xu ***@***.***> wrote: If the workloads are labeled to reflect zone/etc - why wouldn't the destination rule split satisfy the requirement ? I assume it's because you want different splits by source - is this something we want for traffic split/DestinationRule in general ? That's right. As you said DestinationRule can not achieve sourced routing. We may have workloads reside in multi region/zones, and access workloads in other multi region/zones. We need to control traffic based on both region/zone and the workloads number within it.

Is there a doc describing the use cases, requirements - and how the implementation will work ? I'm as confused as before - and I suspect other devs are also not familiar with this, and I don't remember any discussion. Maybe a cleaner solution is to extend DestinationRule to allow source-based config. What you described so far doesn't sound like load balancing, but traffic split. In general "load balancing" implies "load" will be a factor.

Some details ? Envoy docs requires setting Cluster.CommonLbConfig.LocalityWeightedLbConfig and combined endpoint.LocalityLbEndpoints.load_balancing_weight . currently LocalityWeightedLbConfig is more like a flag( https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/cds.proto#cluster-commonlbconfig-localityweightedlbconfig), and load_balancing_weight has already been set in previous prs.

Without a document or discussion on the list or in WG - I don't think many people know about them. We need to improve the review process to make sure all PRs have clear indication if it is a bug fix or adds a new feature - and in the later case link to some doc. We can't require all developers to review all PRs. Even if I saw the PRs - without comments in the PR it's hard to understand what the final picture will be and that they were related to this feature. I've seen a bunch of PRs related to zero vpn, using load weight to adjust the gateway - but they can't be used in the general case, and we can't have a feature like this require zvpn and gateways. As I mentioned, 'load balancing' is a very complicated subject and the API in this PR doesn't fit any pattern or use I've seen - so docs please...

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#726 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAFI6qsn0cMU_Y7EmIQph4mfl_UoYizEks5u_E-bgaJpZM4ZGK1f> .

rshriram · 2019-01-02T17:29:08Z

You are overthinking the problem. Envoy has all the knobs required to do differential load balancing between endpoints in the same cluster. We have had locality-aware routing (though non functional) for a while. All this does is specify what portion needs to be local vs what should be shed based on user specified parameters. This has no dependency on knowing what endpoints exist in the remote cluster (i.e. pilot talking to remote api servers). Nor does it have any dependency on load computation based on the endpoints.

Forcing users to create different versions of the same binary, just because they are in different regions is not a good strategy (it gets complicated when you want to do real version routing).

Besides, the main goal of doing this (the way its done) is so that we define a top level destinationRule (*.svc.cluster.local) that specifies how traffic should be distributed balanced across regions. Everything else will inherit it -- this is how destination rules work today.

The reason I didn't insist on a doc is because the impact is very scoped to just the specific use case we are tackling (for intuit/Atlassian etc. who wanted some manual override). IF not specified, traffic distribution will be same as what exists today. There is nothing here that prevents you from using the Google internal EDS load assignment servers that compute endpoint load across clusters in different regions and assign weights. This is literally a basic manual override, and actually implementing the AZ aware load balancing thing that we have been claiming for a while.

linsun · 2019-01-02T18:43:46Z

how does envoy side car know the region and zone info within istio env? I don't recall setting up instruction for this.

also, does this new traffic policy allow user to configure this most common user case @costinm outlined earlier? I can set to 100% to local region/zone but how do I configure the fallback?

all configs I've seen want as much as possible local, then fallback to zone with extra capacity in order of latency

rshriram · 2019-01-02T21:48:40Z

Service entry has locality field. And you can add region/az annotation to k8s nodes. We have been parsing these values for two years. Not using them though. Basically no user config. Just make sure cluster nodes are annotated per standard kubernetes docs.

In terms of the fallback, the functionality exists in a convoluted way in Envoy and works only if active health checking or outlier detection is enabled or retry with priority levels are enabled. This needs more work in Envoy. Alternatively you can write your own automation to change the values in the destination rule during such incidents.

linsun · 2019-01-03T15:50:45Z

thank you @rshriram for the clarification!!

hzxuzhonghu added 6 commits December 6, 2018 17:06

Add locality weight setting

e7a1c0f

Add locality weight setting

c91201b

generate

e12daee

Add locality weight setting

fbb99f6

generate

018a708

make proto commit

3a38117

istio-testing requested review from frankbu and mandarjog December 6, 2018 09:57

googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Dec 6, 2018

istio-testing assigned rshriram Dec 6, 2018

rshriram approved these changes Dec 29, 2018

View reviewed changes

istio-testing added the lgtm label Dec 29, 2018

istio-testing added the approved label Dec 29, 2018

rshriram merged commit 08a19da into istio:release-1.1 Dec 29, 2018

hzxuzhonghu deleted the locality-weight branch December 29, 2018 07:13

Conversation

hzxuzhonghu commented Dec 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hzxuzhonghu commented Dec 6, 2018

Uh oh!

rshriram left a comment

Choose a reason for hiding this comment

Uh oh!

istio-testing commented Dec 29, 2018

Uh oh!

rshriram commented Dec 29, 2018

Uh oh!

costinm commented Dec 29, 2018

Uh oh!

hzxuzhonghu commented Dec 29, 2018

Uh oh!

costinm commented Dec 29, 2018 via email

Uh oh!

costinm commented Dec 29, 2018 via email

Uh oh!

rshriram commented Dec 31, 2018

Uh oh!

costinm commented Dec 31, 2018 via email

Uh oh!

hzxuzhonghu commented Jan 2, 2019

Uh oh!

costinm commented Jan 2, 2019 via email

Uh oh!

hzxuzhonghu commented Jan 2, 2019

Uh oh!

costinm commented Jan 2, 2019 via email

Uh oh!

rshriram commented Jan 2, 2019

Uh oh!

linsun commented Jan 2, 2019

Uh oh!

rshriram commented Jan 2, 2019

Uh oh!

linsun commented Jan 3, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

hzxuzhonghu commented Dec 6, 2018 •

edited

Loading