contrib: implement Peak EWMA load balancing policy #40653
Conversation
Hi @rroblak, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already.

This looks really good, thanks for working on this! Will it be possible to support things like localityLbConfig/zoneAwareLbConfig or slow start mode? I'm not suggesting that needs to be included here (also not a maintainer) but I'm curious if any of the core logic here would restrict that.
Force-pushed 07e1f68 to 3e5e9d4.
I was just about to ask about this aspect as well; also see my comment in istio/istio#35102 (comment). With Peak EWMA being about maintaining low latency, whether or not to cross a zone barrier comes into question.
Thanks for the great questions, @jukie and @frittentheke!

**localityLbConfig/zoneAwareLbConfig**

This implementation should be compatible with localityLbConfig/zoneAwareLbConfig. They could act as a pre-filter to select a host set, similar to how the current implementation uses host health filtering to only consider healthy hosts. Then the Peak EWMA policy would run P2C on that subset to choose the fastest host. That said, I haven't looked at the object/data model for localityLbConfig/zoneAwareLbConfig, so I'd need to dig through it to be 100% sure.

Needless to say, locality/zone-awareness highlights one of the strengths (and the elegance) of Peak EWMA in comparison to load balancing algorithms that don't consider request latency as an input: given an undifferentiated pool of upstream hosts, Peak EWMA will dynamically weight traffic toward the upstream hosts with the lowest RTT, which in practice is the local zone first, followed by increasingly distant zones. In my experience running this across data centers for the past few years, Peak EWMA simplifies configuration and also mitigates partial-failure scenarios where an upstream host's RTT increases substantially but not enough for the host to be marked unhealthy. In that scenario, Peak EWMA will significantly reduce traffic to the affected upstream host (even if it's in the local zone), whereas fixed configurations will continue to send requests to it.

**Slow Start**

Regarding slow start, in a way it's already implemented via the `default_rtt` setting. If we wanted to incorporate the common LB config slow start params, we'd need to think a bit about how to do that, since they would overlap with the `default_rtt` behavior.

Hope that helps! Let me know your thoughts.
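To make the dynamic-weighting argument above concrete, here is a minimal sketch of a Peak EWMA-style power-of-two-choices pick. All names (`HostState`, `cost`, `pickHost`) are illustrative assumptions, not the PR's actual code; the cost formula follows Finagle's Peak EWMA style (smoothed RTT scaled by outstanding load), so a far-zone host with a higher smoothed RTT naturally receives less traffic without any zone configuration.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical per-host state: smoothed RTT and in-flight request count.
struct HostState {
  double ewma_rtt_ms;   // exponentially weighted moving average of RTT
  int active_requests;  // requests currently in flight
};

// Cost in the style of Finagle's Peak EWMA: low smoothed RTT and few
// outstanding requests means a cheap host.
double cost(const HostState& h) { return h.ewma_rtt_ms * (h.active_requests + 1); }

// Power-of-two-choices over an (already filtered) host subset: sample two
// distinct hosts uniformly at random, keep the cheaper one.
size_t pickHost(const std::vector<HostState>& hosts, std::mt19937& rng) {
  std::uniform_int_distribution<size_t> dist(0, hosts.size() - 1);
  size_t a = dist(rng);
  size_t b = dist(rng);
  if (hosts.size() > 1) {
    while (b == a) {
      b = dist(rng);
    }
  }
  return cost(hosts[a]) <= cost(hosts[b]) ? a : b;
}
```

With two hosts where one has a 100x higher smoothed RTT (e.g. a distant zone), every P2C comparison picks the cheaper host, which is the "local zone first" behavior described above.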
@rroblak CI won't run because of the DCO check failing. You can follow the instructions here to fix it and let the baseline tests run.

/wait
Force-pushed 2a81114 to f53dd24.
Adds Peak EWMA (Exponentially Weighted Moving Average) load balancing policy that uses the Power of Two Choices algorithm with real-time RTT measurements for latency-aware request routing.

Key components:
- Load balancer: envoy.load_balancing_policies.peak_ewma
- HTTP filter: envoy.filters.http.peak_ewma for RTT measurement
- Configuration: decay_time, aggregation_interval, max_samples_per_host, default_rtt, penalty_value

Implementation uses lock-free atomic ring buffers for RTT sample collection and a host-attached storage pattern. Draws from Finagle's Peak EWMA algorithm while avoiding locks, and patterns after Envoy's existing client-side WRR load balancing implementation for main/worker thread coordination.

Fixes envoyproxy#20907

Signed-off-by: Ryan Oblak <[email protected]>
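The lock-free atomic ring buffer mentioned in the commit message could look roughly like the following sketch. The class name, sizes, and drain policy are assumptions for illustration, not the PR's actual code: worker threads record samples without blocking, a periodic aggregation pass drains them, and when the consumer falls more than one full ring behind, the overwritten (oldest) samples are simply skipped.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal sketch of a ring buffer for RTT samples (illustrative names).
template <size_t MaxSamples>
class RttSampleBuffer {
public:
  // Record one RTT sample (milliseconds). Overwrites the oldest slot when
  // the buffer wraps, so a slow consumer loses old samples but never blocks.
  void record(uint32_t rtt_ms) {
    const size_t idx = write_index_.fetch_add(1, std::memory_order_relaxed) % MaxSamples;
    samples_[idx].store(rtt_ms, std::memory_order_relaxed);
  }

  // Drain up to MaxSamples pending samples into `out`; returns the count.
  size_t drain(uint32_t* out, size_t out_cap) {
    const size_t write = write_index_.load(std::memory_order_acquire);
    size_t pending = write - last_processed_;
    if (pending > MaxSamples) {
      pending = MaxSamples;  // older samples were overwritten; skip them
    }
    size_t n = 0;
    for (size_t i = write - pending; i < write && n < out_cap; ++i, ++n) {
      out[n] = samples_[i % MaxSamples].load(std::memory_order_relaxed);
    }
    last_processed_ = write;
    return n;
  }

private:
  std::array<std::atomic<uint32_t>, MaxSamples> samples_{};
  std::atomic<size_t> write_index_{0};
  size_t last_processed_{0};  // only touched by the consumer thread
};
```

Keeping `write_index_` monotonically increasing (and reducing modulo `MaxSamples` only on access) makes the pending-sample count a simple subtraction, which also sidesteps the wrap-around ambiguity discussed later in the review.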
Force-pushed f53dd24 to 95b56a5.
friendly ping @tonya11en

I'm waiting for an end-user to chime in on the original issue before reviewing. We need an end-user sponsor that is willing to use this.

/wait Pending an end-user of this extension

Alright, we have an end-user (#20907 (comment)). I'll start combing through this, just give me until EOW if you don't mind.
tonya11en
left a comment
This is an enormous PR, so it'll take a couple passes for me to parse all of it. I made a few comments.
        // RTT sample recorded successfully
      }
    } else {
      // Host missing Peak EWMA data - should not happen after initialization
You should probably detect whether it's happening after initialization and at least emit a warning log. Also, explain the circumstances in which this would happen.
    *upstream_timing.first_upstream_rx_byte_received_ -
    *upstream_timing.first_upstream_tx_byte_sent_);
// Record RTT sample in host-attached atomic storage
Terminate your comments with a period, please.
namespace LoadBalancingPolicies {
namespace PeakEwma {

double Cost::compute(double rtt_ewma_ms, double active_requests, double default_rtt_ms) const {
Assert the params are non-negative.
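A sketch of what the suggested assertions could look like. The formula here follows Finagle's Peak EWMA convention (smoothed RTT scaled by outstanding load, with `default_rtt_ms` as the fallback before any samples exist); the PR's actual `Cost::compute` body may differ, so treat this as an assumption-laden illustration only.

```cpp
#include <cassert>

// Illustrative free-function version of Cost::compute with the reviewer's
// suggested non-negativity assertions added up front.
double computeCost(double rtt_ewma_ms, double active_requests, double default_rtt_ms) {
  assert(rtt_ewma_ms >= 0.0);
  assert(active_requests >= 0.0);
  assert(default_rtt_ms >= 0.0);
  // With no RTT data yet, treat the host as having the configured default RTT.
  const double rtt = rtt_ewma_ms > 0.0 ? rtt_ewma_ms : default_rtt_ms;
  return rtt * (active_requests + 1.0);
}
```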
    : cost_stat_(scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".cost",
                                       Stats::Gauge::ImportMode::NeverImport)),
      ewma_rtt_stat_(
          scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".ewma_rtt_ms",
                                Stats::Gauge::ImportMode::NeverImport)),
      active_requests_stat_(
          scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".active_requests",
                                Stats::Gauge::ImportMode::NeverImport)),
Just FYI, this will cause a cardinality explosion by including the host addresses in the metric name. If folks use this LB policy without knowing this, it'll potentially bring down their TSDB or cause them to run out of quota with their metrics vendor.
However, this is going into contrib so I'm not going to hold the PR up over it. If you're ok with this, then just be sure to mention it in the docs or make it opt-in.
void Observability::report(
    const absl::flat_hash_map<Upstream::HostConstSharedPtr,
                              std::unique_ptr<GlobalHostStats>>& /* all_host_stats */) {
  // Stats are published during aggregation - this is a placeholder for consistency
}
Can you explain why you need this?
Good catch. That was vestigial from previous development iterations. I removed it and moved the code that is still used into peak_ewma_lb.cc.
// Write index wrapped around
return (max_samples - last_processed) + current_write;
Is this case for if the current_write variable overflows?
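For context on the question, the arithmetic under review handles the case where the write index, kept modulo `max_samples`, has wrapped past the end of the ring since the last aggregation pass (rather than integer overflow of the variable itself). A self-contained sketch of the two branches, with illustrative names:

```cpp
#include <cassert>
#include <cstddef>

// How many samples are pending between the consumer's last processed slot
// and the producer's current write slot, when both indices are kept modulo
// max_samples. Illustrative sketch, not the PR's actual code.
size_t pendingSamples(size_t last_processed, size_t current_write, size_t max_samples) {
  if (current_write >= last_processed) {
    return current_write - last_processed;  // no wrap since last pass
  }
  // Write index wrapped around the end of the ring.
  return (max_samples - last_processed) + current_write;
}
```

For example, with `max_samples = 8`, `last_processed = 6`, and `current_write = 2`, the pending slots are 6, 7, 0, 1, so the wrapped branch returns 4.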
// Process all new samples in chronological order
size_t processed_index = last_processed;
for (size_t i = 0; i < num_new_samples; ++i) {
How do you handle num_new_samples being larger than the max samples? I'm either missing something or this loop will process some samples more than once under high load.
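The concern is that if more than `max_samples` new samples arrived since the last pass, the oldest ones have already been overwritten in the ring, so iterating the full `num_new_samples` would walk some slots more than once. One plausible fix (a sketch, not the PR's actual resolution) is to clamp the count before the processing loop:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Clamp the number of samples to process to at most one full ring:
// anything older has been overwritten and cannot be recovered.
size_t clampNewSamples(size_t num_new_samples, size_t max_samples) {
  return std::min(num_new_samples, max_samples);
}
```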
/contrib/peak_ewma/filters/http/ @rroblak @UNOWNED
/contrib/peak_ewma/load_balancing_policies/ @rroblak @UNOWNED
You may as well just own /contrib/peak_ewma.
/wait

CI can't run until the DCO check succeeds. I'll approve once it's all green.

/wait
Force-pushed 3814720 to 0378af9.
Commit Message
Adds Peak EWMA (Exponentially Weighted Moving Average) load balancing policy that uses Power of Two Choices algorithm with real-time RTT measurements for latency-aware request routing.
Key components:
- Load balancer: envoy.load_balancing_policies.peak_ewma
- HTTP filter: envoy.filters.http.peak_ewma for RTT measurement
- Configuration: decay_time, aggregation_interval, max_samples_per_host, default_rtt, penalty_value

Implementation uses lock-free atomic ring buffers for RTT sample collection and a host-attached storage pattern. Draws from Finagle's Peak EWMA algorithm while avoiding locks, and patterns after Envoy's existing client-side WRR load balancing implementation for main/worker thread coordination.
Fixes #20907
Additional Description
This PR implements a new contrib load balancing policy based on the Peak EWMA (Exponentially Weighted Moving Average) algorithm, which provides latency-aware request routing using real-time RTT measurements.
This addresses the feature request in #20907 for a Peak EWMA load balancer implementation.
Performance Validation
Benchmark results demonstrate Peak EWMA's effectiveness at avoiding slow servers.
Test setup: 10 clients, 10 upstream servers (1 server 10x slower than the others).
Peak EWMA demonstrates dramatically lower tail latency than existing Envoy load balancing algorithms.
Risk Level
Medium
Testing
Docs Changes
docs/root/api-v3/config/contrib/load_balancing_policies/peak_ewma/peak_ewma.rst
Release Notes
Added to contrib extensions metadata. This is a new contrib extension so requires no changes to main release notes.
Platform Specific Features
N/A
Runtime Guard
N/A - This is a new contrib extension that must be explicitly configured
Issues
Fixes #20907
Deprecated
N/A
API Changes
This adds new contrib API surfaces:
- envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma - Load balancer configuration
- envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig - HTTP filter configuration

Both are marked as work_in_progress following contrib extension patterns. No changes to existing APIs.