Skip to content

Conversation

@rroblak
Copy link
Contributor

@rroblak rroblak commented Aug 8, 2025

Commit Message

Adds Peak EWMA (Exponentially Weighted Moving Average) load balancing policy that uses Power of Two Choices algorithm with real-time RTT measurements for latency-aware request routing.

Key components:

  • Load balancer: envoy.load_balancing_policies.peak_ewma
  • HTTP filter: envoy.filters.http.peak_ewma for RTT measurement
  • Configuration: decay_time, aggregation_interval, max_samples_per_host, default_rtt, penalty_value

Implementation uses lock-free atomic ring buffers for RTT sample collection and host-attached storage pattern. Draws from Finagle's Peak EWMA algorithm while avoiding locks, and patterns after Envoy's existing client-side WRR load balancing implementation for main/worker thread coordination.

Fixes #20907

Additional Description

This PR implements a new contrib load balancing policy based on the Peak EWMA (Exponentially Weighted Moving Average) algorithm, which provides latency-aware request routing using real-time RTT measurements.

This addresses the feature request in #20907 for a Peak EWMA load balancer implementation.

Performance Validation

Benchmark results demonstrate Peak EWMA's effectiveness at avoiding slow servers.

Test setup: 10 clients, 10 upstream servers (1 server 10x slower than others):

Algorithm Average P50 P75 P90 P95 P99 Min Max Std Dev
round_robin 19.08 8.46 10.01 101.79 103.39 105.58 5.79 108.02 29.8
least_request 11.00 8.92 10.71 13.10 15.38 103.00 5.98 109.35 11.72
random 18.66 8.47 10.64 65.51 103.28 106.18 5.95 112.42 28.67
peak_ewma 10.16 9.66 11.13 13.29 14.52 18.56 6.31 26.16 2.45

Peak EWMA demonstrates a dramatically lower tail latency than existing Envoy load balancing algos.

Risk Level

Medium

Testing

  • Unit Tests: Comprehensive coverage for all components (Cost, Observability, HostData, Config)
  • Integration Tests: End-to-end load balancing behavior with latency simulation
  • HTTP Filter Tests: RTT measurement and sample recording functionality
  • Manual Testing: Verified 100% traffic routing to fast servers vs slow servers

Docs Changes

  • Added comprehensive API documentation in docs/root/api-v3/config/contrib/load_balancing_policies/peak_ewma/peak_ewma.rst
  • Documented all 5 config parameters with detailed explanations and examples
  • Added statistics section following standard Envoy format
  • Proto files include extensive inline documentation

Release Notes

Added to contrib extensions metadata. This is a new contrib extension so requires no changes to main release notes.

Platform Specific Features

N/A

Runtime Guard

N/A - This is a new contrib extension that must be explicitly configured

Issues

Fixes #20907

Deprecated

N/A

API Changes

This adds new contrib API surfaces:

  • envoy.extensions.load_balancing_policies.peak_ewma.v3alpha.PeakEwma - Load balancer configuration
  • envoy.extensions.filters.http.peak_ewma.v3alpha.PeakEwmaConfig - HTTP filter configuration

Both are marked as work_in_progress following contrib extension patterns. No changes to existing APIs.

@repokitteh-read-only
Copy link

Hi @rroblak, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #40653 was opened by rroblak.

see: more, trace.

@jukie
Copy link
Contributor

jukie commented Aug 9, 2025

This looks really good, thanks for working on this! Will it be possible to support things like localityLbConfig/zoneAwareLbConfig or slow start mode? I'm not suggesting that needs to be included here (also not a maintainer) but I'm curious if any of the core logic here would restrict that.

@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch from 07e1f68 to 3e5e9d4 Compare August 9, 2025 05:32
@mathetake mathetake requested a review from tonya11en August 11, 2025 19:59
@frittentheke
Copy link

This looks really good, thanks for working on this! Will it be possible to support things like localityLbConfig/zoneAwareLbConfig or slow start mode? I'm not suggesting that needs to be included here (also not a maintainer) but I'm curious if any of the core logic here would restrict that.

I was just about to ask about this aspect as well ... also see my comment in the istio/istio#35102 (comment).

With Peak EWMA being about maintaining a low latency, crossing or not crossing a zone barrier comes into question.
Even though potentially more expensive, there could be a trade-off to make (or not)

@rroblak
Copy link
Contributor Author

rroblak commented Aug 19, 2025

Thanks for the great questions, @jukie and @frittentheke!

localityLbConfig/zoneAwareLbConfig

This implementation should be compatible with localityLbConfig/zoneAwareLbConfig. They could act as a pre-filter to select a host set, similar to how the current implementation uses host health filtering to only consider healthy hosts. Then the Peak EWMA policy would P2C on that subset to choose the fastest host.

That said, I haven't looked at the object/data model for localityLbConfig/zoneAwareLbConfig so I'd need to dig through to be 100% sure.

Needless to say, however, locality/zone-awareness highlights one of the strengths— and elegance— of Peak EWMA in comparison to load balancing algorithms that don't consider request latency as an input: given an undifferentiated pool of upstream hosts Peak EWMA will dynamically weight traffic toward the upstream hosts with the lowest RTT, which is in practice is the local zone first, followed by increasingly distant zones.

In my experience running this the past few years across data centers, Peak EWMA simplifies configuration and also mitigates partial-failure issues where an upstream host's RTT increases substantially but not enough to cause the host to be marked unhealthy. In this scenario, Peak EWMA will significantly reduce traffic to the affected upstream host (even if it's in the local zone), whereas fixed configurations will continue to send requests to such a host.

Slow Start

Regarding slow start, in a way it's already implemented via the default_rtt and penalty_value parameters. Those ensure that new hosts only receive a trickle of requests until more data is gathered on their respective request latencies. This again highlights Peak EWMA's elegance in contrast to a fixed slow start config.

If we wanted to incorporate the common LB config slow start params we'd need think a bit about how to do that since they would overlap with the default_rtt and penalty_value params I defined. It's likely possible, but not immediately obvious to me how to do it.

Hope that helps! Let me know your thoughts.

@tonya11en
Copy link
Member

@rroblak CI won't run because of the DCO check failing. You can follow the instructions here to fix it and let the baseline tests run.

@tonya11en
Copy link
Member

/wait

@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch 9 times, most recently from 2a81114 to f53dd24 Compare September 9, 2025 19:07
Adds Peak EWMA (Exponentially Weighted Moving Average) load balancing policy
that uses Power of Two Choices algorithm with real-time RTT measurements for
latency-aware request routing.

Key components:
- Load balancer: envoy.load_balancing_policies.peak_ewma
- HTTP filter: envoy.filters.http.peak_ewma for RTT measurement
- Configuration: decay_time, aggregation_interval, max_samples_per_host,
  default_rtt, penalty_value

Implementation uses lock-free atomic ring buffers for RTT sample collection
and host-attached storage pattern. Draws from Finagle's Peak EWMA algorithm
while avoiding locks, and patterns after Envoy's existing client-side WRR
load balancing implementation for main/worker thread coordination.

Fixes envoyproxy#20907

Signed-off-by: Ryan Oblak <[email protected]>
@nezdolik
Copy link
Member

friendly ping @tonya11en

@tonya11en
Copy link
Member

I'm waiting for an end-user to chime in on the original issue before reviewing. We need an end-user sponsor that is willing to use this.

@KBaichoo
Copy link
Contributor

/wait

Pending an end-user of this extension

@tonya11en
Copy link
Member

Alright, we have an end-user (#20907 (comment)). I'll start combing through this, just give me until EOW if you don't mind.

@tonya11en tonya11en added the contrib PRs for contrig label Oct 6, 2025
Copy link
Member

@tonya11en tonya11en left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an enormous PR, so it'll take a couple passes for me to parse all of it. I made a few comments.

// RTT sample recorded successfully
}
} else {
// Host missing Peak EWMA data - should not happen after initialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably detect whether it's happening after initialization and at least emit a warning log. Also, explain the circumstances in which this would happen.

*upstream_timing.first_upstream_rx_byte_received_ -
*upstream_timing.first_upstream_tx_byte_sent_);

// Record RTT sample in host-attached atomic storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terminate your comments with a period, please.

namespace LoadBalancingPolicies {
namespace PeakEwma {

double Cost::compute(double rtt_ewma_ms, double active_requests, double default_rtt_ms) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert the params are non-negative.

Comment on lines 10 to 17
: cost_stat_(scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".cost",
Stats::Gauge::ImportMode::NeverImport)),
ewma_rtt_stat_(
scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".ewma_rtt_ms",
Stats::Gauge::ImportMode::NeverImport)),
active_requests_stat_(
scope.gaugeFromString("peak_ewma." + host->address()->asString() + ".active_requests",
Stats::Gauge::ImportMode::NeverImport)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, this will cause a cardinality explosion by including the host addresses in the metric name. If folks use this LB policy without knowing this, it'll potentially bring down their TSDB or cause them to run out of quota with their metrics vendor.

However, this is going into contrib so I'm not going to hold the PR up over it. If you're ok with this, then just be sure to mention it in the docs or make it opt-in.

Comment on lines 32 to 36
void Observability::report(
const absl::flat_hash_map<Upstream::HostConstSharedPtr,
std::unique_ptr<GlobalHostStats>>& /* all_host_stats */) {
// Stats are published during aggregation - this is a placeholder for consistency
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you need this?

Copy link
Contributor Author

@rroblak rroblak Oct 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. That was vestigial from previous development iterations. I removed it and refactored the code that is used into peak_ewma_lb.cc.

Comment on lines 260 to 261
// Write index wrapped around
return (max_samples - last_processed) + current_write;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this case for if the current_write variable overflows?


// Process all new samples in chronological order
size_t processed_index = last_processed;
for (size_t i = 0; i < num_new_samples; ++i) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you handle num_new_samples being larger than the max samples? I'm either missing something or this loop will process some samples more than once under high load.

Comment on lines +433 to +434
/contrib/peak_ewma/filters/http/ @rroblak @UNOWNED
/contrib/peak_ewma/load_balancing_policies/ @rroblak @UNOWNED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may as well just own /contrib/peak_ewma.

@tonya11en
Copy link
Member

/wait

@tonya11en
Copy link
Member

CI can't run until the DCO check succeeds. I'll approve once it's all green.

@tonya11en
Copy link
Member

/wait

@rroblak rroblak force-pushed the rroblak/peak-ewma-squashed branch from 3814720 to 0378af9 Compare October 20, 2025 18:16
@tonya11en tonya11en enabled auto-merge (squash) October 24, 2025 19:29
@tonya11en tonya11en merged commit 4364cb5 into envoyproxy:main Oct 24, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contrib PRs for contrig

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Peak EWMA load balancing

6 participants