upstream: allow excluding hosts from lb calculations until initial health check #6794
snowp merged 30 commits into envoyproxy:master
Conversation
Signed-off-by: Snow Pettersen <snowp@squareup.com>
This isn't quite done (missing quite a few tests), but I'm putting it up now for some early feedback. The general approach is very similar to how degraded was added: a new set of information, both for all hosts and per locality, is plumbed through in updateHosts so that the priority load and locality load calculations can use it to adjust the weights.
source/common/upstream/subset_lb.cc
    degraded_hosts, degraded_hosts_per_locality),
    determineLocalityWeights(*hosts_per_locality), filtered_added,
    filtered_removed);
// TODO(snowp): Right now we just pass hosts->size(), really this need to be filtering down the
Turns out just passing the count around won't work, since the subset lb will need a list of warmed hosts to be able to tell how many of them belong to the subset. I'll update this to use another list of hosts.
Alright, this is now using a new host vector plus a new hosts-per-locality structure so that the weight adjustment can be applied to both locality weighting and panic/spillover. There's an unfortunate complexity here which requires O(hosts) space for each of these weight adjustments (warmed hosts, degraded hosts): not only do we need to know how many hosts are warmed, we also need to keep track of which hosts are in this state for each update so that the subset lb can subset this list on the worker threads. Open to other ideas on how to accomplish this, but I'm not sure how else to do it without large refactors (e.g. if the subsetting happened on the main thread we would only need to retain counts, not lists of warmed hosts). If these increases in memory usage for the cluster are concerning, we could have both the warmed and degraded lists use static empty singletons instead, which should help a bit. This is still missing quite a few tests; I'll add those if we're happy with this approach.
@snowp we want to land this at Lyft sooner rather than later, so I will take a look sometime this weekend and get back to you with some high-level feedback.
Works for me, I'll be around this weekend to respond to feedback.
Pushed a change that will hopefully address the memory usage: instead of tracking which hosts should be included, we track the excluded ones. This should reduce the increase in memory usage per host set to just a few bytes (an empty vector and empty hosts-per-locality) when this feature is not enabled and once all hosts have been health checked.
mattklein123
left a comment
Thanks this is great. Some initial comments to get started. Also, WDYT about adding an integration test a la the one I just added in #6813? Hopefully using that as a guide will make it go much faster for you than it did for me. :)
/wait
api/envoy/api/v2/cds.proto
//
// Ignoring a host means that for any load balancing calculations that adjust weights based
// on the ratio of eligible hosts and total hosts (priority spillover, locality weighting, etc.)
// will exclude these hosts in the denominator.
api/envoy/api/v2/cds.proto
// active health checking is also configured.
//
// Ignoring a host means that for any load balancing calculations that adjust weights based
// on the ratio of eligible hosts and total hosts (priority spillover, locality weighting, etc.)
maybe put panic mode in the parenthesis here also?
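For reference, enabling the resulting option looks roughly like the cluster config below. This is a hedged sketch: the surrounding cluster fields are illustrative, and the field name `ignore_new_hosts_until_first_hc` under `common_lb_config` reflects my understanding of the API this PR adds, so treat it as an assumption rather than authoritative documentation.

```yaml
clusters:
- name: some_service
  connect_timeout: 1s
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  common_lb_config:
    # Exclude newly added hosts from lb weight calculations (priority
    # spillover, locality weighting, panic mode) until they pass their
    # first active health check.
    ignore_new_hosts_until_first_hc: true
  health_checks:
  - timeout: 1s
    interval: 5s
    unhealthy_threshold: 2
    healthy_threshold: 1
    http_health_check:
      path: /healthz
```

Note that, as the proto comment says, the option only has an effect when active health checking is also configured.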
include/envoy/upstream/upstream.h
 */
virtual void used(bool new_used) PURE;

virtual bool warmed() const PURE;
include/envoy/upstream/upstream.h
/*
 * @return all excluded hosts contained in the set at the current time. Excluded hosts should be
 * ignored when computing load balancing weights, but may overlap with hosts in hosts().
 */
    host_set->healthyHostsPerLocality().clone();
HostsPerLocalityConstSharedPtr degraded_hosts_per_locality_copy =
    host_set->degradedHostsPerLocality().clone();
ExcludedHostVectorConstSharedPtr excluded_hosts_copy(
Worth doing the @htuch TODO up above here soon in a follow-up? This continues to grow scarier and scarier. :)
Avoiding the copies here would, I think, make this function less painful to read and avoid asking for some type of param struct like we have done elsewhere.
Wouldn't you still need to pull out a shared_ptr for each of the values and pass them along to the lambda to ensure that we're using consistent values? If we just pass along the host set, I think it's possible for the TLS updates to happen concurrently with a host update on the main thread and result in potentially inconsistent values.
I think this whole thing could be simplified pretty easily by reusing the UpdateHostsParams param struct that's already being used elsewhere to call updateHosts, which would be helped by exposing shared ptrs to the underlying typed arrays. I'll give that a go in this PR and we can decide whether it makes sense to split it out as a different PR
Sorry, yeah, this is roughly what I meant. I think we can avoid the copies and also simplify the copying?
    std::move(healthy_hosts), std::move(healthy_hosts_per_locality),
    std::make_shared<const DegradedHostVector>(),
    HostsPerLocalityImpl::empty());
// TODO(snowp): Move this function into test/
  healthy_hosts += host_set->healthyHosts().size();
  degraded_hosts += host_set->degradedHosts().size();
}
// TODO(snowp): Stats for excluded hosts?
bool used() const override { return used_; }
void used(bool new_used) override { used_ = new_used; }
bool warmed() const override {
  if (cluster_ != nullptr && cluster_->warmHosts()) {
Why bother checking this vs. just check the flag directly out of curiosity?
As written, the flag is always set, even if the config flag is not set. I guess the alternative solution is to have the health checker check the config flag to determine whether the host flag should be set, and then just read the flag here. I think that sounds a bit cleaner (it moves the complexity out of Host), so I'll give that a go.
Current failures are related to memory usage:
mattklein123
left a comment
Nice, this looks awesome, thanks for the integration test. I would go ahead and bump the stats size test. I think this is an important stat to add. cc @jmarantz
/wait
// it to healthy. This makes startup faster with a small reduction in overall reliability
// depending on the HC settings.
if (first_check_ || ++num_healthy_ == parent_.healthy_threshold_) {
  host_->healthFlagClear(Host::HealthFlag::PENDING_ACTIVE_HC);
Bumping the memory expectations in the stats integration test LGTM.
mattklein123
left a comment
Nice, looks great. One nit.
/wait
  }
}

// Clear the pending flag if it is set. By removing this we're marking the host as having been
nit: maybe put this logic into a helper function that can be shared here and below? This would keep the comments and reasoning in one place? WDYT?
FYI @htuch: this PR is similar to the work I did for degraded, just keeping you in the loop. I'll merge this today unless we feel like it needs more reviews?
@snowp LGTM, thanks for the heads up. My only comment is that I continue to be somewhat amazed by the growth of complexity in our endpoint management; I think a lot of this is inherent to the cool features we're adding, but I do wonder whether we can make some architectural simplifications. CC @antoniovicente, who might be interested in making some contributions in this area of the code.
Thank you so much, @snowp, for this change! We are excited to be using this feature.
* master: (88 commits)
  upstream: Null-deref on TCP health checker if setsockopt fails (envoyproxy#6793)
  ci: switch macOS CI to azure pipelines (envoyproxy#6889)
  os syscalls lib: break apart syscalls used for hot restart (envoyproxy#6880)
  Kafka codec: precompute request size before serialization, so we do n… (envoyproxy#6862)
  upstream: move static and strict_dns clusters to dedicated files (envoyproxy#6886)
  Rollforward of api: Add total_issued_requests to Upstream Locality and Endpoint Stats. (envoyproxy#6692) (envoyproxy#6784)
  fix explicit constructor in copy-initialization (envoyproxy#6884)
  stats: use tag iterator rather than constructing the tag-array and searching that. (envoyproxy#6853)
  common: use unscoped build target in generate_version_linkstamp (envoyproxy#6877)
  Addendum to envoyproxy#6778 (envoyproxy#6882)
  ci: add minimum Linux build for Azure Pipelines (envoyproxy#6881)
  grpc: utilities for inter-converting grpc::ByteBuffer and Buffer::Instance. (envoyproxy#6732)
  upstream: allow excluding hosts from lb calculations until initial health check (envoyproxy#6794)
  stats: prevent unused counters from leaking across hot restart (envoyproxy#6850)
  network filters: add `injectDataToFilterChain(data, end_stream)` method to network filter callbacks (envoyproxy#6750)
  delete things that snuck back in (envoyproxy#6873)
  config: scoped rds (2b): support delta APIs in ConfigProvider framework (envoyproxy#6781)
  string == string! (envoyproxy#6868)
  config: add mssing imports to delta_subscription_state (envoyproxy#6869)
  protobuf: add missing default case to enum (envoyproxy#6870)
  ...

Signed-off-by: Michael Puncel <mpuncel@squareup.com>
This adds an option to allow hosts to be excluded from lb calculations until they have been health checked
for the first time. This makes it possible to scale up the number of hosts quickly (i.e. a large increase
relative to the current host set size) without triggering panic mode/spillover (as long as the initial health check
succeeds).
While these hosts are excluded from the lb calculations, they are still eligible for routing when panic
mode is triggered.
Risk Level: Medium, optional feature but touches quite a bit of code
Testing: UTs
Docs Changes: Proto docs
Release Notes: n/a for now
Fixes #6653