-
Notifications
You must be signed in to change notification settings - Fork 5.5k
support override host status restriction for stateful session #19665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
7285146
67fe321
92cb9e3
8480754
8253bc5
cd6b20d
3ae80f8
bd713a3
a4a9d0a
31bc86b
95c182f
f5366ac
c0b6218
f4d16c6
37f823f
a02a7fc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,6 +54,11 @@ enum HealthStatus { | |
| DEGRADED = 5; | ||
| } | ||
|
|
||
| message HealthStatusSet { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you expect this message to have more fields?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is because we need to distinguish between the case where no |
||
| // An order-independent set of health status. | ||
| repeated HealthStatus statuses = 1; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could this be an empty set? If not, then add a PGV constraint.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, it can be an empty set.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That sounds like the case when the override_host_status field is set which is technically different from being set with a zero list. Is there a distinction between override_host_status not being set and it being set with an empty list of statuses?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @snowp Yep. If override_host_status is set with an empty list of statuses then the override host will be ignored. And if override_host_status is not set completely, we will give some default statuses. |
||
| } | ||
|
|
||
| // [#next-free-field: 25] | ||
| message HealthCheck { | ||
| option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.core.HealthCheck"; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,9 +24,13 @@ static const std::string RuntimeZoneEnabled = "upstream.zone_routing.enabled"; | |
| static const std::string RuntimeMinClusterSize = "upstream.zone_routing.min_cluster_size"; | ||
| static const std::string RuntimePanicThreshold = "upstream.healthy_panic_threshold"; | ||
|
|
||
| static constexpr uint32_t UnhealthyStatus = 1u << static_cast<size_t>(Host::Health::Unhealthy); | ||
| static constexpr uint32_t DegradedStatus = 1u << static_cast<size_t>(Host::Health::Degraded); | ||
| static constexpr uint32_t HealthyStatus = 1u << static_cast<size_t>(Host::Health::Healthy); | ||
| constexpr uint32_t singleHealthStatusToUint(Host::Health status) { | ||
| return 1u << static_cast<size_t>(status); | ||
|
htuch marked this conversation as resolved.
Outdated
|
||
| } | ||
|
|
||
| static constexpr uint32_t DefaultOverrideHostStatus = | ||
| singleHealthStatusToUint(Host::Health::Degraded) | | ||
| singleHealthStatusToUint(Host::Health::Healthy); | ||
|
|
||
| bool tooManyPreconnects(size_t num_preconnect_picks, uint32_t healthy_hosts) { | ||
| // Currently we only allow the number of preconnected connections to equal the | ||
|
|
@@ -118,7 +122,8 @@ LoadBalancerBase::LoadBalancerBase( | |
| : stats_(stats), runtime_(runtime), random_(random), | ||
| default_healthy_panic_percent_(PROTOBUF_PERCENT_TO_ROUNDED_INTEGER_OR_DEFAULT( | ||
| common_config, healthy_panic_threshold, 100, 50)), | ||
| priority_set_(priority_set) { | ||
| priority_set_(priority_set), | ||
| override_host_status_(LoadBalancerContextBase::createOverrideHostStatus(common_config)) { | ||
| for (auto& host_set : priority_set_.hostSetsPerPriority()) { | ||
| recalculatePerPriorityState(host_set->priority(), priority_set_, per_priority_load_, | ||
| per_priority_health_, per_priority_degraded_, total_healthy_hosts_); | ||
|
|
@@ -517,20 +522,37 @@ bool ZoneAwareLoadBalancerBase::earlyExitNonLocalityRouting() { | |
| return false; | ||
| } | ||
|
|
||
| bool LoadBalancerContextBase::validateOverrideHostStatus(Host::Health health, | ||
| OverrideHostStatus status) { | ||
| switch (health) { | ||
| case Host::Health::Unhealthy: | ||
| return status & UnhealthyStatus; | ||
| case Host::Health::Degraded: | ||
| return status & DegradedStatus; | ||
| case Host::Health::Healthy: | ||
| return status & HealthyStatus; | ||
| uint32_t LoadBalancerContextBase::createOverrideHostStatus( | ||
| const envoy::config::cluster::v3::Cluster::CommonLbConfig& common_config) { | ||
| if (!common_config.has_override_host_status()) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if there is but it is empty?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Then |
||
| return DefaultOverrideHostStatus; | ||
| } | ||
|
|
||
| uint32_t override_host_status = 0; | ||
|
|
||
| for (auto single_status : common_config.override_host_status().statuses()) { | ||
| switch (static_cast<envoy::config::core::v3::HealthStatus>(single_status)) { | ||
| case envoy::config::core::v3::HealthStatus::UNKNOWN: | ||
| case envoy::config::core::v3::HealthStatus::HEALTHY: | ||
| override_host_status |= singleHealthStatusToUint(Host::Health::Healthy); | ||
| break; | ||
| case envoy::config::core::v3::HealthStatus::UNHEALTHY: | ||
| case envoy::config::core::v3::HealthStatus::DRAINING: | ||
| case envoy::config::core::v3::HealthStatus::TIMEOUT: | ||
| override_host_status |= singleHealthStatusToUint(Host::Health::Unhealthy); | ||
| break; | ||
| case envoy::config::core::v3::HealthStatus::DEGRADED: | ||
| override_host_status |= singleHealthStatusToUint(Host::Health::Degraded); | ||
| break; | ||
| default: | ||
| break; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this may hide a case where the server sends a config with some other health status.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. get it. |
||
| } | ||
| } | ||
| return false; | ||
| return override_host_status; | ||
| } | ||
|
|
||
| HostConstSharedPtr LoadBalancerContextBase::selectOverrideHost(const HostMap* host_map, | ||
| uint32_t status, | ||
| LoadBalancerContext* context) { | ||
| if (context == nullptr) { | ||
| return nullptr; | ||
|
|
@@ -545,7 +567,7 @@ HostConstSharedPtr LoadBalancerContextBase::selectOverrideHost(const HostMap* ho | |
| return nullptr; | ||
| } | ||
|
|
||
| auto host_iter = host_map->find(override_host.value().first); | ||
| auto host_iter = host_map->find(override_host.value()); | ||
|
|
||
| // The override host cannot be found in the host map. | ||
| if (host_iter == host_map->end()) { | ||
|
|
@@ -555,17 +577,15 @@ HostConstSharedPtr LoadBalancerContextBase::selectOverrideHost(const HostMap* ho | |
| HostConstSharedPtr host = host_iter->second; | ||
| ASSERT(host != nullptr); | ||
|
|
||
| // Verify the host status. | ||
| if (LoadBalancerContextBase::validateOverrideHostStatus(host->health(), | ||
| override_host.value().second)) { | ||
| if (singleHealthStatusToUint(host->health()) & status) { | ||
| return host; | ||
| } | ||
| return nullptr; | ||
| } | ||
|
|
||
| HostConstSharedPtr ZoneAwareLoadBalancerBase::chooseHost(LoadBalancerContext* context) { | ||
| HostConstSharedPtr host = | ||
| LoadBalancerContextBase::selectOverrideHost(cross_priority_host_map_.get(), context); | ||
| HostConstSharedPtr host = LoadBalancerContextBase::selectOverrideHost( | ||
| cross_priority_host_map_.get(), override_host_status_, context); | ||
| if (host != nullptr) { | ||
| return host; | ||
| } | ||
|
|
@@ -652,8 +672,8 @@ uint32_t ZoneAwareLoadBalancerBase::tryChooseLocalLocalityHosts(const HostSet& h | |
| return random_.random() % number_of_localities; | ||
| } | ||
|
|
||
| // Random sampling to select specific locality for cross locality traffic based on the additional | ||
| // capacity in localities. | ||
| // Random sampling to select specific locality for cross locality traffic based on the | ||
| // additional capacity in localities. | ||
| uint64_t threshold = random_.random() % state.residual_capacity_[number_of_localities - 1]; | ||
|
|
||
| // This potentially can be optimized to be O(log(N)) where N is the number of localities. | ||
|
|
@@ -903,8 +923,8 @@ HostConstSharedPtr EdfLoadBalancerBase::peekAnotherHost(LoadBalancerContext* con | |
|
|
||
| // As has been commented in both EdfLoadBalancerBase::refresh and | ||
| // BaseDynamicClusterImpl::updateDynamicHostList, we must do a runtime pivot here to determine | ||
| // whether to use EDF or do unweighted (fast) selection. EDF is non-null iff the original weights | ||
| // of 2 or more hosts differ. | ||
| // whether to use EDF or do unweighted (fast) selection. EDF is non-null iff the original | ||
| // weights of 2 or more hosts differ. | ||
| if (scheduler.edf_ != nullptr) { | ||
| return scheduler.edf_->peekAgain([this](const Host& host) { return hostWeight(host); }); | ||
| } else { | ||
|
|
@@ -929,8 +949,8 @@ HostConstSharedPtr EdfLoadBalancerBase::chooseHostOnce(LoadBalancerContext* cont | |
|
|
||
| // As has been commented in both EdfLoadBalancerBase::refresh and | ||
| // BaseDynamicClusterImpl::updateDynamicHostList, we must do a runtime pivot here to determine | ||
| // whether to use EDF or do unweighted (fast) selection. EDF is non-null iff the original weights | ||
| // of 2 or more hosts differ. | ||
| // whether to use EDF or do unweighted (fast) selection. EDF is non-null iff the original | ||
| // weights of 2 or more hosts differ. | ||
| if (scheduler.edf_ != nullptr) { | ||
| auto host = scheduler.edf_->pickAndAdd([this](const Host& host) { return hostWeight(host); }); | ||
| return host; | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field only relevant when the stateful_session extension is used?
If so, can you add this to the comment, and also add a reference to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be referring to the specific extension here at all? Or should we instead have this talk about the host override feature in general? I can imagine other extensions wanting to use host overrides as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment, it does. I will add some more comments. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that I do not fully grok is what is expected if a config has this field set, but doesn't have the sticky_session extension configured.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch@snowp This configuration is designed for general host override. ButLoadBalancerContext::overrideHostToSelect()is internaly C++ API which is annomymous for public users and the stateful session is the only related implemenatation for now. Although we will have more extensions in the future, now we still need a specific example to help users to understand this config.how about some comments like this? cc @htuch cc @adisuissa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit lost reading this. I think there is context assumed of the reader that isn't likely to be there in most cases. Can you explain this section as if the reader had never heard of this override feature, pointing them to the relevant docs with RST links and/or some examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The override host feature is currently only available through the L4/L7 extensions. Now stateful session is the only extension that is ready.
So my intention is to use the stateful session as an example to help the reader understand the functionality and the configuration.
I will think about how to make it clearer and easier to understand. 🤔