upstream: add support for setting degraded through LoadAssignment#5649
upstream: add support for setting degraded through LoadAssignment#5649htuch merged 10 commits intoenvoyproxy:masterfrom
Conversation
Adds a DEGRADED value that can be set on a host through EDS/LoadAssignment, allowing for a host to be marked degraded without the need for active health checking. Moves the mapping of EDS flag to health flag to inside registerHostForPriority, which means that we're now consistently setting the EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS. Simplifies the check for whether the health flag value of a host has changed during EDS updates. Adds tests for the EDS mapping as well as tests to verify that we're honoring the EDS flag for non-EDS cluster types. Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
| lb_endpoint_.metadata(), lb_endpoint_.load_balancing_weight().value(), | ||
| locality_lb_endpoint_.locality(), lb_endpoint_.endpoint().health_check_config(), | ||
| locality_lb_endpoint_.priority())); | ||
| setEdsHealthFlag(*new_hosts.back(), lb_endpoint_.health_status()); |
There was a problem hiding this comment.
This is necessary because we create a new HostImpl when we resolve the target, which needs to have the health flags set. Maybe we should be passing the EDS flag to the HostImpl ctor and compute the value there instead of having to call set the flag externally everywhere?
There was a problem hiding this comment.
Yeah, I think this would make sense.
venilnoronha
left a comment
There was a problem hiding this comment.
Looks good. Just a few comments. Thanks!
| // @param existing_host the host to update. | ||
| // @param flag the health flag to update. | ||
| // @return bool whether the flag update caused the host health to change. | ||
| bool updateHealthFlags(const Host& updated_host, Host& existing_host, Host::HealthFlag flag) { |
There was a problem hiding this comment.
nit: s/updateHealthFlags/updateHealthFlag ?
test/common/upstream/eds_test.cc
Outdated
| EXPECT_EQ(Host::Health::Healthy, hosts[0]->health()); | ||
| } | ||
|
|
||
| const auto rebuild_conter = stats_.counter("cluster.name.update_no_rebuild").value(); |
There was a problem hiding this comment.
s/rebuild_conter/rebuild_counter
test/common/upstream/eds_test.cc
Outdated
| EXPECT_EQ(Host::Health::Degraded, hosts[0]->health()); | ||
| } | ||
|
|
||
| std::cerr << cluster_->prioritySet().hostSetsPerPriority()[0]->hosts().size() << std::endl; |
There was a problem hiding this comment.
left behind debug logging, ill just remove
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
@dio Wanna give this a look? |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
|
Ping @dio |
| m(DEGRADED_ACTIVE_HC, 0x08) | ||
| m(DEGRADED_ACTIVE_HC, 0x08) \ | ||
| /* The host is currently marked as degraded by EDS. */ \ | ||
| m(DEGRADED_EDS_HEALTH, 0x10) |
There was a problem hiding this comment.
Can you remind me why we can't just use FAILED_EDS_HEALTH for this use case? AFAICT it fits the bill, and looking at the underlying issue being fixed, it seemed we just needed some plumbing around that.
There was a problem hiding this comment.
Because this is for degrading an endpoint, not marking it as unhealthy. The new value is necessary to differentiate it here https://github.com/envoyproxy/envoy/pull/5649/files#diff-583237ddb4e16f38ccda2e9affdb0ad8R210 from the host being marked as unhealthy.
Degraded docs if you're not familiar: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/degraded
There was a problem hiding this comment.
Maybe it wasn't super clear, but this PR adds support for marking endpoints as degraded, and while I was in here I also made sure to fix #5637
htuch
left a comment
There was a problem hiding this comment.
LGTM; do you think any integration tests make sense? Not necessary if too much of a pain, but always nice to trust and verify.
|
|
||
| void setEdsHealthFlag(Host& host, envoy::api::v2::core::HealthStatus health_status) { | ||
| switch (health_status) { | ||
| case envoy::api::v2::core::HealthStatus::UNHEALTHY: |
There was a problem hiding this comment.
Nit: Maybe add fall-thru annotations or comments here.
| m(DEGRADED_ACTIVE_HC, 0x08) | ||
| m(DEGRADED_ACTIVE_HC, 0x08) \ | ||
| /* The host is currently marked as degraded by EDS. */ \ | ||
| m(DEGRADED_EDS_HEALTH, 0x10) |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
| lb_endpoint_.metadata(), lb_endpoint_.load_balancing_weight().value(), | ||
| locality_lb_endpoint_.locality(), lb_endpoint_.endpoint().health_check_config(), | ||
| locality_lb_endpoint_.priority())); | ||
| setEdsHealthFlag(*new_hosts.back(), lb_endpoint_.health_status()); |
There was a problem hiding this comment.
Yeah, I think this would make sense.
| EXPECT_FALSE(cluster.info()->addedViaApi()); | ||
| } | ||
|
|
||
| TEST_F(StaticClusterImplTest, LoadAssignmentEdsHealth) { |
There was a problem hiding this comment.
Can you add a // above this explaining what it is validating?
Signed-off-by: Snow Pettersen <snowp@squareup.com>
| Host::CreateConnectionData HostImpl::createConnection( | ||
| Event::Dispatcher& dispatcher, const Network::ConnectionSocket::OptionsSharedPtr& options, | ||
| Network::TransportSocketOptionsSharedPtr transport_socket_options) const { | ||
| return {createConnection(dispatcher, *cluster_, address_, options, transport_socket_options), |
Signed-off-by: Snow Pettersen <snowp@squareup.com>
…voyproxy#5649) Adds a DEGRADED HealthStatus value that can be set on a host through LoadAssignment, allowing for a host to be marked degraded without the need for active health checking. Moves the mapping of EDS flag to health flag to inside `registerHostForPriority`, which means that we're now consistently setting the EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS. Simplifies the check for whether the health flag value of a host has changed during EDS updates. Adds tests for the EDS mapping as well as tests to verify that we're honoring the EDS flag for non-EDS cluster types. Risk Level: High, substantial refactoring of how we determine whether health flag has changed. Testing: UTs coverage for new health flag values. Docs Changes: n/a Release Notes: n/a Fixes envoyproxy#5637 envoyproxy#5063 Signed-off-by: Snow Pettersen <snowp@squareup.com> Signed-off-by: Fred Douglas <fredlas@google.com>
…15649) (#15881) Make the failover timeout a constructor argument of ConnectionGrid (#5649) Commit Message: make the failover timeout a constructor argument of ConnectionGrid Additional Description: Risk Level: low Testing: unit test ConnectivityGridTest.TimeoutThenSuccessParallelSecondConnects Docs Changes: N/A Release Notes: N/A Platform Specific Features: Signed-off-by: Ryan Hamilton <rch@google.com>
Adds a DEGRADED HealthStatus value that can be set on a host through
LoadAssignment, allowing for a host to be marked degraded without
the need for active health checking.
Moves the mapping of EDS flag to health flag to inside
registerHostForPriority, which means that we're now consistently settingthe EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS.
Simplifies the check for whether the health flag value of a host has
changed during EDS updates.
Adds tests for the EDS mapping as well as tests to verify that we're
honoring the EDS flag for non-EDS cluster types.
Signed-off-by: Snow Pettersen snowp@squareup.com
Description:
Risk Level: High, substantial refactoring of how we determine whether health flag has changed.
Testing: UTs coverage for new health flag values.
Docs Changes: n/a
Release Notes: n/a
Fixes #5637
#5063