Skip to content

upstream: add support for setting degraded through LoadAssignment#5649

Merged
htuch merged 10 commits intoenvoyproxy:masterfrom
snowp:degraded-eds
Feb 8, 2019
Merged

upstream: add support for setting degraded through LoadAssignment#5649
htuch merged 10 commits intoenvoyproxy:masterfrom
snowp:degraded-eds

Conversation

@snowp
Copy link
Contributor

@snowp snowp commented Jan 18, 2019

Adds a DEGRADED HealthStatus value that can be set on a host through
LoadAssignment, allowing for a host to be marked degraded without
the need for active health checking.

Moves the mapping of EDS flag to health flag to inside
registerHostForPriority, which means that we're now consistently setting
the EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS.

Simplifies the check for whether the health flag value of a host has
changed during EDS updates.

Adds tests for the EDS mapping as well as tests to verify that we're
honoring the EDS flag for non-EDS cluster types.

Signed-off-by: Snow Pettersen snowp@squareup.com

Description:
Risk Level: High, substantial refactoring of how we determine whether health flag has changed.
Testing: UTs coverage for new health flag values.
Docs Changes: n/a
Release Notes: n/a
Fixes #5637
#5063

Snow Pettersen added 2 commits January 17, 2019 22:37
Adds a DEGRADED value that can be set on a host through
EDS/LoadAssignment, allowing for a host to be marked degraded without
the need for active health checking.

Moves the mapping of EDS flag to health flag to inside
registerHostForPriority, which means that we're now consistently setting
the EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS.

Simplifies the check for whether the health flag value of a host has
changed during EDS updates.

Adds tests for the EDS mapping as well as tests to verify that we're
honoring the EDS flag for non-EDS cluster types.

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
lb_endpoint_.metadata(), lb_endpoint_.load_balancing_weight().value(),
locality_lb_endpoint_.locality(), lb_endpoint_.endpoint().health_check_config(),
locality_lb_endpoint_.priority()));
setEdsHealthFlag(*new_hosts.back(), lb_endpoint_.health_status());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary because we create a new HostImpl when we resolve the target, which needs to have the health flags set. Maybe we should be passing the EDS flag to the HostImpl ctor and compute the value there instead of having to call set the flag externally everywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this would make sense.

@lizan lizan requested a review from dio January 18, 2019 08:27
Copy link
Member

@venilnoronha venilnoronha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few comments. Thanks!

// @param existing_host the host to update.
// @param flag the health flag to update.
// @return bool whether the flag update caused the host health to change.
bool updateHealthFlags(const Host& updated_host, Host& existing_host, Host::HealthFlag flag) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/updateHealthFlags/updateHealthFlag ?

EXPECT_EQ(Host::Health::Healthy, hosts[0]->health());
}

const auto rebuild_conter = stats_.counter("cluster.name.update_no_rebuild").value();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/rebuild_conter/rebuild_counter

EXPECT_EQ(Host::Health::Degraded, hosts[0]->health());
}

std::cerr << cluster_->prioritySet().hostSetsPerPriority()[0]->hosts().size() << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left behind debug logging, ill just remove

Signed-off-by: Snow Pettersen <snowp@squareup.com>
venilnoronha
venilnoronha previously approved these changes Jan 23, 2019
Copy link
Member

@venilnoronha venilnoronha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@snowp
Copy link
Contributor Author

snowp commented Jan 25, 2019

@dio Wanna give this a look?

Snow Pettersen added 2 commits January 29, 2019 18:35
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
@snowp
Copy link
Contributor Author

snowp commented Jan 30, 2019

Ping @dio

@snowp
Copy link
Contributor Author

snowp commented Jan 31, 2019

@htuch Wanna give this a look? Talked to @dio and he's not gonna be able to get to it this week

m(DEGRADED_ACTIVE_HC, 0x08)
m(DEGRADED_ACTIVE_HC, 0x08) \
/* The host is currently marked as degraded by EDS. */ \
m(DEGRADED_EDS_HEALTH, 0x10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me why we can't just use FAILED_EDS_HEALTH for this use case? AFAICT it fits the bill, and looking at the underlying issue being fixed, it seemed we just needed some plumbing around that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is for degrading an endpoint, not marking it as unhealthy. The new value is necessary to differentiate it here https://github.com/envoyproxy/envoy/pull/5649/files#diff-583237ddb4e16f38ccda2e9affdb0ad8R210 from the host being marked as unhealthy.

Degraded docs if you're not familiar: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/load_balancing/degraded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it wasn't super clear, but this PR adds support for marking endpoints as degraded, and while I was in here I also made sure to fix #5637

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that clarifies.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; do you think any integration tests make sense? Not necessary if too much of a pain, but always nice to trust and verify.


void setEdsHealthFlag(Host& host, envoy::api::v2::core::HealthStatus health_status) {
switch (health_status) {
case envoy::api::v2::core::HealthStatus::UNHEALTHY:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe add fall-thru annotations or comments here.

m(DEGRADED_ACTIVE_HC, 0x08)
m(DEGRADED_ACTIVE_HC, 0x08) \
/* The host is currently marked as degraded by EDS. */ \
m(DEGRADED_EDS_HEALTH, 0x10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that clarifies.

Snow Pettersen added 3 commits February 4, 2019 18:28
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Snow Pettersen <snowp@squareup.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments.

lb_endpoint_.metadata(), lb_endpoint_.load_balancing_weight().value(),
locality_lb_endpoint_.locality(), lb_endpoint_.endpoint().health_check_config(),
locality_lb_endpoint_.priority()));
setEdsHealthFlag(*new_hosts.back(), lb_endpoint_.health_status());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this would make sense.

EXPECT_FALSE(cluster.info()->addedViaApi());
}

TEST_F(StaticClusterImplTest, LoadAssignmentEdsHealth) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a // above this explaining what it is validating?

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but CI seems broken.

Host::CreateConnectionData HostImpl::createConnection(
Event::Dispatcher& dispatcher, const Network::ConnectionSocket::OptionsSharedPtr& options,
Network::TransportSocketOptionsSharedPtr transport_socket_options) const {
return {createConnection(dispatcher, *cluster_, address_, options, transport_socket_options),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why this move?

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@htuch htuch merged commit 8c6bf40 into envoyproxy:master Feb 8, 2019
fredlas pushed a commit to fredlas/envoy that referenced this pull request Mar 5, 2019
…voyproxy#5649)

Adds a DEGRADED HealthStatus value that can be set on a host through
LoadAssignment, allowing for a host to be marked degraded without
the need for active health checking.

Moves the mapping of EDS flag to health flag to inside
`registerHostForPriority`, which means that we're now consistently setting
the EDS health flag for EDS/STATIC/STRICT_DNS/LOGICAL_DNS.

Simplifies the check for whether the health flag value of a host has
changed during EDS updates.

Adds tests for the EDS mapping as well as tests to verify that we're
honoring the EDS flag for non-EDS cluster types.

Risk Level: High, substantial refactoring of how we determine whether health flag has changed.
Testing: UTs coverage for new health flag values.
Docs Changes: n/a
Release Notes: n/a

Fixes envoyproxy#5637
envoyproxy#5063

Signed-off-by: Snow Pettersen <snowp@squareup.com>
Signed-off-by: Fred Douglas <fredlas@google.com>
alyssawilk pushed a commit that referenced this pull request Apr 8, 2021
…15649) (#15881)

Make the failover timeout a constructor argument of ConnectionGrid (#5649)

Commit Message: make the failover timeout a constructor argument of ConnectionGrid
Additional Description:
Risk Level: low
Testing: unit test ConnectivityGridTest.TimeoutThenSuccessParallelSecondConnects
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features:

Signed-off-by: Ryan Hamilton <rch@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants