upstream: rebuild cluster when health check config is changed #4075
Merged
21 commits
73d654a
health check: update address when needed
dio 9353976
Merge remote-tracking branch 'upstream/master' into update-health-che…
dio 1fb3882
Rebuild when health check config changes
dio 8e257c1
Revert adding mutable health check address
dio 10446cc
Add test for post config update cluster rebuild
dio 96bc0c2
Add tests for !drain_connections_on_host_removal
dio fac7657
Merge remote-tracking branch 'upstream/master'
dio 965a32c
Merge remote-tracking branch 'upstream/master'
dio 99943d7
Bring back if health_check_changed check
dio 110d339
Remove unused lines
dio 458d53e
Remove bad merge lines
dio f87cf0e
Update comment
dio a940f15
review: rename and add more comments
dio 72f31f4
review: refactor eds test
dio 038d114
Remove newlines
dio f3abe36
Merge remote-tracking branch 'upstream/master'
dio 7d7aea0
Rename some functions and vars
dio 0f8d6f9
Kick CI
dio 295a8cb
Kick CI
dio 9dd4941
Use 'skip' instead of 'no'?
dio 274f9da
review: updating -> update
dio
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1059,6 +1059,214 @@ TEST_F(EdsTest, PriorityAndLocalityWeighted) { | |
| EXPECT_EQ(1UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
| } | ||
|
|
||
| TEST_F(EdsTest, EndpointUpdateHealthCheckConfig) { | ||
|
Member
Arguably, you could subclass the test fixture
||
| resetCluster(R"EOF( | ||
| name: name | ||
| connect_timeout: 0.25s | ||
| type: EDS | ||
| lb_policy: ROUND_ROBIN | ||
| eds_cluster_config: | ||
| service_name: fare | ||
| eds_config: | ||
| api_config_source: | ||
| cluster_names: | ||
| - eds | ||
| refresh_delay: 1s | ||
| )EOF"); | ||
|
|
||
| auto health_checker = std::make_shared<MockHealthChecker>(); | ||
| EXPECT_CALL(*health_checker, start()); | ||
| EXPECT_CALL(*health_checker, addHostCheckCompleteCb(_)).Times(2); | ||
| cluster_->setHealthChecker(health_checker); | ||
|
|
||
| Protobuf::RepeatedPtrField<envoy::api::v2::ClusterLoadAssignment> resources; | ||
| auto* cluster_load_assignment = resources.Add(); | ||
| cluster_load_assignment->set_cluster_name("fare"); | ||
|
|
||
| auto add_endpoint = [cluster_load_assignment](int port) { | ||
| auto* endpoints = cluster_load_assignment->add_endpoints(); | ||
|
|
||
| auto* socket_address = endpoints->add_lb_endpoints() | ||
| ->mutable_endpoint() | ||
| ->mutable_address() | ||
| ->mutable_socket_address(); | ||
| socket_address->set_address("1.2.3.4"); | ||
| socket_address->set_port_value(port); | ||
| }; | ||
|
|
||
| auto update_health_check_port = [cluster_load_assignment](const uint32_t index, | ||
| const uint32_t port) { | ||
| cluster_load_assignment->mutable_endpoints(index) | ||
| ->mutable_lb_endpoints(0) | ||
| ->mutable_endpoint() | ||
| ->mutable_health_check_config() | ||
| ->set_port_value(port); | ||
| }; | ||
|
|
||
| add_endpoint(80); | ||
| add_endpoint(81); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| // Make sure the cluster is rebuilt. | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| EXPECT_EQ(hosts.size(), 2); | ||
|
|
||
| EXPECT_TRUE(hosts[0]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| EXPECT_TRUE(hosts[1]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
|
|
||
| // Mark the hosts as healthy | ||
| hosts[0]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC); | ||
| hosts[1]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC); | ||
| } | ||
|
|
||
| const uint32_t new_health_check_port = 8000; | ||
| update_health_check_port(0, new_health_check_port); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| EXPECT_EQ(hosts.size(), 3); | ||
| // Make sure the first endpoint health check port is updated. | ||
| EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| EXPECT_NE(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port()); | ||
| EXPECT_NE(new_health_check_port, hosts[2]->healthCheckAddress()->ip()->port()); | ||
| EXPECT_EQ(81, hosts[1]->healthCheckAddress()->ip()->port()); | ||
| EXPECT_EQ(80, hosts[2]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| EXPECT_TRUE(hosts[0]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
|
|
||
| // The old hosts are still active. The health checker continues to health check these | ||
| // hosts until they are removed. | ||
| EXPECT_FALSE(hosts[1]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| EXPECT_FALSE(hosts[2]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| } | ||
|
|
||
| update_health_check_port(1, new_health_check_port); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| EXPECT_EQ(hosts.size(), 4); | ||
| EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| // Make sure the second endpoint health check port is updated. | ||
| EXPECT_EQ(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| EXPECT_EQ(81, hosts[2]->healthCheckAddress()->ip()->port()); | ||
| EXPECT_EQ(80, hosts[3]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| EXPECT_TRUE(hosts[0]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| EXPECT_TRUE(hosts[1]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
|
|
||
| // The old hosts are still active. | ||
| EXPECT_FALSE(hosts[2]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| EXPECT_FALSE(hosts[3]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| } | ||
| } | ||
|
|
||
| TEST_F(EdsTest, EndpointUpdateHealthCheckConfigWithDrainConnectionsOnRemoval) { | ||
| resetCluster(R"EOF( | ||
| name: name | ||
| connect_timeout: 0.25s | ||
| type: EDS | ||
| lb_policy: ROUND_ROBIN | ||
| drain_connections_on_host_removal: true | ||
| eds_cluster_config: | ||
| service_name: fare | ||
| eds_config: | ||
| api_config_source: | ||
| cluster_names: | ||
| - eds | ||
| refresh_delay: 1s | ||
| )EOF"); | ||
|
|
||
| auto health_checker = std::make_shared<MockHealthChecker>(); | ||
| EXPECT_CALL(*health_checker, start()); | ||
| EXPECT_CALL(*health_checker, addHostCheckCompleteCb(_)).Times(2); | ||
| cluster_->setHealthChecker(health_checker); | ||
|
|
||
| Protobuf::RepeatedPtrField<envoy::api::v2::ClusterLoadAssignment> resources; | ||
| auto* cluster_load_assignment = resources.Add(); | ||
| cluster_load_assignment->set_cluster_name("fare"); | ||
|
|
||
| auto add_endpoint = [cluster_load_assignment](int port) { | ||
| auto* endpoints = cluster_load_assignment->add_endpoints(); | ||
|
|
||
| auto* socket_address = endpoints->add_lb_endpoints() | ||
| ->mutable_endpoint() | ||
| ->mutable_address() | ||
| ->mutable_socket_address(); | ||
| socket_address->set_address("1.2.3.4"); | ||
| socket_address->set_port_value(port); | ||
| }; | ||
|
|
||
| auto update_health_check_port = [cluster_load_assignment](const uint32_t index, | ||
| const uint32_t port) { | ||
| cluster_load_assignment->mutable_endpoints(index) | ||
| ->mutable_lb_endpoints(0) | ||
| ->mutable_endpoint() | ||
| ->mutable_health_check_config() | ||
| ->set_port_value(port); | ||
| }; | ||
|
|
||
| add_endpoint(80); | ||
| add_endpoint(81); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| // Make sure the cluster is rebuilt. | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| EXPECT_EQ(hosts.size(), 2); | ||
|
|
||
| EXPECT_TRUE(hosts[0]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| EXPECT_TRUE(hosts[1]->healthFlagGet(Host::HealthFlag::FAILED_ACTIVE_HC)); | ||
| // Mark the hosts as healthy | ||
| hosts[0]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC); | ||
| hosts[1]->healthFlagClear(Host::HealthFlag::FAILED_ACTIVE_HC); | ||
| } | ||
|
|
||
| const uint32_t new_health_check_port = 8000; | ||
| update_health_check_port(0, new_health_check_port); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| // Since drain_connections_on_host_removal is set to true, the old hosts are removed | ||
| // immediately. | ||
| EXPECT_EQ(hosts.size(), 2); | ||
| // Make sure the first endpoint health check port is updated. | ||
| EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| EXPECT_NE(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port()); | ||
| } | ||
|
|
||
| update_health_check_port(1, new_health_check_port); | ||
|
|
||
| VERBOSE_EXPECT_NO_THROW(cluster_->onConfigUpdate(resources, "")); | ||
| EXPECT_EQ(0UL, stats_.counter("cluster.name.update_no_rebuild").value()); | ||
|
|
||
| { | ||
| auto& hosts = cluster_->prioritySet().hostSetsPerPriority()[0]->hosts(); | ||
| EXPECT_EQ(hosts.size(), 2); | ||
| EXPECT_EQ(new_health_check_port, hosts[0]->healthCheckAddress()->ip()->port()); | ||
|
|
||
| // Make sure the second endpoint health check port is updated. | ||
| EXPECT_EQ(new_health_check_port, hosts[1]->healthCheckAddress()->ip()->port()); | ||
| } | ||
| } | ||
|
|
||
| // Throw on adding a new resource with an invalid endpoint (since the given address is invalid). | ||
| TEST_F(EdsTest, MalformedIP) { | ||
| Protobuf::RepeatedPtrField<envoy::api::v2::ClusterLoadAssignment> resources; | ||
|
|
||
It's not your fault, but this entire method is completely unreadable to me; it's hard to keep track of what current_hosts represent, which hosts are to be removed, and under which conditions we're going to do a rebuild, an in-place modification, or a remove/add. Do you think there's a way to explain to the layman how this works? Each time I reread this method, my head hurts more and more (and I'm partly to blame for sure).
Yes I agree. This has been an iterative process over time and I agree it's now not very readable and it hurts my head also. I think it would be worth it to have a lot more comments in here, possibly variable name changes, and maybe even functions broken out. @dio since you are in here now do you mind taking a look?
Sure, I'd love to have a look 🙂. I will spend some time diving into this.
It seems there is quite an intensive refactoring going on here: #3959
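As a rough answer to the "explain to the layman" question, here is a minimal sketch of the behavior that the two new tests (EndpointUpdateHealthCheckConfig and its drain_connections_on_host_removal variant) pin down. It is not the code from this PR: `SketchHost` and `applyUpdate` are hypothetical names. The model assumes that a host is identified by its address plus its health check port, that a host missing from the update is kept only while it passes active health checks and drain_connections_on_host_removal is off, and that new hosts start with the failed-active-HC flag set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an upstream host (not Envoy's Host class).
struct SketchHost {
  std::string address;        // traffic address, e.g. "1.2.3.4:80"
  uint32_t health_check_port; // port probed by the active health checker
  bool failed_active_hc;      // stays true until a health check passes
};

// Merges `updated` (the hosts named by the latest EDS update) into `current`.
// Sets `rebuilt` to true when a host was added or removed.
std::vector<SketchHost> applyUpdate(const std::vector<SketchHost>& current,
                                    const std::vector<SketchHost>& updated,
                                    bool drain_connections_on_host_removal,
                                    bool& rebuilt) {
  std::vector<SketchHost> result;
  std::vector<bool> reused(current.size(), false);
  rebuilt = false;

  for (const SketchHost& wanted : updated) {
    bool found = false;
    for (std::size_t i = 0; i < current.size(); ++i) {
      // Assumption: a changed health check port makes this a different host.
      if (!reused[i] && current[i].address == wanted.address &&
          current[i].health_check_port == wanted.health_check_port) {
        result.push_back(current[i]); // keep the existing host and its health state
        reused[i] = true;
        found = true;
        break;
      }
    }
    if (!found) {
      // New hosts start unhealthy until the health checker has seen them.
      result.push_back(SketchHost{wanted.address, wanted.health_check_port, true});
      rebuilt = true;
    }
  }

  for (std::size_t i = 0; i < current.size(); ++i) {
    if (reused[i]) {
      continue;
    }
    // Hosts absent from the update linger only if healthy and not draining.
    if (!drain_connections_on_host_removal && !current[i].failed_active_hc) {
      result.push_back(current[i]);
    } else {
      rebuilt = true;
    }
  }

  // Every host named by the update is present in the result.
  assert(result.size() >= updated.size());
  return result;
}
```

Fed the sequence from the tests (endpoints on ports 80 and 81 marked healthy, then a health check port of 8000 on the first endpoint), this model yields three hosts when draining is off and two when it is on, which matches the host counts the tests expect.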