healthcheck filter: compute response based on upstream cluster health #2387
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,10 +43,19 @@ HealthCheckFilterConfig::createFilter(const envoy::api::v2::filter::http::Health | |
| std::chrono::milliseconds(cache_time_ms))); | ||
| } | ||
|
|
||
| return [&context, pass_through_mode, cache_manager, | ||
| hc_endpoint](Http::FilterChainFactoryCallbacks& callbacks) -> void { | ||
| callbacks.addStreamFilter(Http::StreamFilterSharedPtr{ | ||
| new HealthCheckFilter(context, pass_through_mode, cache_manager, hc_endpoint)}); | ||
| ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages; | ||
| if (!pass_through_mode && !proto_config.cluster_min_healthy_percentages().empty()) { | ||
| auto* cluster_to_percentage = new ClusterMinHealthyPercentages(); | ||
| for (const auto& item : proto_config.cluster_min_healthy_percentages()) { | ||
| cluster_to_percentage->emplace(std::make_pair(item.first, item.second.value())); | ||
| } | ||
| cluster_min_healthy_percentages.reset(cluster_to_percentage); | ||
| } | ||
|
|
||
| return [&context, pass_through_mode, cache_manager, hc_endpoint, | ||
| cluster_min_healthy_percentages](Http::FilterChainFactoryCallbacks& callbacks) -> void { | ||
| callbacks.addStreamFilter(Http::StreamFilterSharedPtr{new HealthCheckFilter( | ||
|
Member
nit: not your code but can you switch to std::make_shared? |
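For reference, a minimal sketch of the change this nit asks for. `StreamFilter` and `DemoHealthCheckFilter` below are hypothetical stand-ins, not the Envoy classes; the point is only that `std::make_shared` replaces the `new` expression and that the resulting `shared_ptr` to the derived type converts implicitly to a `shared_ptr` to the base type.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the filter interface (not the Envoy type).
struct StreamFilter {
  virtual ~StreamFilter() = default;
  virtual std::string name() const = 0;
};

// Hypothetical stand-in for the concrete health check filter.
struct DemoHealthCheckFilter : StreamFilter {
  explicit DemoHealthCheckFilter(std::string endpoint) : endpoint_(std::move(endpoint)) {}
  std::string name() const override { return "health_check:" + endpoint_; }
  std::string endpoint_;
};

using StreamFilterSharedPtr = std::shared_ptr<StreamFilter>;

// Before: a naked `new` handed to the shared_ptr constructor.
StreamFilterSharedPtr makeWithNew(const std::string& endpoint) {
  return StreamFilterSharedPtr{new DemoHealthCheckFilter(endpoint)};
}

// After: std::make_shared allocates the object and its control block together;
// shared_ptr<DemoHealthCheckFilter> converts implicitly to shared_ptr<StreamFilter>.
StreamFilterSharedPtr makeWithMakeShared(const std::string& endpoint) {
  return std::make_shared<DemoHealthCheckFilter>(endpoint);
}
```

Both factories return an equivalent object; only the allocation style differs.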
||
| context, pass_through_mode, cache_manager, hc_endpoint, cluster_min_healthy_percentages)}); | ||
| }; | ||
| } | ||
|
|
||
|
|
@@ -155,6 +164,32 @@ void HealthCheckFilter::onComplete() { | |
| Http::Code final_status = Http::Code::OK; | ||
| if (cache_manager_) { | ||
| final_status = cache_manager_->getCachedResponseCode(); | ||
| } else if (cluster_min_healthy_percentages_ != nullptr && | ||
| !cluster_min_healthy_percentages_->empty()) { | ||
| const auto clusters(context_.clusterManager().clusters()); | ||
|
Member
Unfortunately this is not safe. |
||
| for (const auto& item : *cluster_min_healthy_percentages_) { | ||
| const std::string& cluster_name = item.first; | ||
| const double min_healthy_percentage = item.second; | ||
| auto match = clusters.find(cluster_name); | ||
| if (match == clusters.end()) { | ||
| final_status = Http::Code::ServiceUnavailable; | ||
|
Member
Can you add some comments here? I guess I see how the lack of the cluster at all is sufficient ground for failure, though I wonder if there should be a stat. At the very least I would probably add a small comment. |
||
| break; | ||
| } | ||
| const auto& stats = match->second.get().info()->stats(); | ||
| const uint64_t membership_total = stats.membership_total_.value(); | ||
| if (membership_total == 0) { | ||
| if (min_healthy_percentage == 0.0) { | ||
| continue; | ||
| } else { | ||
| final_status = Http::Code::ServiceUnavailable; | ||
| break; | ||
| } | ||
| } | ||
| if (100.0 * stats.membership_healthy_.value() < membership_total * min_healthy_percentage) { | ||
|
Member
Is there an advantage to writing it this way? I think it would be clearer if it were
Contributor
Author
I did it that way to avoid a division operation. But, now that you mention it, g++ might implement the division as multiplication by the reciprocal. I know it does that optimization for division by an integer constant. I'll see if it does the same thing for division by a floating point constant.
Contributor
Author
After some testing and reading, I found that:
So I'll just go with
Member
You could remove the division from the health check request path by moving it to HealthCheckFilterConfig::createFilter. That is, store the value in the range That said, I don't think performance is critical here, and the current version lgtm.
Member
If we were really going for performance, all of the calculations would be done on the main thread using a timer (say every few seconds) and then just referenced via TLS on the workers. I don't think it's worth doing for this use case assuming a sane health check interval, but if you feel like it you could add a TODO.
Contributor
Author
The conversions from int to floating point are also expensive, so the fastest implementation probably would be to do the whole check using integers only. I.e., during config loading, we could precompute and store For now, I'll just add a TODO comment. |
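For readers following this thread, here is a minimal standalone sketch of the per-cluster check as it appears in the diff, using the multiplication form the author chose to avoid a division. The function name `meetsMinHealthyPercentage` is hypothetical and not part of the PR; the alternative forms discussed above are not reproduced here.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the per-cluster logic in HealthCheckFilter::onComplete().
// Returns true when the cluster satisfies its configured minimum healthy percentage.
bool meetsMinHealthyPercentage(std::uint64_t membership_healthy, std::uint64_t membership_total,
                               double min_healthy_percentage) {
  if (membership_total == 0) {
    // An empty cluster only passes when no healthy hosts are required at all.
    return min_healthy_percentage == 0.0;
  }
  // Same comparison as the diff: compare 100 * healthy against total * percentage,
  // so no division is needed on the request path.
  return !(100.0 * membership_healthy < membership_total * min_healthy_percentage);
}
```

With the values used in the PR's tests, a cluster of 100 hosts with 50 healthy passes a 50% threshold, while 49 healthy does not.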
||
| final_status = Http::Code::ServiceUnavailable; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| if (!Http::CodeUtility::is2xx(enumToInt(final_status))) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -68,15 +68,19 @@ class HealthCheckCacheManager { | |
|
|
||
| typedef std::shared_ptr<HealthCheckCacheManager> HealthCheckCacheManagerSharedPtr; | ||
|
|
||
| typedef std::map<std::string, double> ClusterMinHealthyPercentages; | ||
| typedef std::shared_ptr<const ClusterMinHealthyPercentages> ClusterMinHealthyPercentagesSharedPtr; | ||
|
|
||
|
|
||
| /** | ||
| * Health check responder filter. | ||
| */ | ||
| class HealthCheckFilter : public Http::StreamFilter { | ||
| public: | ||
| HealthCheckFilter(Server::Configuration::FactoryContext& context, bool pass_through_mode, | ||
| HealthCheckCacheManagerSharedPtr cache_manager, const std::string& endpoint) | ||
| HealthCheckCacheManagerSharedPtr cache_manager, const std::string& endpoint, | ||
| ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages) | ||
| : context_(context), pass_through_mode_(pass_through_mode), cache_manager_(cache_manager), | ||
| endpoint_(endpoint) {} | ||
| endpoint_(endpoint), cluster_min_healthy_percentages_(cluster_min_healthy_percentages) {} | ||
|
|
||
| // Http::StreamFilterBase | ||
| void onDestroy() override {} | ||
|
|
@@ -109,5 +113,6 @@ class HealthCheckFilter : public Http::StreamFilter { | |
| bool pass_through_mode_{}; | ||
| HealthCheckCacheManagerSharedPtr cache_manager_{}; | ||
| const std::string endpoint_; | ||
| ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages_{}; | ||
|
Member
nit: |
||
| }; | ||
| } // namespace Envoy | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,10 +2,12 @@ | |
| #include <memory> | ||
|
|
||
| #include "common/buffer/buffer_impl.h" | ||
| #include "common/upstream/upstream_impl.h" | ||
|
|
||
| #include "server/http/health_check.h" | ||
|
|
||
| #include "test/mocks/server/mocks.h" | ||
| #include "test/mocks/upstream/cluster_info.h" | ||
| #include "test/test_common/printers.h" | ||
| #include "test/test_common/utility.h" | ||
|
|
||
|
|
@@ -36,8 +38,11 @@ class HealthCheckFilterTest : public testing::Test { | |
| prepareFilter(pass_through); | ||
| } | ||
|
|
||
| void prepareFilter(bool pass_through) { | ||
| filter_.reset(new HealthCheckFilter(context_, pass_through, cache_manager_, "/healthcheck")); | ||
| void | ||
| prepareFilter(bool pass_through, | ||
| ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages = nullptr) { | ||
| filter_.reset(new HealthCheckFilter(context_, pass_through, cache_manager_, "/healthcheck", | ||
| cluster_min_healthy_percentages)); | ||
| filter_->setDecoderFilterCallbacks(callbacks_); | ||
| } | ||
|
|
||
|
|
@@ -49,6 +54,30 @@ class HealthCheckFilterTest : public testing::Test { | |
| NiceMock<Http::MockStreamDecoderFilterCallbacks> callbacks_; | ||
| Http::TestHeaderMapImpl request_headers_; | ||
| Http::TestHeaderMapImpl request_headers_no_hc_; | ||
|
|
||
| class MockHealthCheckCluster : public Upstream::MockCluster { | ||
|
Member
Rather than creating a one-off mock here, I think you should add a helper function in the test file that just uses the existing MockCluster. I think something like this will work to create values you can use for cluster_www1/2:
Contributor
Author
Thanks for the pointer! |
||
| public: | ||
| MockHealthCheckCluster(uint64_t membership_total, uint64_t membership_healthy) | ||
| : info_(new ClusterInfo(membership_total, membership_healthy)) {} | ||
|
|
||
| Upstream::ClusterInfoConstSharedPtr info() const override { return info_; } | ||
|
|
||
| class ClusterInfo : public Upstream::MockClusterInfo { | ||
| public: | ||
| ClusterInfo(uint64_t membership_total, uint64_t membership_healthy) | ||
| : stats_(Upstream::ClusterInfoImpl::generateStats(stats_store_)) { | ||
| stats_.membership_total_.set(membership_total); | ||
| stats_.membership_healthy_.set(membership_healthy); | ||
| } | ||
|
|
||
| Upstream::ClusterStats& stats() const override { return stats_; } | ||
|
|
||
| private: | ||
| mutable Upstream::ClusterStats stats_; | ||
| }; | ||
|
|
||
| Upstream::ClusterInfoConstSharedPtr info_; | ||
| }; | ||
| }; | ||
|
|
||
| class HealthCheckFilterNoPassThroughTest : public HealthCheckFilterTest { | ||
|
|
@@ -84,6 +113,82 @@ TEST_F(HealthCheckFilterNoPassThroughTest, NotHcRequest) { | |
| EXPECT_STREQ("true", service_response.EnvoyImmediateHealthCheckFail()->value().c_str()); | ||
| } | ||
|
|
||
| TEST_F(HealthCheckFilterNoPassThroughTest, ComputedHealth) { | ||
| // Test health non-pass-through health checks without upstream cluster | ||
|
Member
nit: s/Test health/Test/ |
||
| // minimum health specified. | ||
| prepareFilter(false); | ||
| { | ||
| Http::TestHeaderMapImpl health_check_response{{":status", "200"}}; | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false)); | ||
| EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true)) | ||
|
Member
nit: |
||
| .Times(1); | ||
| EXPECT_EQ(Http::FilterHeadersStatus::StopIteration, | ||
| filter_->decodeHeaders(request_headers_, true)); | ||
| } | ||
| { | ||
| Http::TestHeaderMapImpl health_check_response{{":status", "503"}}; | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(true)); | ||
| EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true)) | ||
| .Times(1); | ||
| EXPECT_EQ(Http::FilterHeadersStatus::StopIteration, | ||
| filter_->decodeHeaders(request_headers_, true)); | ||
| } | ||
|
|
||
| // Test health non-pass-through health checks with upstream cluster | ||
|
Member
s/Test health/Test/ |
||
| // minimum health specified. | ||
| prepareFilter(false, ClusterMinHealthyPercentagesSharedPtr( | ||
| new ClusterMinHealthyPercentages{{"www1", 50.0}, {"www2", 75.0}})); | ||
| { | ||
| // This should pass, because each upstream cluster has at least the | ||
| // minimum percentage of healthy servers. | ||
| Http::TestHeaderMapImpl health_check_response{{":status", "200"}}; | ||
| MockHealthCheckCluster cluster_www1(100, 50); | ||
| MockHealthCheckCluster cluster_www2(1000, 800); | ||
| Upstream::ClusterManager::ClusterInfoMap cluster_info_map{ | ||
| {"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)}, | ||
| {"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}}; | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false)); | ||
| EXPECT_CALL(context_, clusterManager()).Times(1); | ||
| EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map)); | ||
| EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true)) | ||
| .Times(1); | ||
| EXPECT_EQ(Http::FilterHeadersStatus::StopIteration, | ||
| filter_->decodeHeaders(request_headers_, true)); | ||
| } | ||
| { | ||
| // This should fail, because one upstream cluster has too few healthy servers. | ||
| Http::TestHeaderMapImpl health_check_response{{":status", "503"}}; | ||
| MockHealthCheckCluster cluster_www1(100, 49); | ||
| MockHealthCheckCluster cluster_www2(1000, 800); | ||
| Upstream::ClusterManager::ClusterInfoMap cluster_info_map{ | ||
| {"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)}, | ||
| {"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}}; | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false)); | ||
| EXPECT_CALL(context_, clusterManager()).Times(1); | ||
| EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map)); | ||
| EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true)) | ||
| .Times(1); | ||
| EXPECT_EQ(Http::FilterHeadersStatus::StopIteration, | ||
| filter_->decodeHeaders(request_headers_, true)); | ||
| } | ||
| { | ||
| // This should fail, because one upstream cluster has no servers at all. | ||
|
Member
Consider adding a test for the empty cluster but min health == 0% case. |
||
| Http::TestHeaderMapImpl health_check_response{{":status", "503"}}; | ||
| MockHealthCheckCluster cluster_www1(0, 0); | ||
| MockHealthCheckCluster cluster_www2(1000, 800); | ||
| Upstream::ClusterManager::ClusterInfoMap cluster_info_map{ | ||
| {"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)}, | ||
| {"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}}; | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false)); | ||
| EXPECT_CALL(context_, clusterManager()).Times(1); | ||
| EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map)); | ||
| EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true)) | ||
| .Times(1); | ||
| EXPECT_EQ(Http::FilterHeadersStatus::StopIteration, | ||
| filter_->decodeHeaders(request_headers_, true)); | ||
| } | ||
| } | ||
|
|
||
| TEST_F(HealthCheckFilterNoPassThroughTest, HealthCheckFailedCallbackCalled) { | ||
| EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(true)); | ||
| EXPECT_CALL(callbacks_.request_info_, healthCheck(true)); | ||
|
|
||
Please avoid naked memory allocations. I would either a) assign to a unique_ptr, then release/move it into the final shared_ptr, or b) assign into a non-const shared_ptr, which I think might be convertible to the const one (not sure).
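A minimal sketch of option (b), assuming only the standard library. The two type aliases match the typedefs in the diff; the function `buildPercentages` and its plain `std::map` argument are hypothetical stand-ins for the proto config loop in `createFilter`. The standard `std::shared_ptr` converting constructor accepts a `shared_ptr<T>` where a `shared_ptr<const T>` is expected, so the conversion in question does compile.

```cpp
#include <map>
#include <memory>
#include <string>

using ClusterMinHealthyPercentages = std::map<std::string, double>;
using ClusterMinHealthyPercentagesSharedPtr = std::shared_ptr<const ClusterMinHealthyPercentages>;

// Hypothetical builder: fills a mutable map owned by a shared_ptr, then
// returns it as a pointer-to-const without any naked `new`.
ClusterMinHealthyPercentagesSharedPtr
buildPercentages(const std::map<std::string, double>& config) {
  auto mutable_map = std::make_shared<ClusterMinHealthyPercentages>();
  for (const auto& item : config) {
    mutable_map->emplace(item.first, item.second);
  }
  // shared_ptr<T> converts implicitly to shared_ptr<const T>.
  return mutable_map;
}
```

After the return, callers can only read the map through the const pointer, which matches the intent of the `ClusterMinHealthyPercentagesSharedPtr` typedef.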