Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion bazel/repository_locations.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -71,7 +71,7 @@ REPOSITORY_LOCATIONS = dict(
urls = ["https://github.com/google/protobuf/archive/v3.5.0.tar.gz"],
),
envoy_api = dict(
commit = "f268484d221509b175c46cce6c4cc3593a0ede5e",
commit = "b4a303154703d19f49b63e5529877772ce5da617",
remote = "https://github.com/envoyproxy/data-plane-api",
),
grpc_httpjson_transcoding = dict(
Expand Down
43 changes: 39 additions & 4 deletions source/server/http/health_check.cc
Original file line number Diff line number Diff line change
Expand Up @@ -43,10 +43,19 @@ HealthCheckFilterConfig::createFilter(const envoy::api::v2::filter::http::Health
std::chrono::milliseconds(cache_time_ms)));
}

return [&context, pass_through_mode, cache_manager,
hc_endpoint](Http::FilterChainFactoryCallbacks& callbacks) -> void {
callbacks.addStreamFilter(Http::StreamFilterSharedPtr{
new HealthCheckFilter(context, pass_through_mode, cache_manager, hc_endpoint)});
ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages;
if (!pass_through_mode && !proto_config.cluster_min_healthy_percentages().empty()) {
auto* cluster_to_percentage = new ClusterMinHealthyPercentages();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid naked memory allocations. I would either a) assign to unique_ptr than release/move into the final shared_ptr, or b) assign into a non-const shared_ptr which I think might be convertible to the const one (not sure).

for (const auto& item : proto_config.cluster_min_healthy_percentages()) {
cluster_to_percentage->emplace(std::make_pair(item.first, item.second.value()));
}
cluster_min_healthy_percentages.reset(cluster_to_percentage);
}

return [&context, pass_through_mode, cache_manager, hc_endpoint,
cluster_min_healthy_percentages](Http::FilterChainFactoryCallbacks& callbacks) -> void {
callbacks.addStreamFilter(Http::StreamFilterSharedPtr{new HealthCheckFilter(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not your code but can you switch to std::make_shared?

context, pass_through_mode, cache_manager, hc_endpoint, cluster_min_healthy_percentages)});
};
}

Expand Down Expand Up @@ -155,6 +164,32 @@ void HealthCheckFilter::onComplete() {
Http::Code final_status = Http::Code::OK;
if (cache_manager_) {
final_status = cache_manager_->getCachedResponseCode();
} else if (cluster_min_healthy_percentages_ != nullptr &&
!cluster_min_healthy_percentages_->empty()) {
const auto clusters(context_.clusterManager().clusters());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this is not safe. clusters() right now is only safe to be called from the main thread (not workers) since there is no locking. What you actually want to do is go through your targets clusters and call get() which will give you the thread local version which I think should be sufficient for the computations you need.

for (const auto& item : *cluster_min_healthy_percentages_) {
const std::string& cluster_name = item.first;
const double min_healthy_percentage = item.second;
auto match = clusters.find(cluster_name);
if (match == clusters.end()) {
final_status = Http::Code::ServiceUnavailable;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments here? I guess I see how the lack of the cluster at all is sufficient ground for failure, though I wonder if there should be a stat. At the very least I would probably add a small comment.

break;
}
const auto& stats = match->second.get().info()->stats();
const uint64_t membership_total = stats.membership_total_.value();
if (membership_total == 0) {
if (min_healthy_percentage == 0.0) {
continue;
} else {
final_status = Http::Code::ServiceUnavailable;
break;
}
}
if (100.0 * stats.membership_healthy_.value() < membership_total * min_healthy_percentage) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an advantage to writing it this way? I think it would be clearer if it were stats.membership_healthy_.value() < membership_total * min_healthy_percentage / 100.0. I suppose that requires a cast, but perhaps the computation of the minimum required healthy hosts (e.g., membership_total * min_healthy_percentage / 100.0) could be moved into a variable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it that way to avoid a division operation. But, now that you mention it, g++ might implement the division as multiplication by the reciprocal. I know it does that optimization for division by an integer constant. I'll see if it does the same thing for division by a floating point constant.

@brian-pane brian-pane Jan 21, 2018

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some testing and reading, I found that:

  • g++ only converts division by a floating point literal into multiplication by its reciprocal when a special flag -freciprocal-math is specified. That's probably because the optimization can change the results of computations, due to the limits of floating-point precision.
  • Division is still slow compared to multiplication, but it's gotten much better in recent years: the DIVSD instruction that g++ generates for / 100.0 will add under 20 clock cycles of latency on recent x86 processors.

So I'll just go with / 100.0 to improve readability.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could remove the division from the health check request path by moving it to HealthCheckFilterConfig::createFilter. That is, store the value in the range [0.0, 1.0] instead of [0.0, 100.0].

That said, I don't think performance is critical here, and the current version lgtm.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were really going for performance, all of the calculations would be done on the main thread using a timer (say every few seconds) and then just referenced via TLS on the workers. I don't think it's worth doing for this use case assuming a sane health check interval, but if you feel like it you could add a TODO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversions from int to floating point are also expensive, so the fastest implementation probably would be to do the whole check using integers only. I.e., during config loading, we could precompute and store threshold = uint64_t(min_healthy_percentage) * 1000000 and then implement the check as

if (stats.membership_healthy_.value() < membership_total * threshold / (100 * 1000000)) {

For now, I'll just add a TODO comment.

final_status = Http::Code::ServiceUnavailable;
break;
}
}
}

if (!Http::CodeUtility::is2xx(enumToInt(final_status))) {
Expand Down
9 changes: 7 additions & 2 deletions source/server/http/health_check.h
Original file line number Diff line number Diff line change
Expand Up @@ -68,15 +68,19 @@ class HealthCheckCacheManager {

typedef std::shared_ptr<HealthCheckCacheManager> HealthCheckCacheManagerSharedPtr;

typedef std::map<std::string, double> ClusterMinHealthyPercentages;
typedef std::shared_ptr<const ClusterMinHealthyPercentages> ClusterMinHealthyPercentagesSharedPtr;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClusterMinHealthyPercentagesConstSharedPtr


/**
* Health check responder filter.
*/
class HealthCheckFilter : public Http::StreamFilter {
public:
HealthCheckFilter(Server::Configuration::FactoryContext& context, bool pass_through_mode,
HealthCheckCacheManagerSharedPtr cache_manager, const std::string& endpoint)
HealthCheckCacheManagerSharedPtr cache_manager, const std::string& endpoint,
ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages)
: context_(context), pass_through_mode_(pass_through_mode), cache_manager_(cache_manager),
endpoint_(endpoint) {}
endpoint_(endpoint), cluster_min_healthy_percentages_(cluster_min_healthy_percentages) {}

// Http::StreamFilterBase
void onDestroy() override {}
Expand Down Expand Up @@ -109,5 +113,6 @@ class HealthCheckFilter : public Http::StreamFilter {
bool pass_through_mode_{};
HealthCheckCacheManagerSharedPtr cache_manager_{};
const std::string endpoint_;
ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages_{};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: {} not needed

};
} // namespace Envoy
109 changes: 107 additions & 2 deletions test/server/http/health_check_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,12 @@
#include <memory>

#include "common/buffer/buffer_impl.h"
#include "common/upstream/upstream_impl.h"

#include "server/http/health_check.h"

#include "test/mocks/server/mocks.h"
#include "test/mocks/upstream/cluster_info.h"
#include "test/test_common/printers.h"
#include "test/test_common/utility.h"

Expand Down Expand Up @@ -36,8 +38,11 @@ class HealthCheckFilterTest : public testing::Test {
prepareFilter(pass_through);
}

void prepareFilter(bool pass_through) {
filter_.reset(new HealthCheckFilter(context_, pass_through, cache_manager_, "/healthcheck"));
void
prepareFilter(bool pass_through,
ClusterMinHealthyPercentagesSharedPtr cluster_min_healthy_percentages = nullptr) {
filter_.reset(new HealthCheckFilter(context_, pass_through, cache_manager_, "/healthcheck",
cluster_min_healthy_percentages));
filter_->setDecoderFilterCallbacks(callbacks_);
}

Expand All @@ -49,6 +54,30 @@ class HealthCheckFilterTest : public testing::Test {
NiceMock<Http::MockStreamDecoderFilterCallbacks> callbacks_;
Http::TestHeaderMapImpl request_headers_;
Http::TestHeaderMapImpl request_headers_no_hc_;

class MockHealthCheckCluster : public Upstream::MockCluster {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating a one-off mock here, I think you should add a helper function in the test file that just uses the existing MockCluster. I think something like this will work to create values you can use for cluster_www1/2:

MockCluster makeHealthCheckCluster(uint64_t membership_total, uint64_t membership_healthy) {
  MockCluster cluster;
  cluster.info_->stats_.membership_total_.set(membership_total);
  cluster.info_->stats_.membership_healthy_.set(membership_healthy);
  return cluster;
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer! MockCluster isn't copyable (something in the gmock macros seems to be explicitly deleting the copy constructor), but I was able to use your approach to replace all of MockHealthCheckCluster with just a constructor that sets the stats in MockCluster.

public:
MockHealthCheckCluster(uint64_t membership_total, uint64_t membership_healthy)
: info_(new ClusterInfo(membership_total, membership_healthy)) {}

Upstream::ClusterInfoConstSharedPtr info() const override { return info_; }

class ClusterInfo : public Upstream::MockClusterInfo {
public:
ClusterInfo(uint64_t membership_total, uint64_t membership_healthy)
: stats_(Upstream::ClusterInfoImpl::generateStats(stats_store_)) {
stats_.membership_total_.set(membership_total);
stats_.membership_healthy_.set(membership_healthy);
}

Upstream::ClusterStats& stats() const override { return stats_; }

private:
mutable Upstream::ClusterStats stats_;
};

Upstream::ClusterInfoConstSharedPtr info_;
};
};

class HealthCheckFilterNoPassThroughTest : public HealthCheckFilterTest {
Expand Down Expand Up @@ -84,6 +113,82 @@ TEST_F(HealthCheckFilterNoPassThroughTest, NotHcRequest) {
EXPECT_STREQ("true", service_response.EnvoyImmediateHealthCheckFail()->value().c_str());
}

TEST_F(HealthCheckFilterNoPassThroughTest, ComputedHealth) {
// Test health non-pass-through health checks without upstream cluster

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit s/Test health/Test/

// minimum health specified.
prepareFilter(false);
{
Http::TestHeaderMapImpl health_check_response{{":status", "200"}};
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false));
EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: .Times(1) is redundant (it's implied). You can remove. Same below.

.Times(1);
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_->decodeHeaders(request_headers_, true));
}
{
Http::TestHeaderMapImpl health_check_response{{":status", "503"}};
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(true));
EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true))
.Times(1);
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_->decodeHeaders(request_headers_, true));
}

// Test health non-pass-through health checks with upstream cluster

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Test health/Test/

// minimum health specified.
prepareFilter(false, ClusterMinHealthyPercentagesSharedPtr(
new ClusterMinHealthyPercentages{{"www1", 50.0}, {"www2", 75.0}}));
{
// This should pass, because each upstream cluster has at least the
// minimum percentage of healthy servers.
Http::TestHeaderMapImpl health_check_response{{":status", "200"}};
MockHealthCheckCluster cluster_www1(100, 50);
MockHealthCheckCluster cluster_www2(1000, 800);
Upstream::ClusterManager::ClusterInfoMap cluster_info_map{
{"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)},
{"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}};
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false));
EXPECT_CALL(context_, clusterManager()).Times(1);
EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map));
EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true))
.Times(1);
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_->decodeHeaders(request_headers_, true));
}
{
// This should fail, because one upstream cluster has too few healthy servers.
Http::TestHeaderMapImpl health_check_response{{":status", "503"}};
MockHealthCheckCluster cluster_www1(100, 49);
MockHealthCheckCluster cluster_www2(1000, 800);
Upstream::ClusterManager::ClusterInfoMap cluster_info_map{
{"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)},
{"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}};
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false));
EXPECT_CALL(context_, clusterManager()).Times(1);
EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map));
EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true))
.Times(1);
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_->decodeHeaders(request_headers_, true));
}
{
// This should fail, because one upstream cluster has no servers at all.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a test for the empty cluster but min health == 0% case.

Http::TestHeaderMapImpl health_check_response{{":status", "503"}};
MockHealthCheckCluster cluster_www1(0, 0);
MockHealthCheckCluster cluster_www2(1000, 800);
Upstream::ClusterManager::ClusterInfoMap cluster_info_map{
{"www1", std::reference_wrapper<const Upstream::Cluster>(cluster_www1)},
{"www2", std::reference_wrapper<const Upstream::Cluster>(cluster_www2)}};
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(false));
EXPECT_CALL(context_, clusterManager()).Times(1);
EXPECT_CALL(context_.cluster_manager_, clusters()).WillOnce(Return(cluster_info_map));
EXPECT_CALL(callbacks_, encodeHeaders_(HeaderMapEqualRef(&health_check_response), true))
.Times(1);
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_->decodeHeaders(request_headers_, true));
}
}

TEST_F(HealthCheckFilterNoPassThroughTest, HealthCheckFailedCallbackCalled) {
EXPECT_CALL(context_, healthCheckFailed()).WillOnce(Return(true));
EXPECT_CALL(callbacks_.request_info_, healthCheck(true));
Expand Down