Skip to content
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions api/filter/http/health_check.proto
Original file line number Diff line number Diff line change
Expand Up @@ -21,4 +21,10 @@ message HealthCheck {
// If operating in pass through mode, the amount of time in milliseconds
// that the filter should cache the upstream response.
google.protobuf.Duration cache_time = 3;

// [#not-implemented-hide:]
// If operating in non-pass-through mode, specifies a set of upstream cluster
// names and the minimum percentage of servers in each of those clusters that
// must be healthy in order for the filter to return a 200.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it cluster_min_healthy_percentages.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the filter" is lacking context. Which filter are we talking about here? What "return a 200" means?

I think it is less ideal to use boolean for health check result. In a distributed system, we should measure the health in some form of percentage or probability, and let the caller decide the threshold.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wora, although I like the idea of signaling a percent-availability or current-capacity status in health check responses, I think it's outside the scope of this PR. The reason I say that is: interoperability. For an Envoy-to-Envoy service mesh, it's easy to add some new HTTP header like "X-Envoy-Health-Level: 75%." But in some deployments, the thing sending health checks to Envoy is a hardware load balancer or a cloud hosting provider's "load balancer as a service" (like AWS's NLB and Google's Cloud Load Balancing). To get multiple 3rd party systems/services to support a "partially healthy" health check response status from Envoy, we'd need to find or write an RFC that defines a standardized way of communicating fractional health status over HTTP.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't insist that you need to support it. I just suggest that is how we should design a distributed system. I will leave the review to Envoy experts. Thanks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I think what we have here is good for now. I agree with @wora that % based health is very interesting from a mesh perspective, but I think we should track looking into that separately.

map<string, uint32> cluster_min_healthy_percentages = 4;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per_cluster_mint_healthy_percentages ?

}