Allow healthchecks to be performed on another port by daedric · Pull Request #560 · envoyproxy/envoy

daedric · 2017-03-10T14:25:30Z

Please find a pull request to allow one to configure a specific port for the health checks.

fixes #439

jbdalido · 2017-03-10T14:28:08Z

We use this PR in production to perform healthchecks against http services, alongside grpc services on the same upstream.

mattklein123 · 2017-03-10T18:16:52Z

Hi,

Thanks a lot for working on this. I linked in the tracking issue for this feature. I would like to propose a different set of interfaces for implementing this. I'm hoping it won't be too much work to implement the proposal. My main issue with this approach is that it puts port into a whole lot of places which might not have any port (for example pipe/uds).

Here is what I would do:

1. Add a new function, createHealthCheckConnection(...), similar to: https://github.com/lyft/envoy/blob/master/include/envoy/upstream/upstream.h#L42
2. Add the ability for a host to have an alternate HC address. I would have this be full URL syntax such as "tcp://1.2.3.4:81". The reason I would do this is that it will actually set us up later to easily get HC status from a centralized HC if we want.

How we actually get the alternate host address for HC port now becomes cluster specific (DNS, static, SDS, etc.). We don't necessarily have to implement them all at the same time. What cluster types are you using?

Thanks,
Matt
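
The two-part proposal above can be modeled roughly as follows. This is a minimal Python sketch, not Envoy's C++ interface: the `Host` class, its field names, and both method names are hypothetical stand-ins. It only illustrates the intended fallback, where a host without an alternate HC address is health checked on its primary address. Because the address is treated as an opaque string, nothing in the model assumes a port exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical model of an upstream host (not Envoy's real Host class)."""

    address: str                                # primary address, e.g. "tcp://1.2.3.4:80"
    health_check_address: Optional[str] = None  # alternate HC address, if configured

    def connection_target(self) -> str:
        """Address used for regular upstream traffic."""
        return self.address

    def health_check_connection_target(self) -> str:
        """Address used for health checks; falls back to the primary address."""
        if self.health_check_address is not None:
            return self.health_check_address
        return self.address


# A host with no alternate address, and one that is checked on port 81.
plain = Host("tcp://1.2.3.4:80")
split = Host("tcp://1.2.3.4:80", health_check_address="tcp://1.2.3.4:81")
```

How `health_check_address` gets populated is left out on purpose, since that is the cluster-specific part (DNS, static, SDS) raised in the comment.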

daedric · 2017-03-11T09:58:23Z

I'll let @jbdalido answer the question about the cluster type.
Thanks a lot for your suggestion!
It does sound better indeed; we aimed for the simplest approach because we needed it to work. I'll get back to it ASAP, though probably not before Monday.

Cheers

daedric · 2017-03-11T14:03:29Z

> Add the ability for a host to have an alternate HC address. I would have this be full URL syntax such as "tcp://1.2.3.4:81". The reason I would do this is that it will actually set us up later to easily get HC status from a centralized HC if we want.

Actually, I was thinking about this: when a DNS entry resolves to several hosts, it leads to an ambiguity. Take, for instance, a domain name "services" with several IPs behind it: ip1, ip2 and ip3.
If I understood you correctly, I would set up the cluster with, for instance,
url: tcp://services:1234 and hc_url: tcp://services:1235. However, I would then need to sync the DNS resolution of both entries to get the proper IP for each one and avoid ip1 being checked on ip2.
What is your take on this?

mattklein123 · 2017-03-11T17:16:01Z

I think there are a few options here depending on the cluster type that you need this to work for:

1. Just ignore DNS clusters for now, and don't support a separate HC port. (I think most people that want this feature are probably going to use SDS or static clusters, but I'm not sure).
2. For DNS, if we want this, either add a separate cluster option such as "alt_hc_port", which is then used to populate the HC address when the DNS response is received, or expand the "URL" syntax in some way which we document. For example, "tcp://1.2.3.4:80:81" means primary port is 80, HC is 81. We have already discussed doing something like this for SRV such as "tcp://1.2.3.4:srv".
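
Both DNS options can be sketched in a few lines. The Python below is illustrative only and is not Envoy code: `hosts_from_dns_response` and `parse_extended_url` are hypothetical helpers, `alt_hc_port` is the option name floated in this thread rather than an existing setting, and the 10.0.0.x addresses are made up. The first helper derives each HC address from the same resolved IP as its primary address, so ip1 can never be checked on ip2. The second parses the proposed "host:port:hc_port" form.

```python
from typing import List, Optional, Tuple


def hosts_from_dns_response(
    ips: List[str], port: int, alt_hc_port: Optional[int] = None
) -> List[Tuple[str, str]]:
    """Return (primary_address, hc_address) pairs for one DNS response.

    Each HC address reuses the IP of its own primary address, so the two
    can never point at different hosts.
    """
    hc_port = port if alt_hc_port is None else alt_hc_port
    return [(f"tcp://{ip}:{port}", f"tcp://{ip}:{hc_port}") for ip in ips]


def parse_extended_url(url: str) -> Tuple[str, int, int]:
    """Parse "tcp://host:port[:hc_port]" into (host, port, hc_port).

    Handles hostnames and IPv4 literals only; the HC port defaults to the
    primary port when the second port is omitted.
    """
    prefix = "tcp://"
    if not url.startswith(prefix):
        raise ValueError(f"unsupported scheme in {url!r}")
    parts = url[len(prefix):].split(":")
    if len(parts) == 2:
        host, port = parts
        return host, int(port), int(port)
    if len(parts) == 3:
        host, port, hc_port = parts
        return host, int(port), int(hc_port)
    raise ValueError(f"expected host:port[:hc_port] in {url!r}")


# Illustrative DNS response for the "services" example discussed earlier.
pairs = hosts_from_dns_response(["10.0.0.1", "10.0.0.2"], 1234, alt_hc_port=1235)
```

The cluster-level option keeps the URL untouched but applies one HC port to every resolved host, while the extended syntax keeps everything in one string at the cost of no longer being a standard URL.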

daedric · 2017-03-15T08:55:49Z

Hi,
Sorry for the delay. We do use the Strict DNS cluster type, so we need to keep it that way for now. We use this type of cluster because it makes scaling and HA basically free and makes Envoy really fast at detecting problems. We can't just use a TCP health check, because the fact that the port is bound does not necessarily mean the application is in good health.
Of all your suggestions, the alt_hc_port option in the cluster section would probably be the easiest and clearest one. I'll try to get back to this ASAP.

Cheers

mattklein123 · 2017-03-15T15:17:11Z

@daedric OK, I'm still not 100% crazy about the "alt_hc_port" option in the sense that we don't need it for static or SDS clusters (it can be a property of the returned host). I'm really leaning towards having it be part of the "URL" somehow.

@lyft/network-team @rshriram @htuch opinions on the config for ^?

htuch · 2017-03-15T15:40:38Z

It might be nice if we can keep URLs as valid URIs, to allow using standard parsing functions and interoperate with libraries that generate them in config pipelines. It's probably not a strong requirement though and might make it more concise in this case (and with srv).

I don't see the issue with alt_hc_port though if it's optional - it's the same as just omitting the second : fragment from the URL.
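
The interoperability concern is easy to reproduce with a standard parser. Here is a small sketch using Python's `urllib.parse`, chosen only for illustration since Envoy itself is C++. The plain form parses cleanly, while the proposed two-port form is rejected as soon as the port is read. The helper name `port_of` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def port_of(url: str) -> Optional[int]:
    """Return the port a standard URI parser sees.

    Returns None when the URL has no port or when the parser rejects it.
    """
    try:
        return urlsplit(url).port
    except ValueError:
        # "80:81" is not a valid port, so reading .port raises ValueError.
        return None


standard = port_of("tcp://1.2.3.4:80")     # valid URI authority
extended = port_of("tcp://1.2.3.4:80:81")  # proposed two-port form
```

Any config generator built on such a library would need custom string handling to emit or consume the two-port form, which is the cost being weighed against adding a separate option.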

mattklein123 · 2017-03-15T16:43:00Z

OK if we go with alt_hc_port though we will have to reconcile w/ SDS and static where you might not want the port to be the same for each host. In that case we almost definitely want it to be part of each host definition, so I guess we can only look at it in DNS cases. I don't feel that strongly, so I'm fine either way.

rshriram · 2017-03-16T15:23:20Z

I agree with @htuch that it would be better to keep the URL as a valid URI. Makes it much easier for all the config gens to generate envoy config. Besides, tcp://1.2.3.4:80:81 is kind of confusing [while it saves another config option, it kills readability].

Here is a random thought: why not use a URI in the health check as well, instead of having a separate port field? [Maybe it has already been discussed in this thread]. For example, the user could specify either a PATH or a URI field for the health check (not both). For SDS/static, there would be no change. For strict_dns/logical_dns, when a separate port is desired, people could specify it as a complete URL (there will be duplication of the hostname, but that's better than a port number that stands out in the config).

It won't pollute the config with a separate port that needs to be kept track of. As a nice side effect, we could even do health check with a different host. E.g., strict/logical dns points to foo.bar.com external service, while the health check service goes to uptime.com:1234/check/foo.bar.com.

In the future, if we move to subsets or a global health check service, we could reuse the same machinery [I am just thinking out loud here, I haven't thought it through fully yet].
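
A rough sketch of that idea, in Python for illustration only: the health check carries either a path or a URI, never both, and the probe target is derived from whichever is set. The `HealthCheck` class, its field names, the `/healthcheck` path, and the `http://` scheme are assumptions made for this sketch, not Envoy configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HealthCheck:
    """Hypothetical health check config holding exactly one of path or uri."""

    path: Optional[str] = None  # checked against the upstream host itself
    uri: Optional[str] = None   # complete URL, possibly on another host or port

    def __post_init__(self) -> None:
        # Reject both "neither set" and "both set".
        if (self.path is None) == (self.uri is None):
            raise ValueError("specify exactly one of path or uri")

    def target(self, upstream_host: str, upstream_port: int) -> str:
        """Return the URL to probe for the given upstream host."""
        if self.uri is not None:
            return self.uri
        return f"http://{upstream_host}:{upstream_port}{self.path}"


# SDS/static style: same host and port, only a path is configured.
same_host = HealthCheck(path="/healthcheck").target("foo.bar.com", 80)

# DNS style with a full URL: the check goes to a different service entirely.
other_host = HealthCheck(uri="http://uptime.com:1234/check/foo.bar.com").target(
    "foo.bar.com", 80
)
```

With a full URL, the usual case of "same host, different port" requires repeating the hostname, which is the duplication acknowledged in the comment.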

mattklein123 · 2017-03-24T23:42:24Z

Going to close this for now. Please reopen if/when you are ready to clean this up. I also noted that this PR is ongoing in the primary issue.

1mentat · 2017-04-14T00:46:37Z

Following up on this from the original issue: having a full URL doesn't make much sense in the usual case, which is that you want "the same one as the connection is going to" with perhaps a different protocol and port. How would that be expressed in this proposed reworking?

mattklein123 · 2017-04-15T01:13:47Z

@1mentat if possible can we discuss design in #439 so it doesn't get lost. Feel free to repost there and we can discuss.

Allow healthchecks to be performed on another port

c4829b2

mattklein123 closed this Mar 11, 2017

mattklein123 reopened this Mar 11, 2017

mattklein123 closed this Mar 24, 2017

dio mentioned this pull request Apr 2, 2018

health check: add optional alternative health check port envoyproxy/data-plane-api#597

Merged
