Skip to content

Allow healthchecks to be performed on another port#560

Closed
daedric wants to merge 1 commit intoenvoyproxy:masterfrom
daedric:healthchecks-with-port
Closed

Allow healthchecks to be performed on another port#560
daedric wants to merge 1 commit intoenvoyproxy:masterfrom
daedric:healthchecks-with-port

Conversation

@daedric
Copy link

@daedric daedric commented Mar 10, 2017

Please find a pull request to allow one to configure a specific port for the health checks.

fixes #439

@jbdalido
Copy link

We use this PR in production to perform healthchecks against http services, alongside grpc services on the same upstream.

@mattklein123
Copy link
Member

Hi,

Thanks a lot for working on this. I linked in the tracking issue for this feature. I would like to propose a different set of interfaces for implementing this. I'm hoping it won't be too much work to implement the proposal. My main issue with this approach is that it puts port into a whole lot of places which might not have any port (for example pipe/uds).

Here is what I would do:

How we actually get the alternate host address for HC port now becomes cluster specific (DNS, static, SDS, etc.). We don't necessarily have to implement them all at the same time. What cluster types are you using?

Thanks,
Matt

@daedric
Copy link
Author

daedric commented Mar 11, 2017

I'll let @jbdalido answer to the question on the type of cluster.
Thanks a lot for your suggestion!
It does sound better indeed, we aimed for the simplest as we needed it to work. I'll get back on it ASAP, probably not before monday however.

Cheers

@daedric
Copy link
Author

daedric commented Mar 11, 2017

Add the ability for a host to have an alternate HC address. I would have this be full URL syntax such as "tcp://1.2.3.4:81". The reason I would do this is that it will actually set us up later to easily get HC status from a centralized HC if we want.

Actually I was thinking of this, in case of a dns entry resolving to several host it leads to an ambiguity. Let's take for instance a domain name services and with several IPs behind ip1, ip2 and ip3.
If I understood you correctly I would setup the cluster with for instance:
url: tcp://services:1234 and hc_url: tcp://services:1235, however I'll need to sync the dns resolution to get the proper ip for each entry and avoid ip1 being checked on ip2.
What is your take on this ?

@mattklein123
Copy link
Member

I think there are a few options here depending on the cluster type that you need this to work for:

  1. Just ignore DNS clusters for now, and don't support a separate HC port. (I think most people that want this feature are probably going to use SDS or static clusters, but I'm not sure).
  2. For DNS, if we want this, either have a separate cluster option such as "alt_hc_port", which is then used to populate the HC address when the DNS response is received. Another option would be to expand the "URL" syntax in some way which we document. For example, "tcp://1.2.3.4:80:81" means primary port is 80, HC is 81. We have already discussed doing something like this for SRV such as "tcp://1.2.3.4:srv".

@daedric
Copy link
Author

daedric commented Mar 15, 2017

Hi,
sorry for the delay, so we do use Strict DNS cluster type so we do need to keep it that way for now. We use this type of cluster because it makes scaling and HA basically free and makes envoy really fast at detecting problems. We can't just have a tcp health check because the fact that the port is bound does not necessarily means the application is in good health.
From all your suggestion, the alt_hc_port option in the cluster section would probably the easiest and clearer one. I'll try to get back on this asap.

Cheers

@mattklein123
Copy link
Member

@daedric OK, I'm still not 100% crazy about the "alt_hc_port" option in the sense that we don't need it for static or SDS clusters (it can be a property of the returned host). I'm really leaning towards having it be part of the "URL" somehow.

@lyft/network-team @rshriram @htuch opinions on the config for ^?

@htuch
Copy link
Member

htuch commented Mar 15, 2017

It might be nice if we can keep URLs as valid URIs, to allow using standard parsing functions and interoperate with libraries that generate them in config pipelines. It's probably not a strong requirement though and might make it more concise in this case (and with srv).

I don't see the issue with alt_hc_port though if it's optional - it's the same as just omitting the second : fragment from the URL.

@mattklein123
Copy link
Member

OK if we go with alt_hc_port though we will have to reconcile w/ SDS and static where you might not want the port to be the same for each host. In that case we almost definitely want it to be part of each host definition, so I guess we can only look at it in DNS cases. I don't feel that strongly, so I'm fine either way.

@rshriram
Copy link
Member

I agree with @htuch that it would be better to keep the URL as a valid URI. Makes it much easier for all the config gens to generate envoy config. Besides, tcp://1.2.3.4:80:81 is kind of confusing [while it saves another config option, it kills readability].

Here is a random thought: why not use a URI in the health check as well, instead of having a separate port field? [May be it has already been discussed in this thread]. For example, user could specify either a PATH or a URI field for the health check (not both). For SDS/static, there would be no change. For strict_dns/logical_dns, when a separate port is desired, people could specify it as a complete URL (there will be duplication of hostname, but thats better than a port number that stands out in the config).

It won't pollute the config with a separate port that needs to be kept track of. As a nice side effect, we could even do health check with a different host. E.g., strict/logical dns points to foo.bar.com external service, while the health check service goes to uptime.com:1234/check/foo.bar.com.

In future, if we move to subsets or global health check service, we could reuse the same machinery [I am just thinking out aloud here, havent thought through fully yet].

@mattklein123
Copy link
Member

Going to close this for now. Please reopen if/when you are ready to clean this up. I also noted that this PR is ongoing in the primary issue.

@1mentat
Copy link

1mentat commented Apr 14, 2017

Following up on this from the original issue, having a full URL doesn't make much sense in the usual case, which is you want "the same one as the connection is going to" with perhaps different protocol and port. How would that be expressed this this proposed reworking?

@mattklein123
Copy link
Member

@1mentat if possible can we discuss design in #439 so it doesn't get lost. Feel free to repost there and we can discuss.

jplevyak pushed a commit to istio/envoy that referenced this pull request Jul 9, 2020
Signed-off-by: Pengyuan Bian <bianpengyuan@google.com>
jpsim pushed a commit that referenced this pull request Nov 28, 2022
Removing E2E gRPC testing infrastructure since we are unable to communicate with a vanilla gRPC service due to: envoyproxy/envoy-mobile#502

Signed-off-by: Alan Chiu <achiu@lyft.com>

Description: remove unused tests for grpc and protos
Risk Level: low
Testing: none
Docs Changes: n/a
Release Notes: n/a
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this pull request Nov 29, 2022
Removing E2E gRPC testing infrastructure since we are unable to communicate with a vanilla gRPC service due to: envoyproxy/envoy-mobile#502

Signed-off-by: Alan Chiu <achiu@lyft.com>

Description: remove unused tests for grpc and protos
Risk Level: low
Testing: none
Docs Changes: n/a
Release Notes: n/a
[Optional Fixes #Issue]
[Optional Deprecated:]

Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow healthcheck on different port for non-EDS cluster types

6 participants