Only pass one hostname via EDS and prefer healthy ones #8084

Merged

freddygv merged 4 commits into master from tgw-unique-hostnames on Jun 12, 2020
Conversation

@freddygv (Contributor) commented on Jun 10, 2020:

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy currently handles this gracefully by picking only one of the endpoints, but we should avoid passing multiple endpoints so these warnings are not logged.

This PR:

  • Ensures we only pass one endpoint, tied to a single service instance.
  • Prefers an endpoint marked as healthy by Consul.
  • Emits a warning and skips the cluster if no endpoints are healthy.
  • Emits a warning identifying which hostname will be resolved if multiple unique hostnames are spread across service instances.
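The selection behavior above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Consul's actual implementation: the `Endpoint` struct and `pickEndpoint` function are hypothetical names standing in for the real xDS-generation code.

```go
package main

import "fmt"

// Endpoint stands in for the per-service-instance data available when
// building a LOGICAL_DNS cluster: a hostname plus the instance's
// aggregated Consul health status. (Hypothetical type, for illustration.)
type Endpoint struct {
	Hostname string
	Healthy  bool
}

// pickEndpoint returns the single endpoint to hand to Envoy, preferring
// the first one marked healthy. ok is false when no endpoint is healthy,
// in which case the caller should warn and skip the cluster.
// multipleHostnames reports whether more than one unique hostname was
// seen across instances, in which case the caller should warn which
// hostname will be resolved.
func pickEndpoint(endpoints []Endpoint) (chosen Endpoint, ok bool, multipleHostnames bool) {
	unique := make(map[string]struct{})
	for _, e := range endpoints {
		unique[e.Hostname] = struct{}{}
		if e.Healthy && !ok {
			chosen, ok = e, true
		}
	}
	return chosen, ok, len(unique) > 1
}

func main() {
	eps := []Endpoint{
		{Hostname: "a.example.com", Healthy: false},
		{Hostname: "b.example.com", Healthy: true},
	}
	e, ok, multi := pickEndpoint(eps)
	fmt.Println(e.Hostname, ok, multi) // b.example.com true true
}
```

With this shape, the cluster always receives exactly one lb_endpoint, which satisfies Envoy's LOGICAL_DNS constraint quoted in the error above.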

@freddygv freddygv requested a review from a team June 10, 2020 21:47
@freddygv freddygv force-pushed the tgw-unique-hostnames branch from 3ac3725 to 1ee11be on June 10, 2020 21:49
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
@preetapan preetapan added this to the 1.8.0 milestone Jun 11, 2020
@crhino (Contributor) left a comment:
A few small comments, but overall this looks good. I took it for a spin locally and it worked; I especially like the log messages that tell you which hostname is being resolved!

@crhino (Contributor) left a comment:

LGTM

@freddygv freddygv merged commit 166a8b2 into master Jun 12, 2020
@freddygv freddygv deleted the tgw-unique-hostnames branch June 12, 2020 19:46
hashicorp-ci pushed a commit that referenced this pull request Jun 12, 2020
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

