feat(translator): remove absent healthy endpoints by guydc · Pull Request #6478 · envoyproxy/gateway

guydc · 2025-07-07T17:59:55Z

What type of PR is this?

What this PR does / why we need it:
Removes absent (no longer discovered) endpoints, without waiting for their health to fail as well.

Which issue(s) this PR fixes:

Fixes #6463

Release Notes: Yes

arkodg · 2025-07-07T19:15:04Z

api/v1alpha1/healthcheck_types.go

can we do without an API field ?
are there use cases where the API has removed the endpoint, but we want the data plane to keep using it ?

The envoy rationale for this (default) behavior is outlined here:

Host absent / health check OK:
Envoy will route to the target host. This is very important since the design assumes that the discovery service can fail at any time. If a host continues to pass health check even after becoming absent from the discovery data, Envoy will still route. Although it would be impossible to add new hosts in this scenario, existing hosts will continue to operate normally. When the discovery service is operating normally again the data will eventually re-converge.

When envoy depends on successful DNS resolution rather than a control plane to discover endpoints, there's a greater risk, and some users may prefer to route to stale endpoints, as long as they are known to be healthy, as mitigation to discovery failures.

if the discovery service is not providing endpoints but DNS is, how will this setting help ?

Discovery works differently in these cases. XDS-based discovery is inherently more resilient (control plane is typically a local component, xds has a bunch of built-in guardrails like caching). There are several examples of DNS-based resolution failures that lead to traffic drops:

DNS Timeout Issue After Upgrading to v1.32.1 envoy#37676

STRICT_DNS drops cluster members on lookup failure envoy#2691

I agree that the envoy default is not intuitive. We can enable this by default and see if users are interested in the other option as time goes on.

Tested out a few behaviors:

NXDOMAIN: in both cases endpoints are removed (even when considering health)

Timeout to DNS: in both cases endpoints are retained (even when ignoring health)

I'm fine with changing the default behavior for STRICT_DNS as well.

thanks for testing this out @guydc

+1 on setting this by default, and considering adding a field to opt if if users really need it

codecov · 2025-07-07T20:37:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.77%. Comparing base (4140f61) to head (6ff2651).
Report is 4 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6478      +/-   ##
==========================================
+ Coverage   70.74%   70.77%   +0.02%     
==========================================
  Files         220      220              
  Lines       37594    37594              
==========================================
+ Hits        26596    26607      +11     
+ Misses       9431     9424       -7     
+ Partials     1567     1563       -4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: Guy Daich <guy.daich@sap.com>

arkodg

LGTM thanks

zhaohuabing

LGTM thanks!

* feat(translator): remove absent healthy endpoints Signed-off-by: Guy Daich <guy.daich@sap.com> Signed-off-by: Tjeerd Jan van der Molen <34071+tjvdmolen@users.noreply.github.com>

guydc requested a review from a team as a code owner July 7, 2025 17:59

arkodg reviewed Jul 7, 2025

View reviewed changes

guydc force-pushed the feat-btp-ignore-health branch from 3eaabf5 to 70509f3 Compare July 7, 2025 20:29

guydc changed the title ~~feat(translator): support removal of absent healthy endpoints~~ feat(translator): remove absent healthy endpoints Jul 8, 2025

feat(translator): remove absent healthy endpoints

d835a1b

Signed-off-by: Guy Daich <guy.daich@sap.com>

guydc force-pushed the feat-btp-ignore-health branch from 9bf3cb7 to d835a1b Compare July 8, 2025 18:18

fix gen

6ff2651

Signed-off-by: Guy Daich <guy.daich@sap.com>

arkodg added this to the v1.5.0-rc.1 Release milestone Jul 8, 2025

arkodg approved these changes Jul 8, 2025

View reviewed changes

arkodg requested review from a team July 9, 2025 00:48

zhaohuabing approved these changes Jul 10, 2025

View reviewed changes

arkodg merged commit 6b699c8 into envoyproxy:main Jul 10, 2025
44 of 47 checks passed

BrewTestBot mentioned this pull request Aug 8, 2025

egctl 1.5.0 Homebrew/homebrew-core#232799

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(translator): remove absent healthy endpoints#6478

feat(translator): remove absent healthy endpoints#6478
arkodg merged 2 commits intoenvoyproxy:mainfrom
guydc:feat-btp-ignore-health

guydc commented Jul 7, 2025 •

edited

Loading

Uh oh!

arkodg Jul 7, 2025

Uh oh!

guydc Jul 7, 2025 •

edited

Loading

Uh oh!

arkodg Jul 7, 2025

Uh oh!

guydc Jul 8, 2025

Uh oh!

guydc Jul 8, 2025

Uh oh!

arkodg Jul 8, 2025

Uh oh!

codecov bot commented Jul 7, 2025 •

edited

Loading

Uh oh!

arkodg left a comment

Uh oh!

zhaohuabing left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

guydc commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arkodg Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

guydc Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arkodg Jul 7, 2025

Choose a reason for hiding this comment

Uh oh!

guydc Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

guydc Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

arkodg Jul 8, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

arkodg left a comment

Choose a reason for hiding this comment

Uh oh!

zhaohuabing left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

guydc commented Jul 7, 2025 •

edited

Loading

guydc Jul 7, 2025 •

edited

Loading

codecov bot commented Jul 7, 2025 •

edited

Loading