
Add retriable_serving_statuses to grpc health check to support retrying NOT_SERVING #43331

Open
fishcakez wants to merge 18 commits into envoyproxy:main from fishcakez:grpc-hc-retry-not-serving


@fishcakez
Contributor

@fishcakez commented Feb 4, 2026

Commit Message: Add retriable_serving_statuses to gRPC health check to support retrying NOT_SERVING
Additional Description: Add retriable_serving_statuses to the gRPC health check to mimic the HTTP health check's retriable_statuses; see #17948. This supports the case where a NOT_SERVING gRPC response should honor the unhealthy_threshold instead of immediately failing the host.
Risk Level: Low; behavior only changes when the new field is configured.
Testing: unit, integration and fuzzing changes
Docs Changes: The API has inline docs, but the health check architecture doc is not updated yet (awaiting review first).
Release Notes: Not updated yet.
Platform Specific Features: N/A
[Optional Runtime guard:] N/A
[Optional Fixes #Issue] N/A
[Optional Fixes commit #PR or SHA] N/A
[Optional Deprecated:] N/A
[Optional API Considerations:] Imports the gRPC health check proto, so there may be control plane implications or build changes needed when it is brought in. Note that the buf linter cannot handle the import due to the difference between grpc/grpc and grpc/grpc-proto, so linting is skipped for this file for now. Also, PGV has issues with repeated enum validation in C++ and Go, so validation is done explicitly in the config load logic.

Signed-off-by: James Fish <jfish@pinterest.com>
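The intended behavior can be illustrated with a minimal C++ sketch. It is not Envoy's health checker code: `HostHealthState` and `onResponse` are hypothetical names, the enum mirrors `grpc.health.v1.HealthCheckResponse.ServingStatus`, and the logic assumes the semantics the description borrows from the HTTP `retriable_statuses` option: a retriable status counts toward `unhealthy_threshold`, while any other non-SERVING status still fails the host immediately. Healthy-threshold handling is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Values mirror grpc.health.v1.HealthCheckResponse.ServingStatus.
enum class ServingStatus { UNKNOWN = 0, SERVING = 1, NOT_SERVING = 2, SERVICE_UNKNOWN = 3 };

// Hypothetical per-host state; not Envoy's real health checker classes.
class HostHealthState {
 public:
  HostHealthState(uint32_t unhealthy_threshold, std::vector<ServingStatus> retriable)
      : unhealthy_threshold_(unhealthy_threshold), retriable_(std::move(retriable)) {
    assert(unhealthy_threshold_ > 0);  // assumed: the threshold is a positive count
  }

  // Applies one health check response to the host state.
  void onResponse(ServingStatus status) {
    if (status == ServingStatus::SERVING) {
      // Healthy response: reset the consecutive-failure count. Requiring
      // healthy_threshold successes before flipping back is omitted here.
      num_unhealthy_ = 0;
      healthy_ = true;
      return;
    }
    const bool retriable =
        std::find(retriable_.begin(), retriable_.end(), status) != retriable_.end();
    if (!retriable) {
      // Existing behavior: a non-SERVING status fails the host immediately.
      healthy_ = false;
      return;
    }
    // Retriable status: fail only after unhealthy_threshold consecutive failures.
    if (++num_unhealthy_ >= unhealthy_threshold_) {
      healthy_ = false;
    }
  }

  bool healthy() const { return healthy_; }

 private:
  uint32_t unhealthy_threshold_;
  std::vector<ServingStatus> retriable_;
  uint32_t num_unhealthy_ = 0;
  bool healthy_ = true;
};
```

For example, with an unhealthy threshold of 3 and NOT_SERVING listed as retriable, two consecutive NOT_SERVING responses leave the host healthy and the third marks it unhealthy; with an empty list, the first NOT_SERVING fails it immediately.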
@repokitteh-read-only bot added the api and deps labels Feb 4, 2026 (deps: Approval required for changes to Envoy's external dependencies)
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @wbpcode
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @jwendell

🐱

Caused by: #43331 was opened by fishcakez.


Signed-off-by: James Fish <jfish@pinterest.com>
@fishcakez force-pushed the grpc-hc-retry-not-serving branch from 33af993 to 204dbe0 on February 5, 2026 00:22
@fishcakez
Contributor Author

fishcakez commented Feb 5, 2026

@wbpcode, as the assigned api-shepherd: I am unable to fix the format-api issue locally. I cannot add grpc/grpc to api/buf.lock, so linting always fails. I would appreciate some guidance on whether I should avoid this import (and define ServingStatus in Envoy's health check proto file) or try to fix the import issue.

@wbpcode
Member

wbpcode commented Feb 7, 2026

Our toolchain is pretty complex. Not sure, but maybe we're lucky and @phlax happens to know how to resolve it?

@phlax
Member

phlax commented Feb 7, 2026

I'm not a buf expert (so apologies in advance if this is not correct), but I think you need to do something like:

  • add buf.build/grpc/grpc to deps in buf.yaml
  • change import "src/proto/grpc/health/v1/health.proto" -> import "grpc/health/v1/health.proto"
  • remove the src/proto prefix from the key for "grpc/health/v1/health.proto"

Signed-off-by: James Fish <jfish@pinterest.com>
@fishcakez force-pushed the grpc-hc-retry-not-serving branch from 64bc2a4 to e2e136d on February 11, 2026 02:39
Signed-off-by: James Fish <jfish@pinterest.com>
bazel/pgv.patch (outdated)
- "fileneeds": FileNeeds,
- "isEnum": isEnum,
- "enumList": enumList,
- "enumVal": enumVal,
Member

This is a lot of patching. Is it all necessary, or is it an artifact of your linter or some such?

Contributor Author

This patch is taken directly from bufbuild/protoc-gen-validate#1360 (the local patch omits the tests that PR adds). The larger change to this file is due to gofmt, which aligns all fields based on the longest name.

Contributor Author

@fishcakez Feb 11, 2026

We could avoid patching pgv if we skip unique and not_in validation of the repeated enum until this patch is landed and released in pgv.

Member

OK, we are about to drop PGV, so I'm less concerned about that one really.

With the gRPC patch, it seems strange that their source layout doesn't match their buf registration.

I'm wondering if we can either upstream that fun, or if there is a pattern that avoids this.

Contributor

I don't have a preference between the patch or ignoring buf linting on this specific xDS proto file. Either one seems like a reasonable temporary measure that we can get rid of once the tech debt is cleaned up on the gRPC side.

Contributor Author

I have reverted the patches, moved the proto validation into explicit checks in Envoy's config loading, and updated api/buf.yaml to not lint the health_check.proto file. This keeps the change as simple as possible. Can I get another look/review, please?
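Because PGV cannot enforce `unique` and `not_in` on this repeated enum, the check has to happen at config load. Below is a minimal, hypothetical C++ sketch of such a check, not the PR's code: the function name and error strings are invented, the values are the raw numbers of `grpc.health.v1.HealthCheckResponse.ServingStatus`, and since this thread does not say which values `not_in` should exclude, rejecting SERVING is an assumption, as is the defined-values check.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Raw numbers of grpc.health.v1.HealthCheckResponse.ServingStatus. proto3 enums
// are open, so a parsed config may also carry numbers not defined in the .proto.
constexpr int kUnknown = 0;
constexpr int kServing = 1;
constexpr int kNotServing = 2;
constexpr int kServiceUnknown = 3;

// Hypothetical config-load check standing in for PGV's repeated-enum rules.
// Returns an error message, or std::nullopt when the list is valid.
std::optional<std::string> validateRetriableServingStatuses(const std::vector<int>& statuses) {
  std::set<int> seen;
  for (const int status : statuses) {
    // Assumed defined-values check: reject numbers outside the enum.
    if (status < kUnknown || status > kServiceUnknown) {
      return "undefined serving status: " + std::to_string(status);
    }
    // Assumed not_in rule: SERVING is the healthy status, so it cannot be "retried".
    if (status == kServing) {
      return "SERVING cannot be a retriable serving status";
    }
    // Equivalent of unique: std::set::insert reports false for a repeat.
    if (!seen.insert(status).second) {
      return "duplicate serving status: " + std::to_string(status);
    }
  }
  return std::nullopt;
}
```

A real implementation would report the error through Envoy's own config-error mechanism; `std::optional<std::string>` only keeps the sketch self-contained.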

Contributor Author

Unfortunately, the api/buf.yaml change did not work to ignore health_check.proto. Only a whole directory can be ignored, and that would mean ignoring envoy/config/core/v3/, which would of course break buf linting because other protos import those. I could not find a way to ignore a single file.

Contributor Author

Therefore I've reverted some of the reverts and brought back the grpc patch to align it with the grpc-proto path.

Contributor Author

Hi @markdroth @phlax, are you able to take a second look at the gRPC/dependency handling, please?

@fishcakez
Contributor Author

fishcakez commented Feb 11, 2026

Thanks @phlax! The missing piece was that buf will only add a dependency from buf.yaml to buf.lock if it detects the dependency in the proto files. Therefore I first had to:

change import "src/proto/grpc/health/v1/health.proto" -> import "grpc/health/v1/health.proto"

Then I could do:

add buf.build/grpc/grpc to deps in buf.yaml

Then run

buf deps update api

However, to make the import path grpc/health/v1/, I had to patch the BUILD file for the proto_library in gRPC to add strip_import_prefix. This also required changing gRPC's internal proto library macro. Then I had to change every C++ include of health.pb.h inside the gRPC code base, and repeat that wherever it is included in the Envoy code base. This is a lot of work to fix a linting issue, and I'm unsure whether the include/import changes will impact folks extending Envoy in their own code bases.
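Why the C++ includes must change along with the proto import path can be seen in a small, hypothetical C++ model of `strip_import_prefix` on a `proto_library`. The helper names are invented, and only the repository-relative form of the prefix (starting with `/`) is modelled. With the prefix `/src/proto`, the file `src/proto/grpc/health/v1/health.proto` becomes importable as `grpc/health/v1/health.proto`, and the generated header is included by that same path.

```cpp
#include <optional>
#include <string>

// Hypothetical model of Bazel's proto_library strip_import_prefix, covering only
// the repository-relative form (a prefix starting with "/").
// Returns the import path of a .proto given its repository-relative location.
std::optional<std::string> importPathAfterStrip(const std::string& repo_path,
                                                const std::string& strip_prefix) {
  std::string prefix = strip_prefix;
  if (!prefix.empty() && prefix.front() == '/') {
    prefix.erase(0, 1);  // "/src/proto" -> "src/proto"
  }
  if (!prefix.empty() && prefix.back() != '/') {
    prefix += '/';  // match whole directory components only
  }
  if (repo_path.compare(0, prefix.size(), prefix) != 0) {
    // Bazel rejects sources outside the prefix; the sketch signals it with nullopt.
    return std::nullopt;
  }
  return repo_path.substr(prefix.size());
}

// The generated C++ header follows the import path: a/b/c.proto -> a/b/c.pb.h.
std::string generatedHeader(const std::string& import_path) {
  const std::string ext = ".proto";
  if (import_path.size() >= ext.size() &&
      import_path.compare(import_path.size() - ext.size(), ext.size(), ext) == 0) {
    return import_path.substr(0, import_path.size() - ext.size()) + ".pb.h";
  }
  return import_path;
}
```

Without the prefix, the same header is included as `src/proto/grpc/health/v1/health.pb.h`, which is why every existing include had to be updated once the prefix was stripped.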

@paul-r-gall
Contributor

@markdroth do you have any additional thoughts here?

@paul-r-gall
Contributor

ping @jwendell for deps review.

This reverts commit 9485a03.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit e2e136d.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit e6a1d23.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit ba4e888.

Signed-off-by: James Fish <jfish@pinterest.com>
Signed-off-by: James Fish <jfish@pinterest.com>
Signed-off-by: James Fish <jfish@pinterest.com>
@fishcakez
Contributor Author

Very sorry for auto-adding so many reviewers: I made an error merging main, and then undid it when the diff was not clean in the PR UI.

Signed-off-by: James Fish <jfish@pinterest.com>
@fishcakez force-pushed the grpc-hc-retry-not-serving branch from b692532 to ddf495a on March 25, 2026 15:47
… prefix""

This reverts commit 22695dd.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit 5ce7cbb.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit 6624d41.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit 1c4e7ca.

Signed-off-by: James Fish <jfish@pinterest.com>
This reverts commit 69f4ca8.

Signed-off-by: James Fish <jfish@pinterest.com>

Labels

api, deps (Approval required for changes to Envoy's external dependencies)


8 participants