Add unified /api/healthz #56943
Code Review
This pull request introduces a new unified health check endpoint /api/healthz which combines the health status of the local Raylet and the GCS. The implementation is clear and achieves its goal. I have a couple of suggestions to improve code quality: one regarding Python's comparison operators for better correctness and another to remove some unreachable code for better clarity and maintainability.
edoakes left a comment
Let's add some basic tests. Integration tests would be great if they're possible, but I'm not sure how to write them: if the raylet and/or GCS become unhealthy, I think the agent will crash shortly afterward...
Otherwise, we can scaffold some unit tests (without too much mocking, please).
I also kicked off premerge CI by adding the
This pull request has been automatically marked as stale because it has not had recent activity. You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
unstale
Hi, any updates on this?
This new endpoint on the HealthzAgent class combines the status of /api/local_raylet_healthz and /api/gcs_healthz into one endpoint for use with Kubernetes. See ray-project#56204. Signed-off-by: Spencer Peterson <spencerjp@google.com>
- imports
- Use asyncio.gather instead of TaskGroup, which is not available on Python 3.10
- Fix missing happy path in GCS check
Signed-off-by: Spencer Peterson <spencerjp@google.com>
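For context on the asyncio.gather change: asyncio.TaskGroup was added in Python 3.11, so code that must also run on 3.10 cannot rely on it. The sketch below is only an illustration of that compatibility concern; the helper name run_concurrently is hypothetical and does not come from this PR.

```python
import asyncio


async def run_concurrently(*checks):
    """Run zero-argument async callables concurrently; return results in order.

    Hypothetical helper for illustration only.
    """
    if hasattr(asyncio, "TaskGroup"):  # available on Python 3.11 and later
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(check()) for check in checks]
        # All tasks have finished once the TaskGroup block exits.
        return [task.result() for task in tasks]
    # Python 3.10 and earlier: asyncio.gather gives the same ordered results.
    return list(await asyncio.gather(*(check() for check in checks)))
```

The two branches differ on failure: a TaskGroup cancels the remaining tasks and raises an ExceptionGroup, while asyncio.gather (without return_exceptions=True) propagates the first exception and leaves the other awaitables running. Using asyncio.gather everywhere avoids having two code paths.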
fixes missing await and bad logging references. Signed-off-by: Spencer Peterson <spencerjp@google.com>
Tests are up; ready for review.
Why are these changes needed?
This new endpoint on the HealthzAgent class combines the status of /api/local_raylet_healthz and /api/gcs_healthz into one endpoint for use with Kubernetes.
Related issue number
See #56204.
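Checks
- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.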