[APIServer][Docs] Add user guide for retry behavior & configuration #4144
Merged: rueian merged 20 commits into `ray-project:master` from `justinyeh1995:docs/3883-add-apiserver-rety-to-doc` on Nov 20, 2025.

Commits (20, all by justinyeh1995):

- `30234a2` [Docs] Add the draft description about feature intro, configurations,…
- `a746d50` Merge branch 'ray-project:master' into docs/3883-add-apiserver-rety-t…
- `14638bd` [Fix] Update the retry walk-through
- `fcfcdf4` Merge branch 'master' of https://github.com/ray-project/kuberay into …
- `8287448` [Doc] rewrite the first 2 sections
- `8533fb3` Merge branch 'master' of https://github.com/ray-project/kuberay into …
- `6a85ad3` Merge branch 'master' of https://github.com/ray-project/kuberay into …
- `5911cc4` Merge branch 'master' of https://github.com/ray-project/kuberay into …
- `f656a35` [Doc] Revise documentation wording and add Observing Retry Behavior s…
- `67c1476` [Fix] fix linting issue by running pre-commit run berfore commiting
- `da763de` [Fix] fix linting errors in the Markdown linting
- `9f9e3f4` Merge branch 'master' of https://github.com/ray-project/kuberay into …
- `fb4874a` [Fix] Clean up the math equation
- `9ed4b17` Update the math formula of Backoff calculation.
- `7640567` [Fix] Explicitly mentioned exponential backoff and removed the custom…
- `9a1e786` [Docs] Clarify naming by replacing “APIServer” with “KubeRay APIServer”
- `784228e` [Docs] Rename retry-configuration.md to retry-behavior.md for accuracy
- `5d58086` Update Title to KubeRay APIServer Retry Behavior
- `3e9b06b` [Docs] Add a note about the limitation of retry configuration
- `6a5e883` Merge branch 'master' of https://github.com/ray-project/kuberay into …

# KubeRay APIServer Retry Behavior

The KubeRay APIServer automatically retries failed requests to the Kubernetes API when transient errors occur.
This built-in mechanism uses exponential backoff to improve reliability without requiring manual intervention.
As of `v1.5.0`, the retry configuration is hard-coded and cannot be customized.
This guide describes the default retry behavior.

## Default Retry Behavior

The KubeRay APIServer automatically retries, with exponential backoff, requests that fail with one of these HTTP status codes:

- 408 (Request Timeout)
- 429 (Too Many Requests)
- 500 (Internal Server Error)
- 502 (Bad Gateway)
- 503 (Service Unavailable)
- 504 (Gateway Timeout)

Non-retryable errors, meaning any 4xx status other than 408 and 429, fail immediately without retries.
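
The status-code rule above can be expressed as a single predicate. This is a minimal Go sketch, assuming the retry decision depends only on the HTTP status code and that only the six listed codes are retryable; the function name `isRetryable` is hypothetical and is not taken from the KubeRay source.

```go
package main

import (
	"fmt"
	"net/http"
)

// isRetryable reports whether a status code is one of the transient errors
// listed in this guide. Every other code, including all other 4xx
// responses, is treated as non-retryable and fails immediately.
// NOTE: hypothetical helper, not KubeRay's actual function.
func isRetryable(status int) bool {
	switch status {
	case http.StatusRequestTimeout, // 408
		http.StatusTooManyRequests,     // 429
		http.StatusInternalServerError, // 500
		http.StatusBadGateway,          // 502
		http.StatusServiceUnavailable,  // 503
		http.StatusGatewayTimeout:      // 504
		return true
	default:
		return false
	}
}

func main() {
	for _, code := range []int{400, 404, 408, 429, 503} {
		fmt.Printf("%d retryable=%v\n", code, isRetryable(code))
	}
}
```

Using the `net/http` status constants instead of bare integers keeps the list self-documenting and easy to compare against this guide.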

The retry mechanism uses the following default settings:

- **MaxRetry**: 3 retries (4 total attempts, including the initial one)
- **InitBackoff**: 500ms (wait before the first retry)
- **BackoffFactor**: 2.0 (exponential multiplier)
- **MaxBackoff**: 10s (maximum wait time between retries)
- **OverallTimeout**: 30s (total timeout for all attempts)

With these settings, the wait after attempt $i$ is:

$$
\text{Backoff}_i = \min\left(\text{InitBackoff} \times \text{BackoffFactor}^{i},\ \text{MaxBackoff}\right)
$$

where $i$ is the attempt number, starting from 0, so the wait before the first retry is $\text{Backoff}_0 = \text{InitBackoff}$.
Retrying stops once the total elapsed time across all attempts exceeds `OverallTimeout`.