Add 429 throttle retry policy#26820
Open
andrewmathew1 wants to merge 4 commits into
Open
Conversation
Adds a dedicated throttle retry policy that retries HTTP 429 responses, honors the x-ms-retry-after-ms response header, and is configurable via ClientOptions.ThrottlingRetryOptions (MaxRetryAttempts, MaxRetryWaitTime). This brings parity with the throttling retry behavior of the .NET, Java, and Python Cosmos SDKs. Fixes Azure#25638 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds a Cosmos-specific throttling retry policy to retry HTTP 429 responses, honoring the x-ms-retry-after-ms header with configurable limits exposed via ClientOptions.ThrottlingRetryOptions.
Changes:
- Introduces
throttleRetryPolicyto retry HTTP 429 responses with a capped attempt count and cumulative wait budget. - Wires the new policy into the azcosmos client pipeline and adds
ThrottlingRetryOptionstoClientOptions. - Adds unit tests validating retry behavior and updates the changelog.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/data/azcosmos/cosmos_throttle_retry_policy.go | Implements the dedicated 429 throttling retry policy and retry-after parsing. |
| sdk/data/azcosmos/cosmos_throttle_retry_policy_test.go | Adds unit tests covering success, exhaustion, missing header, and cancellation scenarios. |
| sdk/data/azcosmos/cosmos_http_constants.go | Adds the x-ms-retry-after-ms header constant. |
| sdk/data/azcosmos/cosmos_client.go | Inserts the throttling retry policy into the per-retry pipeline. |
| sdk/data/azcosmos/cosmos_client_options.go | Exposes ThrottlingRetryOptions on ClientOptions. |
| sdk/data/azcosmos/CHANGELOG.md | Records the feature addition. |
The new throttle retry policy retries 429 with Cosmos-specific x-ms-retry-after-ms semantics, but azcore's default retry policy also retries 429. Layered together they compound: with maxRetryAttempts=2 a single 429 produced 12 transport calls (3 throttle attempts x 4 azcore attempts) instead of 3. newClient now copies the embedded ClientOptions and, when the caller hasn't set Retry.StatusCodes or Retry.ShouldRetry, swaps in azcore's default status code set without 429 (408, 500, 502, 503, 504). The throttle policy owns 429 end-to-end. Callers who customize Retry behavior are left untouched. Also replace `time.After` in the retry-wait select with `time.NewTimer` + `Stop()` on cancellation, so cancelled long delays don't leak timers until they fire. New tests: * TestThrottleRetry_RewindsRequestBodyAcrossRetries (POST body is re-readable across retries via the documented RewindBody path). * TestThrottleRetry_NoDoubleRetryWith429 (full newClient pipeline - single 429 with delay produces 1 + N throttle attempts, not the compounded count). * TestThrottleRetry_AzcoreStillRetriesOther5xx (503 is retried by azcore as before; the 429 carve-out doesn't disable the rest). * TestThrottleRetry_CallerStatusCodesPreserved (explicit Retry.StatusCodes from the caller is not overridden). * TestDefaultAzcoreRetryStatusCodesWithout429 (helper sanity). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ry-policy-fixes azcosmos: fix double-retry on 429 and timer leak in throttle policy
Four issues raised by the Copilot reviewer: 1. The doc comment on newThrottleRetryPolicy said "non-positive values for either option fall back to defaults" but a negative MaxRetryAttempts actually disables retries entirely. Updated the comment to describe the real behavior of zero vs. negative. 2. readRetryAfterMs treated `delay <= 0` as "missing header" and fell back to defaultDelay, so an explicit `x-ms-retry-after-ms: 0` (a valid "retry immediately" hint) was silently replaced with a 5s sleep. Changed readRetryAfterMs to return `(duration, ok bool)` and only fall back when `!ok`. Explicit 0 is now honored. 3. The retry path called `azruntime.Drain(response)` before `req.RewindBody()`, so a rewind failure returned an error along with a response whose body was already closed. Reordered so the rewind happens first; if it fails, the caller still receives an intact 429 response for diagnostics. 4. `strconv.ParseFloat` accepts `NaN`/`Inf` and would produce surprising durations when scaled. Added explicit `math.IsNaN` / `math.IsInf` guards alongside the existing negative-value check. Tests: * TestReadRetryAfterMs extended with explicit-zero, nan, positive-inf, and negative-inf cases. Updated to assert the new `(duration, ok)` return. * New TestThrottleRetry_ExplicitZeroRetryAfterIsHonored proves the policy retries immediately on `x-ms-retry-after-ms: 0` with a configured defaultDelay (5s) and a tight cumulative wait budget (100ms): if the fallback were applied, the first retry would either exceed the budget or take longer than the asserted threshold. CHANGELOG: added an Other Changes entry under 1.5.0-beta.7 documenting the new explicit-zero semantics, the NaN/Inf rejection, and the rewind-before-drain ordering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
simorenoh
reviewed
Jun 1, 2026
|
|
||
| ### Features Added | ||
|
|
||
| * Added a dedicated 429 (Too Many Requests) throttling retry policy that honors the `x-ms-retry-after-ms` response header and is configurable via `ClientOptions.ThrottlingRetryOptions` (`MaxRetryAttempts`, `MaxRetryWaitTime`). This brings parity with the throttling retry behavior in the .NET, Java, and Python Cosmos SDKs. When `ClientOptions.Retry.StatusCodes` and `ClientOptions.Retry.ShouldRetry` are both unset, 429 is no longer in the azcore retry policy's default status codes (it is now handled exclusively by the throttling retry policy); the other transient status codes (408, 500, 502, 503, 504) remain. |
Member
There was a problem hiding this comment.
Suggested change
| * Added a dedicated 429 (Too Many Requests) throttling retry policy that honors the `x-ms-retry-after-ms` response header and is configurable via `ClientOptions.ThrottlingRetryOptions` (`MaxRetryAttempts`, `MaxRetryWaitTime`). This brings parity with the throttling retry behavior in the .NET, Java, and Python Cosmos SDKs. When `ClientOptions.Retry.StatusCodes` and `ClientOptions.Retry.ShouldRetry` are both unset, 429 is no longer in the azcore retry policy's default status codes (it is now handled exclusively by the throttling retry policy); the other transient status codes (408, 500, 502, 503, 504) remain. | |
| * Added a dedicated 429 (Too Many Requests) throttling retry policy that honors the `x-ms-retry-after-ms` response header and is configurable via `ClientOptions.ThrottlingRetryOptions` (`MaxRetryAttempts`, `MaxRetryWaitTime`). When `ClientOptions.Retry.StatusCodes` and `ClientOptions.Retry.ShouldRetry` are both unset, 429 is no longer in the azcore retry policy's default status codes (it is now handled exclusively by the throttling retry policy); the other transient status codes (408, 500, 502, 503, 504) remain. See [PR 26820](https://github.com/Azure/azure-sdk-for-go/pull/26820). |
simorenoh
reviewed
Jun 1, 2026
Member
simorenoh
left a comment
There was a problem hiding this comment.
Just a changelog comment, LGTM otherwise!
|
|
||
| ### Other Changes | ||
|
|
||
| * Throttling retry policy: an explicit `x-ms-retry-after-ms: 0` header is now honored as "retry immediately" instead of being treated as a missing header (which would have applied the default delay). NaN/Inf values for the header are now rejected as invalid. The request body is rewound before the 429 response body is drained so a rewind failure surfaces a usable 429 response to the caller. |
Member
There was a problem hiding this comment.
we probably don't need this since these changes are also in this PR for the first time right?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Adds a dedicated throttle retry policy that retries HTTP 429 responses, honors the x-ms-retry-after-ms response header, and is configurable via ClientOptions.ThrottlingRetryOptions (MaxRetryAttempts, MaxRetryWaitTime). This brings parity with the throttling retry behavior of the .NET, Java, and Python Cosmos SDKs.
Fixes #25638