Cosmos: Add priority-based execution and throughput bucket support#26750
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds request prioritization and throughput-bucket routing to the azcosmos data-plane client by introducing new options at both the client level and per-operation level, and wiring them into the request header pipeline.
Changes:
- Added
PriorityLeveltype (with helpers) and client/per-request option fields to multiple request option structs. - Plumbed priority/throughput bucket values through the header policy via new Cosmos request headers.
- Added/updated tests to validate header emission and override behavior.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/data/azcosmos/cosmos_transactional_batch_options.go | Adds per-batch PriorityLevel/ThroughputBucket options. |
| sdk/data/azcosmos/cosmos_read_many_request_options.go | Adds per-read-many PriorityLevel/ThroughputBucket options. |
| sdk/data/azcosmos/cosmos_query_request_options.go | Adds per-query PriorityLevel/ThroughputBucket options. |
| sdk/data/azcosmos/cosmos_priority_level.go | Introduces PriorityLevel enum-like type plus Values() and ToPtr() helpers. |
| sdk/data/azcosmos/cosmos_priority_level_test.go | Adds unit tests for PriorityLevelValues() and ToPtr(). |
| sdk/data/azcosmos/cosmos_item_request_options.go | Adds per-item-operation PriorityLevel/ThroughputBucket options. |
| sdk/data/azcosmos/cosmos_http_constants.go | Defines the new x-ms-cosmos-priority-level and x-ms-cosmos-throughput-bucket header constants. |
| sdk/data/azcosmos/cosmos_headers_policy.go | Extends the header policy to apply priority/throughput bucket headers from client defaults and per-request overrides. |
| sdk/data/azcosmos/cosmos_headers_policy_test.go | Adds coverage for priority/throughput bucket header behavior in the header policy. |
| sdk/data/azcosmos/cosmos_container.go | Wires per-operation overrides into container item/query/batch/change-feed request contexts; also contains doc-comment typos in changed lines. |
| sdk/data/azcosmos/cosmos_container_read_many.go | Attempts to forward read-many priority/throughput settings through query options (currently not fully applied). |
| sdk/data/azcosmos/cosmos_client.go | Applies client-level defaults into the header policy and allow-lists the new headers. |
| sdk/data/azcosmos/cosmos_client_options.go | Adds client-level PriorityLevel and ThroughputBucket defaults. |
| sdk/data/azcosmos/cosmos_change_feed_request_options.go | Adds per-change-feed PriorityLevel/ThroughputBucket options. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete queryOpts.PriorityLevel/ThroughputBucket assignments in executeReadManyWithQueries (headers flow via headerOptionsOverride) - Remove duplicate if-err-nil block in TestThroughputBucketHeaderNotSetWhenNil Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds 5 tests that exercise the full path from public API options through headerPolicies to outbound HTTP headers: - CreateItem (ItemOptions) - NewQueryItemsPager (QueryOptions) - ExecuteTransactionalBatch (TransactionalBatchOptions) - GetChangeFeed (ChangeFeedOptions) - ReadManyItems (ReadManyOptions) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @simorenoh — really clean implementation and the test coverage is great. All 4 Copilot bot comments look addressed in the latest commits, and I really appreciate that the E2E tests verify headers reach the wire for all 5 op types (Item / Query / Batch / ChangeFeed / ReadMany), including the chunked ReadMany dispatch path which is the easy spot to get wrong.
A couple of optional polish items, none blocking — would love your thoughts:
1. PriorityLevel accepts arbitrary string values
type PriorityLevel string will silently let PriorityLevel("Bogus") flow to the service. I am curious how you'd like to handle this — peer SDKs take different approaches:
- .NET: true
enum PriorityLevel { High = 1, Low = 2 }— compile-time safety - Java:
PriorityLevel.fromString()throwsIllegalArgumentExceptionon invalid input - Python: loose (matches what we have here in Go)
The service will reject invalid values with a clear error so this is safe today, but could you please consider adding a small IsValid() helper, or at minimum a doc comment listing the valid constants (PriorityLevelHigh, PriorityLevelLow)? Either is fine with me — just want users to have a bit more guidance.
2. ThroughputBucket has no range hint
The service expects values in the range 1-5. Neither Go nor .NET validates this client-side, but could you please add a one-line doc comment on the ThroughputBucket *int32 field on the relevant options structs (and ClientOptions) calling out the valid range? It would save users a round-trip when they pick the wrong value.
Minor doc nit
Could you please consider mirroring .NET's doc comment on cosmos_priority_level.go which links to https://aka.ms/CosmosDB/PriorityBasedExecution and explains the High/Low semantics (low-priority requests get throttled first when over RU/s budget)? Helps users discover what PBE is and when to enable it.
LGTM otherwise — happy to approve once you've had a chance to look at #1. Thanks again!
- Add PBE reference link and High/Low semantics to PriorityLevel type doc - Document valid PriorityLevel values on all option struct fields - Document valid ThroughputBucket range (1-5) on all option struct fields Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kushagraThapar
left a comment
There was a problem hiding this comment.
Thanks @simorenoh for the thorough follow-up — this is a really clean iteration. I walked through each previously raised item against the latest commit (cc5b727a) and everything checks out:
From the Copilot bot review:
- ✅ ReadMany header wiring —
ReadManyItemsnow setsoperationContext.headerOptionsOverrideand that flows throughexecuteReadManyWithQueries→executeQueryChunks→executeOneChunk, so every chunked request inherits the priority/throughput-bucket overrides. - ✅ Doc-comment typos on
NewQueryItemsPagerandNewTransactionalBatch— fixed. - ✅
cosmos_priority_level_test.gomigrated totestify/require.
From my prior review:
- ✅
ThroughputBucket— "Valid range is 1 to 5 (inclusive)" doc-comment now onClientOptions,ItemOptions,QueryOptions,ReadManyOptions,ChangeFeedOptions, andTransactionalBatchOptions. Lovely consistency. - ✅
PriorityLeveldoc —cosmos_priority_level.gonow links tohttps://aka.ms/CosmosDB/PriorityBasedExecutionand clearly states the High/Low semantics. ThePriorityLevelValues()helper plus "Valid values are PriorityLevelHigh and PriorityLevelLow" doc comments on every options struct give users plenty of guidance. - 🟢
IsValid()— still aconsider, not blocking; I'm happy with the doc-comment-only path you chose.
End-to-end test coverage is particularly nice:
I am curious to learn how you decided on the chunked-path coverage for TestReadManyItemsPriorityAndThroughputBucketHeaders — the fact that you exercise the override semantics across all 5 op types (Item / Query / Batch / ChangeFeed / ReadMany) and assert exact wire-level header values gives me a lot of confidence the policy will keep emitting these headers as the code evolves. The header-policy unit tests with client-default vs. per-request override permutations are a great complement to that.
Other things I double-checked:
headerPoliciesVerifystruct field alignment looks correct (gofmt-clean).- Both
cosmosHeaderPriorityLevelandcosmosHeaderThroughputBucketare in theAllowedHeaderslist so they show up in diagnostics. - CHANGELOG entry is present and links back to this PR.
- CI is fully green across Linux + Windows on Go 1.25.8 and 1.26.1.
Great job — approving. Thanks again for taking the time to address everything so carefully!
tvaron3
left a comment
There was a problem hiding this comment.
LGTM few non blocking comments
- Add preview service note with docs link to ClientOptions PriorityLevel and ThroughputBucket fields - Add e2e header tests for UpsertItem, ReplaceItem, ReadItem, DeleteItem, and PatchItem operations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…put options post-merge - (X1) Add hard cap on drain-loop rotation budget (4*(initialLen+1)) matching Java's FeedRangeCompositeContinuationImpl defense. Prevents unbounded loops if overlap-resolution ever produces duplicate ranges. - (X2) Add focused table-driven unit tests for the overlap helpers: TestNormalizeMaxBoundary, TestRangesOverlap_TableDriven (10 cases), TestOverlappingPartitionKeyRanges_TableDriven (11 cases covering exact- match, multi-overlap, sub-range, straddling, empty/inverted/equal feed ranges, boundary touches). - (X3) PopulateCompositeContinuationToken (public API) is now a no-op when ContinuationToken is already populated, preventing the multi-range composite token written by the drain loop from being overwritten with a stale single-range token rebuilt from the per-head FeedRange. Post-merge with upstream/main (Azure#26750): - Thread options.PriorityLevel and options.ThroughputBucket through the F1 drain loop's pipelineRequestOptions.headerOptionsOverride so the new priority/throughput-bucket support added in Azure#26750 reaches the change-feed request. - Rewire TestChangeFeedPriorityAndThroughputBucketHeaders to use createChangeFeedTestClient (caches pre-wired) since the cache-nil fallback was removed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…one retry, multi-range continuation queue (#26767) (#26768) * fix(azcosmos): GetChangeFeed split-resilient via overlap match, 410/Gone retry, multi-range continuation queue (#26767) GetChangeFeed previously failed (silently or loudly) when a physical partition split occurred under a customer-supplied FeedRange: * toHeaders did exact-equality matching against the cached PK ranges, so a parent FeedRange that no longer matched any current child fell through to a request issued WITHOUT the partition-key-range-id header (silent miss). * 410/Gone with the PartitionKeyRangeGone substatus surfaced directly to the caller; the underlying maxPKRangeGoneRetries / forceRefresh plumbing added in #26723 was never wired into the change feed. * The composite continuation token only encoded a single EPK range, so resuming after a split lost events on the boundary. This change rewires GetChangeFeed onto a queue-driven drain loop: * buildChangeFeedInitialQueue resolves the customer FeedRange (or inbound continuation token) by overlap-match against the routing map, expanding a parent FeedRange into N children inheriting the parent ETag. * getChangeFeedForQueue drains the queue FIFO. On each iteration it re-resolves the head against the cached snapshot, detects mid-drain splits and bumps the rotation budget so newly-inserted children get queried in the same call. * On 410/Gone with PK-range substatus, the cache is refreshed, the routing snapshot is re-fetched, and the head is retried; capped at maxPKRangeGoneRetries. * The composite continuation token now carries the full multi-range queue, with each entry's ETag advanced FIFO on every response. * Cross-container token reuse is rejected loudly: if a token's ResourceID does not match the current container's _rid, the call fails with a diagnostic error rather than silently misrouting against the wrong routing map. Behavior preserved for callers without the cache infrastructure wired: when no PK-range cache is present and the direct fetch returns no ranges (test/mock case), a single passthrough queue entry is used so the legacy issue-without-overlap-match flow still runs. Tests added in cosmos_change_feed_split_test.go cover: 410-retry, retry cap surfacing the final 410, cross-container ResourceID rejection, no-overlap-after-refresh -> ErrFeedRangeUnresolved, split expansion routing the first child, multi-range continuation round-trip, and the mid-drain split-during-drain budget bump regression guard. All three existing change-feed tests continue to pass. Closes #26767 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: address reviewer round-1 findings on F1 change-feed split handling - Capture drain-loop error in GetChangeFeed so deferred span observes failures (#1) - Reset 410/Gone budget on both split-expansion and head-advance (#2) - Move resp.Body.Close() above 304 short-circuit in newChangeFeedResponse (#7) - Drop dead empty-feedMax branch in cosmos_feed_range.go (#8) - Remove redundant token.head() reload in drain loop (#9) - Reduce buildChangeFeedInitialQueue to 3 returns (#10) - buildRequestHeaders surfaces PartitionKey serialization errors (#11) - Cache-nil + no-routing degraded fallback: log warning, drop continuation token, and read fresh with passthrough range instead of erroring; preserves back-compat for tests built without caches wired up - CHANGELOG link points to PR Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: drop redundant head reload after dropHeadContinuation Round-2 reviewer cleanup. dropHeadContinuation mutates Continuation[0] in place; the existing head pointer reflects the change without reload. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: tighten GetChangeFeed doc + CHANGELOG entry Condenses the GetChangeFeed numbered behavior list to a short paragraph and trims the CHANGELOG entry. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: relocate overlap helper, delete legacy toHeaders + findPartitionKeyRangeID - Move overlappingPartitionKeyRanges into cosmos_feed_range.go next to its sibling findOverlappingPartitionKeyRangeIDs and simplify to a single pass (no map round-trip) - Delete the legacy ChangeFeedOptions.toHeaders builder and its test file; the new GetChangeFeed path uses buildRequestHeaders exclusively - Delete the now-unused findPartitionKeyRangeID exact-match helper (the original silent-mismatch source); overlap-resolution lives upstream now - Drop the small toHeaders-tail test in TestContainerGetChangeFeedWithStartFrom that only verified the legacy header builder Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: split GetChangeFeed into its own file + restore options test coverage - Move GetChangeFeed, buildChangeFeedInitialQueue, resolveFeedRangeToChildren, buildChildQueueEntries, getChangeFeedForQueue, and serializeCompositeContinuationToken out of cosmos_container.go into cosmos_container_change_feed.go (mirrors the cosmos_container_read_many.go split). Pure relocation, no behavior change. - Add cosmos_change_feed_request_options_test.go covering buildRequestHeaders: defaults (only AIM header), all-fields population, empty-ETag continuation is omitted, and PartitionKey serialization errors are surfaced (the new error-returning contract the legacy toHeaders silently swallowed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: add emulator change-feed tests for F1 split-handling scenarios Three new emulator tests targeting F1-specific paths not covered by the existing TestEmulatorContainerChangeFeed: - SubRangeOfPhysicalRange — original-bug repro: a customer FeedRange that is a strict sub-range of a physical pkrange (no exact-match) must resolve via overlap-match and emit a request. - WideRangeMultiDrain — wide FeedRange overlapping every physical range; paginates via the composite continuation token until all inserted items are drained, exercising the multi-range queue rotation. - CrossContainerTokenRejected — token from container-A handed to container-B must be rejected by the ResourceID guard. All three skip when EMULATOR is unset, matching repo convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: fix lint failures from CI - Drop ineffectual headFeedRange reassignment in degraded fallback (it was never read after the assignment) - Delete unused findOverlappingPartitionKeyRangeIDs (overlappingPartitionKeyRanges superseded it; no remaining callers) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: keep GetChangeFeed entrypoint in cosmos_container.go Helpers (buildChangeFeedInitialQueue, resolveFeedRangeToChildren, buildChildQueueEntries, getChangeFeedForQueue, serializeCompositeContinuationToken) remain in cosmos_container_change_feed.go; only the public method moves back so the canonical container file continues to host the public API surface. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * azcosmos: address PR review comments on F1 change-feed Round-2 reviewer findings from kushagraThapar plus a CHANGELOG link revert: - (#2) Revert Copilot autofix that changed the CHANGELOG PR link from 26768 (this PR) to 26767 (the issue it closes). - (#4) Clamp child ranges to the parent token's bounds during split- expansion (matches Java FeedRangeCompositeContinuationImpl.createChildRanges). Children that fall entirely outside the parent are dropped. New clampChildrenToParent helper + TestClampChildrenToParent. - (#5) Populate token.ResourceID at queue-construction time (both Path A and Path B) so the cross-container guard remains meaningful even when the very first response is a 304 (no body parsed). - (#6) Replace misleading 'legacy ETag-only continuation handled by buildRequestHeaders' comment with an accurate one stating that only composite tokens issued by GetChangeFeed are honored on resume; legacy raw-ETag continuation strings are not supported. - (#7) Surface partial drain state alongside the 410-budget-exhausted error so callers can resume from the failed head instead of re-querying drained heads. New TestGetChangeFeed_410BudgetExhaustedMidDrain_SurfacesPartialState. - (#8) Aggregate RequestCharge across drain iterations so the returned response reports total RU consumed (matches cosmos_container_read_many pattern). - (#9) Remove the cache-nil degraded fallback (production always wires caches via acquireCaches; the fallback existed only for legacy tests built without caches). Rewire the 3 legacy TestContainerGetChangeFeed* tests to use createChangeFeedTestClient with pre-seeded caches. Removes now-unused dropHeadContinuation helper. - (#10) Belt-and-suspenders: also accept len(response.Documents) > 0 in the early-return condition so a response missing _count but carrying docs isn't silently re-drained. - (#11) Add explicit ctx.Done() check at the top of the drain loop so a cancelled context exits between sub-requests rather than waiting for transport-level cancellation. New TestGetChangeFeed_ContextCancelled_HonoredBetweenSubRequests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * azcosmos: address xinlian12 review comments + thread priority/throughput options post-merge - (X1) Add hard cap on drain-loop rotation budget (4*(initialLen+1)) matching Java's FeedRangeCompositeContinuationImpl defense. Prevents unbounded loops if overlap-resolution ever produces duplicate ranges. - (X2) Add focused table-driven unit tests for the overlap helpers: TestNormalizeMaxBoundary, TestRangesOverlap_TableDriven (10 cases), TestOverlappingPartitionKeyRanges_TableDriven (11 cases covering exact- match, multi-overlap, sub-range, straddling, empty/inverted/equal feed ranges, boundary touches). - (X3) PopulateCompositeContinuationToken (public API) is now a no-op when ContinuationToken is already populated, preventing the multi-range composite token written by the drain loop from being overwritten with a stale single-range token rebuilt from the per-head FeedRange. Post-merge with upstream/main (#26750): - Thread options.PriorityLevel and options.ThroughputBucket through the F1 drain loop's pipelineRequestOptions.headerOptionsOverride so the new priority/throughput-bucket support added in #26750 reaches the change-feed request. - Rewire TestChangeFeedPriorityAndThroughputBucketHeaders to use createChangeFeedTestClient (caches pre-wired) since the cache-nil fallback was removed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Adds PriorityLevel and ThroughputBucket options at both client and per-request level for all data-plane
operations.