Skip to content

Cosmos: Add priority-based execution and throughput bucket support#26750

Merged
simorenoh merged 11 commits into
Azure:mainfrom
simorenoh:throughput-bucket-pbe
May 14, 2026
Merged

Cosmos: Add priority-based execution and throughput bucket support#26750
simorenoh merged 11 commits into
Azure:mainfrom
simorenoh:throughput-bucket-pbe

Conversation

@simorenoh
Copy link
Copy Markdown
Member

Adds PriorityLevel and ThroughputBucket options at both client and per-request level for all data-plane
operations.

Copilot AI review requested due to automatic review settings May 11, 2026 21:03
@simorenoh simorenoh requested a review from a team as a code owner May 11, 2026 21:03
@simorenoh simorenoh added Cosmos Client This issue points to a problem in the data-plane of the library. labels May 11, 2026
@simorenoh simorenoh changed the title Add priority-based execution and throughput bucket support to azcosmos Cosmos: Add priority-based execution and throughput bucket support May 11, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds request prioritization and throughput-bucket routing to the azcosmos data-plane client by introducing new options at both the client level and per-operation level, and wiring them into the request header pipeline.

Changes:

  • Added PriorityLevel type (with helpers) and client/per-request option fields to multiple request option structs.
  • Plumbed priority/throughput bucket values through the header policy via new Cosmos request headers.
  • Added/updated tests to validate header emission and override behavior.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
sdk/data/azcosmos/cosmos_transactional_batch_options.go Adds per-batch PriorityLevel/ThroughputBucket options.
sdk/data/azcosmos/cosmos_read_many_request_options.go Adds per-read-many PriorityLevel/ThroughputBucket options.
sdk/data/azcosmos/cosmos_query_request_options.go Adds per-query PriorityLevel/ThroughputBucket options.
sdk/data/azcosmos/cosmos_priority_level.go Introduces PriorityLevel enum-like type plus Values() and ToPtr() helpers.
sdk/data/azcosmos/cosmos_priority_level_test.go Adds unit tests for PriorityLevelValues() and ToPtr().
sdk/data/azcosmos/cosmos_item_request_options.go Adds per-item-operation PriorityLevel/ThroughputBucket options.
sdk/data/azcosmos/cosmos_http_constants.go Defines the new x-ms-cosmos-priority-level and x-ms-cosmos-throughput-bucket header constants.
sdk/data/azcosmos/cosmos_headers_policy.go Extends the header policy to apply priority/throughput bucket headers from client defaults and per-request overrides.
sdk/data/azcosmos/cosmos_headers_policy_test.go Adds coverage for priority/throughput bucket header behavior in the header policy.
sdk/data/azcosmos/cosmos_container.go Wires per-operation overrides into container item/query/batch/change-feed request contexts; also contains doc-comment typos in changed lines.
sdk/data/azcosmos/cosmos_container_read_many.go Attempts to forward read-many priority/throughput settings through query options (currently not fully applied).
sdk/data/azcosmos/cosmos_client.go Applies client-level defaults into the header policy and allow-lists the new headers.
sdk/data/azcosmos/cosmos_client_options.go Adds client-level PriorityLevel and ThroughputBucket defaults.
sdk/data/azcosmos/cosmos_change_feed_request_options.go Adds per-change-feed PriorityLevel/ThroughputBucket options.

Comment thread sdk/data/azcosmos/cosmos_container_read_many.go
Comment thread sdk/data/azcosmos/cosmos_container.go Outdated
Comment thread sdk/data/azcosmos/cosmos_container.go Outdated
Comment thread sdk/data/azcosmos/cosmos_priority_level_test.go Outdated
simorenoh and others added 6 commits May 11, 2026 17:41
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete queryOpts.PriorityLevel/ThroughputBucket assignments in
  executeReadManyWithQueries (headers flow via headerOptionsOverride)
- Remove duplicate if-err-nil block in TestThroughputBucketHeaderNotSetWhenNil

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds 5 tests that exercise the full path from public API options
through headerPolicies to outbound HTTP headers:
- CreateItem (ItemOptions)
- NewQueryItemsPager (QueryOptions)
- ExecuteTransactionalBatch (TransactionalBatchOptions)
- GetChangeFeed (ChangeFeedOptions)
- ReadManyItems (ReadManyOptions)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simorenoh — really clean implementation and the test coverage is great. All 4 Copilot bot comments look addressed in the latest commits, and I really appreciate that the E2E tests verify headers reach the wire for all 5 op types (Item / Query / Batch / ChangeFeed / ReadMany), including the chunked ReadMany dispatch path which is the easy spot to get wrong.

A couple of optional polish items, none blocking — would love your thoughts:

1. PriorityLevel accepts arbitrary string values

type PriorityLevel string will silently let PriorityLevel("Bogus") flow to the service. I am curious how you'd like to handle this — peer SDKs take different approaches:

  • .NET: true enum PriorityLevel { High = 1, Low = 2 } — compile-time safety
  • Java: PriorityLevel.fromString() throws IllegalArgumentException on invalid input
  • Python: loose (matches what we have here in Go)

The service will reject invalid values with a clear error so this is safe today, but could you please consider adding a small IsValid() helper, or at minimum a doc comment listing the valid constants (PriorityLevelHigh, PriorityLevelLow)? Either is fine with me — just want users to have a bit more guidance.

2. ThroughputBucket has no range hint

The service expects values in the range 1-5. Neither Go nor .NET validates this client-side, but could you please add a one-line doc comment on the ThroughputBucket *int32 field on the relevant options structs (and ClientOptions) calling out the valid range? It would save users a round-trip when they pick the wrong value.

Minor doc nit

Could you please consider mirroring .NET's doc comment on cosmos_priority_level.go which links to https://aka.ms/CosmosDB/PriorityBasedExecution and explains the High/Low semantics (low-priority requests get throttled first when over RU/s budget)? Helps users discover what PBE is and when to enable it.

LGTM otherwise — happy to approve once you've had a chance to look at #1. Thanks again!

- Add PBE reference link and High/Low semantics to PriorityLevel type doc
- Document valid PriorityLevel values on all option struct fields
- Document valid ThroughputBucket range (1-5) on all option struct fields

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@kushagraThapar kushagraThapar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @simorenoh for the thorough follow-up — this is a really clean iteration. I walked through each previously raised item against the latest commit (cc5b727a) and everything checks out:

From the Copilot bot review:

  • ✅ ReadMany header wiring — ReadManyItems now sets operationContext.headerOptionsOverride and that flows through executeReadManyWithQueriesexecuteQueryChunksexecuteOneChunk, so every chunked request inherits the priority/throughput-bucket overrides.
  • ✅ Doc-comment typos on NewQueryItemsPager and NewTransactionalBatch — fixed.
  • cosmos_priority_level_test.go migrated to testify/require.

From my prior review:

  • ThroughputBucket — "Valid range is 1 to 5 (inclusive)" doc-comment now on ClientOptions, ItemOptions, QueryOptions, ReadManyOptions, ChangeFeedOptions, and TransactionalBatchOptions. Lovely consistency.
  • PriorityLevel doc — cosmos_priority_level.go now links to https://aka.ms/CosmosDB/PriorityBasedExecution and clearly states the High/Low semantics. The PriorityLevelValues() helper plus "Valid values are PriorityLevelHigh and PriorityLevelLow" doc comments on every options struct give users plenty of guidance.
  • 🟢 IsValid() — still a consider, not blocking; I'm happy with the doc-comment-only path you chose.

End-to-end test coverage is particularly nice:
I am curious to learn how you decided on the chunked-path coverage for TestReadManyItemsPriorityAndThroughputBucketHeaders — the fact that you exercise the override semantics across all 5 op types (Item / Query / Batch / ChangeFeed / ReadMany) and assert exact wire-level header values gives me a lot of confidence the policy will keep emitting these headers as the code evolves. The header-policy unit tests with client-default vs. per-request override permutations are a great complement to that.

Other things I double-checked:

  • headerPoliciesVerify struct field alignment looks correct (gofmt-clean).
  • Both cosmosHeaderPriorityLevel and cosmosHeaderThroughputBucket are in the AllowedHeaders list so they show up in diagnostics.
  • CHANGELOG entry is present and links back to this PR.
  • CI is fully green across Linux + Windows on Go 1.25.8 and 1.26.1.

Great job — approving. Thanks again for taking the time to address everything so carefully!

Copy link
Copy Markdown
Member

@tvaron3 tvaron3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM few non blocking comments

Comment thread sdk/data/azcosmos/cosmos_client_options.go
Comment thread sdk/data/azcosmos/cosmos_container_test.go
simorenoh and others added 3 commits May 14, 2026 12:05
- Add preview service note with docs link to ClientOptions PriorityLevel
  and ThroughputBucket fields
- Add e2e header tests for UpsertItem, ReplaceItem, ReadItem, DeleteItem,
  and PatchItem operations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@simorenoh simorenoh merged commit 51bf8c7 into Azure:main May 14, 2026
20 checks passed
tvaron3 added a commit to tvaron3/azure-sdk-for-go that referenced this pull request May 14, 2026
…put options post-merge

- (X1) Add hard cap on drain-loop rotation budget (4*(initialLen+1))
  matching Java's FeedRangeCompositeContinuationImpl defense. Prevents
  unbounded loops if overlap-resolution ever produces duplicate ranges.
- (X2) Add focused table-driven unit tests for the overlap helpers:
  TestNormalizeMaxBoundary, TestRangesOverlap_TableDriven (10 cases),
  TestOverlappingPartitionKeyRanges_TableDriven (11 cases covering exact-
  match, multi-overlap, sub-range, straddling, empty/inverted/equal
  feed ranges, boundary touches).
- (X3) PopulateCompositeContinuationToken (public API) is now a no-op
  when ContinuationToken is already populated, preventing the multi-range
  composite token written by the drain loop from being overwritten with
  a stale single-range token rebuilt from the per-head FeedRange.

Post-merge with upstream/main (Azure#26750):
- Thread options.PriorityLevel and options.ThroughputBucket through the
  F1 drain loop's pipelineRequestOptions.headerOptionsOverride so the
  new priority/throughput-bucket support added in Azure#26750 reaches the
  change-feed request.
- Rewire TestChangeFeedPriorityAndThroughputBucketHeaders to use
  createChangeFeedTestClient (caches pre-wired) since the cache-nil
  fallback was removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
simorenoh pushed a commit that referenced this pull request May 15, 2026
…one retry, multi-range continuation queue (#26767) (#26768)

* fix(azcosmos): GetChangeFeed split-resilient via overlap match, 410/Gone retry, multi-range continuation queue (#26767)

GetChangeFeed previously failed (silently or loudly) when a physical
partition split occurred under a customer-supplied FeedRange:

  * toHeaders did exact-equality matching against the cached PK ranges,
    so a parent FeedRange that no longer matched any current child fell
    through to a request issued WITHOUT the partition-key-range-id
    header (silent miss).
  * 410/Gone with the PartitionKeyRangeGone substatus surfaced directly
    to the caller; the underlying maxPKRangeGoneRetries / forceRefresh
    plumbing added in #26723 was never wired into the change feed.
  * The composite continuation token only encoded a single EPK range,
    so resuming after a split lost events on the boundary.

This change rewires GetChangeFeed onto a queue-driven drain loop:

  * buildChangeFeedInitialQueue resolves the customer FeedRange (or
    inbound continuation token) by overlap-match against the routing
    map, expanding a parent FeedRange into N children inheriting the
    parent ETag.
  * getChangeFeedForQueue drains the queue FIFO. On each iteration it
    re-resolves the head against the cached snapshot, detects mid-drain
    splits and bumps the rotation budget so newly-inserted children
    get queried in the same call.
  * On 410/Gone with PK-range substatus, the cache is refreshed, the
    routing snapshot is re-fetched, and the head is retried; capped at
    maxPKRangeGoneRetries.
  * The composite continuation token now carries the full multi-range
    queue, with each entry's ETag advanced FIFO on every response.
  * Cross-container token reuse is rejected loudly: if a token's
    ResourceID does not match the current container's _rid, the call
    fails with a diagnostic error rather than silently misrouting
    against the wrong routing map.

Behavior preserved for callers without the cache infrastructure wired:
when no PK-range cache is present and the direct fetch returns no
ranges (test/mock case), a single passthrough queue entry is used so
the legacy issue-without-overlap-match flow still runs.

Tests added in cosmos_change_feed_split_test.go cover: 410-retry, retry
cap surfacing the final 410, cross-container ResourceID rejection,
no-overlap-after-refresh -> ErrFeedRangeUnresolved, split expansion
routing the first child, multi-range continuation round-trip, and the
mid-drain split-during-drain budget bump regression guard. All three
existing change-feed tests continue to pass.

Closes #26767

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: address reviewer round-1 findings on F1 change-feed split handling

- Capture drain-loop error in GetChangeFeed so deferred span observes failures (#1)
- Reset 410/Gone budget on both split-expansion and head-advance (#2)
- Move resp.Body.Close() above 304 short-circuit in newChangeFeedResponse (#7)
- Drop dead empty-feedMax branch in cosmos_feed_range.go (#8)
- Remove redundant token.head() reload in drain loop (#9)
- Reduce buildChangeFeedInitialQueue to 3 returns (#10)
- buildRequestHeaders surfaces PartitionKey serialization errors (#11)
- Cache-nil + no-routing degraded fallback: log warning, drop continuation
  token, and read fresh with passthrough range instead of erroring; preserves
  back-compat for tests built without caches wired up
- CHANGELOG link points to PR

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: drop redundant head reload after dropHeadContinuation

Round-2 reviewer cleanup. dropHeadContinuation mutates Continuation[0]
in place; the existing head pointer reflects the change without reload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: tighten GetChangeFeed doc + CHANGELOG entry

Condenses the GetChangeFeed numbered behavior list to a short paragraph
and trims the CHANGELOG entry. No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: relocate overlap helper, delete legacy toHeaders + findPartitionKeyRangeID

- Move overlappingPartitionKeyRanges into cosmos_feed_range.go next to its
  sibling findOverlappingPartitionKeyRangeIDs and simplify to a single
  pass (no map round-trip)
- Delete the legacy ChangeFeedOptions.toHeaders builder and its test file;
  the new GetChangeFeed path uses buildRequestHeaders exclusively
- Delete the now-unused findPartitionKeyRangeID exact-match helper (the
  original silent-mismatch source); overlap-resolution lives upstream now
- Drop the small toHeaders-tail test in TestContainerGetChangeFeedWithStartFrom
  that only verified the legacy header builder

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: split GetChangeFeed into its own file + restore options test coverage

- Move GetChangeFeed, buildChangeFeedInitialQueue, resolveFeedRangeToChildren,
  buildChildQueueEntries, getChangeFeedForQueue, and serializeCompositeContinuationToken
  out of cosmos_container.go into cosmos_container_change_feed.go (mirrors the
  cosmos_container_read_many.go split). Pure relocation, no behavior change.
- Add cosmos_change_feed_request_options_test.go covering buildRequestHeaders:
  defaults (only AIM header), all-fields population, empty-ETag continuation
  is omitted, and PartitionKey serialization errors are surfaced (the new
  error-returning contract the legacy toHeaders silently swallowed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: add emulator change-feed tests for F1 split-handling scenarios

Three new emulator tests targeting F1-specific paths not covered by the
existing TestEmulatorContainerChangeFeed:

- SubRangeOfPhysicalRange — original-bug repro: a customer FeedRange that
  is a strict sub-range of a physical pkrange (no exact-match) must
  resolve via overlap-match and emit a request.
- WideRangeMultiDrain — wide FeedRange overlapping every physical range;
  paginates via the composite continuation token until all inserted items
  are drained, exercising the multi-range queue rotation.
- CrossContainerTokenRejected — token from container-A handed to
  container-B must be rejected by the ResourceID guard.

All three skip when EMULATOR is unset, matching repo convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: fix lint failures from CI

- Drop ineffectual headFeedRange reassignment in degraded fallback (it was
  never read after the assignment)
- Delete unused findOverlappingPartitionKeyRangeIDs (overlappingPartitionKeyRanges
  superseded it; no remaining callers)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: keep GetChangeFeed entrypoint in cosmos_container.go

Helpers (buildChangeFeedInitialQueue, resolveFeedRangeToChildren,
buildChildQueueEntries, getChangeFeedForQueue, serializeCompositeContinuationToken)
remain in cosmos_container_change_feed.go; only the public method moves back
so the canonical container file continues to host the public API surface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* azcosmos: address PR review comments on F1 change-feed

Round-2 reviewer findings from kushagraThapar plus a CHANGELOG link revert:

- (#2) Revert Copilot autofix that changed the CHANGELOG PR link from
  26768 (this PR) to 26767 (the issue it closes).
- (#4) Clamp child ranges to the parent token's bounds during split-
  expansion (matches Java FeedRangeCompositeContinuationImpl.createChildRanges).
  Children that fall entirely outside the parent are dropped. New
  clampChildrenToParent helper + TestClampChildrenToParent.
- (#5) Populate token.ResourceID at queue-construction time (both Path A
  and Path B) so the cross-container guard remains meaningful even when
  the very first response is a 304 (no body parsed).
- (#6) Replace misleading 'legacy ETag-only continuation handled by
  buildRequestHeaders' comment with an accurate one stating that only
  composite tokens issued by GetChangeFeed are honored on resume; legacy
  raw-ETag continuation strings are not supported.
- (#7) Surface partial drain state alongside the 410-budget-exhausted
  error so callers can resume from the failed head instead of re-querying
  drained heads. New TestGetChangeFeed_410BudgetExhaustedMidDrain_SurfacesPartialState.
- (#8) Aggregate RequestCharge across drain iterations so the returned
  response reports total RU consumed (matches cosmos_container_read_many
  pattern).
- (#9) Remove the cache-nil degraded fallback (production always wires
  caches via acquireCaches; the fallback existed only for legacy tests
  built without caches). Rewire the 3 legacy TestContainerGetChangeFeed*
  tests to use createChangeFeedTestClient with pre-seeded caches.
  Removes now-unused dropHeadContinuation helper.
- (#10) Belt-and-suspenders: also accept len(response.Documents) > 0 in
  the early-return condition so a response missing _count but carrying
  docs isn't silently re-drained.
- (#11) Add explicit ctx.Done() check at the top of the drain loop so a
  cancelled context exits between sub-requests rather than waiting for
  transport-level cancellation. New
  TestGetChangeFeed_ContextCancelled_HonoredBetweenSubRequests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* azcosmos: address xinlian12 review comments + thread priority/throughput options post-merge

- (X1) Add hard cap on drain-loop rotation budget (4*(initialLen+1))
  matching Java's FeedRangeCompositeContinuationImpl defense. Prevents
  unbounded loops if overlap-resolution ever produces duplicate ranges.
- (X2) Add focused table-driven unit tests for the overlap helpers:
  TestNormalizeMaxBoundary, TestRangesOverlap_TableDriven (10 cases),
  TestOverlappingPartitionKeyRanges_TableDriven (11 cases covering exact-
  match, multi-overlap, sub-range, straddling, empty/inverted/equal
  feed ranges, boundary touches).
- (X3) PopulateCompositeContinuationToken (public API) is now a no-op
  when ContinuationToken is already populated, preventing the multi-range
  composite token written by the drain loop from being overwritten with
  a stale single-range token rebuilt from the per-head FeedRange.

Post-merge with upstream/main (#26750):
- Thread options.PriorityLevel and options.ThroughputBucket through the
  F1 drain loop's pipelineRequestOptions.headerOptionsOverride so the
  new priority/throughput-bucket support added in #26750 reaches the
  change-feed request.
- Rewire TestChangeFeedPriorityAndThroughputBucketHeaders to use
  createChangeFeedTestClient (caches pre-wired) since the cache-nil
  fallback was removed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Client This issue points to a problem in the data-plane of the library. Cosmos

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants