
[Internal] DTS: Adds partitionKeyRangeId to operation result to assemble valid session tokens in DTS #5856

Merged
Meghana-Palaparthi merged 7 commits into main from users/Meghana-Palaparthi/dtx_pkRangeId_in_response on May 14, 2026

Conversation

@Meghana-Palaparthi
Contributor

@Meghana-Palaparthi Meghana-Palaparthi commented May 12, 2026

Description

Problem

Distributed transaction (DTC) operation-level session tokens are sent by the coordinator as raw LSN vectors (e.g., "-1#425344#1=12345") without the {pkRangeId}: prefix. SessionContainer.SetSessionToken() unconditionally does Split(':')[1] on the token — an LSN-only token has no colon, so tokenParts has length 1, causing an IndexOutOfRangeException that escapes and surfaces a committed transaction as failed to the caller.
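
A minimal sketch of the failure, assuming a hypothetical helper ExtractLsnUnchecked that mirrors the unguarded Split(':')[1] access described above (the real SessionContainer code is not reproduced here):

```csharp
using System;

public static class SessionTokenSplitDemo
{
    // Hypothetical stand-in for the unguarded parsing; not the SDK's actual code.
    public static string ExtractLsnUnchecked(string sessionToken)
    {
        string[] tokenParts = sessionToken.Split(':');

        // With no colon in the input, tokenParts has length 1 and this index throws.
        return tokenParts[1];
    }

    // Returns true when an LSN-only token triggers IndexOutOfRangeException.
    public static bool ThrowsOnLsnOnlyToken()
    {
        try
        {
            ExtractLsnUnchecked("-1#425344#1=12345");
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```

A canonical token such as "3:-1#425344#1=12345" passes through this helper, while the raw LSN vector from the coordinator does not.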

Fix

Add a partitionKeyRangeId field to DistributedTransactionOperationResult and assemble the canonical {pkRangeId}:{lsn} session token in FromJson() before it reaches SessionContainer.

Design Decisions

  1. Assembly happens in FromJson(), not MergeSessionTokens()
    The token is assembled at parse time so that result.SessionToken is always in canonical form (or null) by the time any downstream code reads it. This keeps MergeSessionTokens simple — it just sets the header.

  2. Warn + null (skip merge) when partitionKeyRangeId is absent/blank — no throw
    The DTC coordinator currently never sends partitionKeyRangeId (tracked by #5857). Once the coordinator update ships, both fields will always be present and the null-path will be removed.

    We considered throwing InvalidOperationException on blank partitionKeyRangeId, but rejected it because:

    • The server-side SetPartitionKeyRangeId() has zero validation — it can legitimately be null or empty (e.g., master resource operations, partition startup, aborted transactions where SDKResponseBuilder.cpp:193 explicitly sets empty string).
    • All non-DTC SDK modes handle null/empty pkRangeId without throwing: Gateway mode omits the header entirely; Direct mode falls back to the request's token prefix or hardcodes "0".
    • Throwing would transform the exact crash class this PR fixes (committed tx → exception) into a different but equally bad failure.
  3. Known limitation: stale-read window when pkRangeId is absent
    When partitionKeyRangeId is absent (current coordinator behavior), SessionToken is set to null → session token merge is skipped → SessionContainer retains the previous LSN for that partition. A subsequent session-consistent read may be served by a replica that hasn't yet replicated the DTC commit. This is a strict improvement over the pre-fix crash (committed tx surfacing as failed) and is transient — replication lag is typically milliseconds, and any other write to the same partition advances the token past the gap. This will be fully resolved when the coordinator always sends partitionKeyRangeId (#5857).

Changes

| File | Change |
| --- | --- |
| DistributedTransactionOperationResult.cs | Add PartitionKeyRangeId JSON property. FromJson() assembles {pkRangeId}:{lsn} when valid, else warns and sets SessionToken = null. Removed unused using System.ComponentModel. |
| DistributedTransactionSerializer.cs | Add PartitionKeyRangeId constant for the JSON field name. |
| DistributedTransactionCommitter.cs | Widen try/catch to wrap entire per-operation body including Operations[result.Index]. Improve catch diagnostics. Token is already canonical from FromJson(); MergeSessionTokens just sets the header. |
| DistributedTransactionResponseTests.cs | Add 5 parser tests: pkRangeId absent, blank (empty/whitespace ×3 via DataRow), sessionToken field absent. |
| DistributedTransactionCommitterTests.cs | Add end-to-end committer tests: blank pkRangeId → commit succeeds, session container not updated. |
| DistributedTransactionTests.cs (emulator) | Update ValidateSessionTokenMergedIntoDtcClient to use new wire format. Add ValidateSessionTokenSkipped_WhenPartitionKeyRangeIdAbsent. Both tests use matching PKs for partition alignment. |

Wire Contract (Transitional)

Current coordinator behavior:
  sessionToken:        always sent (raw LSN vector, e.g., "-1#425344#1=12345")
  partitionKeyRangeId: never sent

Future coordinator behavior (issue #5857):
  sessionToken:        always sent (raw LSN vector)
  partitionKeyRangeId: always sent (e.g., "3")

SDK assembly:
  Both present + valid  → "{pkRangeId}:{lsn}" (canonical)
  pkRangeId absent/blank → warn + null (skip merge)
  sessionToken absent    → no assembly needed (no merge)
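
The three SDK assembly rows above can be sketched as a single function. This is an illustration under stated assumptions, not the merged FromJson() code: the class and method names are hypothetical, and the real implementation also traces a warning on the skip path.

```csharp
using System;

public static class SessionTokenAssembly
{
    // Hypothetical helper. Returns null whenever no canonical token can be formed,
    // which the caller would treat as "skip the session token merge".
    public static string Assemble(string partitionKeyRangeId, string rawLsn)
    {
        // sessionToken absent: nothing to assemble, nothing to merge.
        if (string.IsNullOrWhiteSpace(rawLsn))
        {
            return null;
        }

        // pkRangeId absent or blank: the real code would warn here, then skip.
        if (string.IsNullOrWhiteSpace(partitionKeyRangeId))
        {
            return null;
        }

        // Both present and valid: canonical "{pkRangeId}:{lsn}".
        return partitionKeyRangeId + ":" + rawLsn;
    }
}
```

For example, Assemble("3", "-1#425344#1=12345") yields "3:-1#425344#1=12345", while Assemble(null, "-1#425344#1=12345") yields null.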

Type of change

  • Bug fix (non-breaking change which fixes an issue)

…n tokens in DTS

In Gateway mode, distributed transaction operation-level session tokens are sent as LSN-only (without the pkRangeId: prefix) rather than the fully-assembled {pkRangeId}:{lsn} format that SessionContainer.SetSessionToken expects.

Changes:
- Add optional PartitionKeyRangeId property to DistributedTransactionOperationResult (JSON: 'partitionKeyRangeId'), with constant in DistributedTransactionSerializer
- Update MergeSessionTokens() to assemble {pkRangeId}:{lsn} when the token is LSN-only and partitionKeyRangeId is present
- If token already contains ':' (pre-assembled), use it as-is (backward compat)
- If token is LSN-only and partitionKeyRangeId is absent/whitespace, log a trace warning and silently skip merging for that operation
- Add unit tests covering all new branches including whitespace/empty partitionKeyRangeId edge cases and precedence when both fields are present
Member

@FabianMeiswinkel FabianMeiswinkel left a comment

SDK should not attempt to fix broken session tokens.

@NaluTripician
Contributor

Deep Review (local) — findings

Sharing findings from a local deep review. Not posting as a formal "Request Changes" because @FabianMeiswinkel already has CHANGES_REQUESTED on file — these notes are meant to feed into that iteration.

Summary

This correctly identifies a real latent crash from #5705: SessionContainer.SetSessionToken does tokenParts[1] unconditionally, so any LSN-only token coming back in a CommitDistributedTransaction response throws IndexOutOfRangeException out of MergeSessionTokens → bubbles to CommitTransactionAsync's outer catch → the SDK reports a committed DTS as failed. Good find.

However: the architectural direction is contested, and the workaround is incomplete — other malformed shapes will still convert a committed DTS into a thrown exception.


🔴 Blocker

B1 — Architectural pushback unresolved. Aligning with @FabianMeiswinkel: the session token shape is a server-emitted contract; the right fix is server-side, not a parsing escape hatch in the SDK. Once dual-shape parsing ships, the SDK has to tolerate both shapes indefinitely for interop with older clients, which effectively makes the "transitional" code permanent. PR #5705's earlier review from @kirankumarkolli made the same point ("Validations right place is upstream when response is composed.").

If a client-side patch must land as a stop-gap, gate it behind a feature flag / version check with a tracking issue and an explicit deletion target.


🟡 Major

M2 — The fix is partial. SessionContainer.SetSessionToken (private overload, SessionContainer.cs:306-321) still throws on:

  • ":" → tokenParts = ["", ""], SessionTokenHelper.Parse("") throws
  • "0:" → same parse failure
  • "abc:def" (non-numeric LSN) → FormatException
  • result.PartitionKeyRangeId containing a colon → produces "0:abc:lsn...", then tokenParts[1] = "abc" fails to parse

Each of these still escapes MergeSessionTokens → caught by CommitTransactionAsync's catch (Exception ex) when (ex is not OperationCanceledException) (line 68) → server-committed DTS surfaces as a thrown exception to the caller. That's exactly the failure mode this PR is trying to fix; it just plugs one of several holes.

Suggestion: wrap the inner merge loop body (or the SetSessionToken call) in try/catch-and-trace. Nothing in post-commit session-token bookkeeping should be able to fail a transaction that the server already committed.
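
A rough sketch of that shape, with assumptions called out: MergeAll, mergeOne, and traceWarning are hypothetical names standing in for the loop in MergeSessionTokens, the SetSessionToken call, and DefaultTrace.TraceWarning. The exception filter mirrors the outer catch quoted above so cancellation still propagates.

```csharp
using System;
using System.Collections.Generic;

public static class SafeMergeSketch
{
    // Hypothetical per-operation merge loop. Returns the indexes whose merge was skipped.
    public static List<int> MergeAll(
        IReadOnlyList<string> tokens,
        Action<string> mergeOne,
        Action<string> traceWarning)
    {
        var skipped = new List<int>();

        for (int i = 0; i < tokens.Count; i++)
        {
            try
            {
                // Stand-in for the SetSessionToken call, which may throw on malformed input.
                mergeOne(tokens[i]);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // Post-commit bookkeeping failure: trace it and keep processing other operations.
                traceWarning($"Skipping session token merge for operation {i}: {ex.GetType().Name}");
                skipped.Add(i);
            }
        }

        return skipped;
    }
}
```

With this shape, a malformed token on one operation costs only that operation's merge; the surrounding operations still merge and the commit result is returned to the caller.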

M3 — Silent skip creates a mystery 404/1002 for customers. When pkRangeId is missing, the SessionContainer never learns the LSN advance. The next session-consistency read against a replica that hasn't caught up returns ReadSessionNotAvailable. From the customer's perspective: "I committed a transaction, then immediately read and got a 404." The only signal of the real cause is a DefaultTrace.TraceWarning on the writing client, with no link to the response the customer holds.

Consider attaching a diagnostic note onto DistributedTransactionResponse so the skip is visible on the result the caller already has.

M4 — No operational signal for the Cosmos team. DefaultTrace is invisible in most deployments. The service team can't tell if this branch fires zero times per day or millions. That's exactly the kind of "compensating client code that ages badly" — no metric ever lights up, so no one notices the server was never fixed.

Suggestion: route this through CosmosDbEventSource and/or a counter (e.g. DtsMalformedSessionTokenCount) so it shows up in standard ETW ingestion. Add a // TODO(issue#…) deletion gate.

M5 — Comment vs. behavior contradict each other. The block at DistributedTransactionCommitter.cs:212-222 reads:

// Cannot form a valid session token without pkRangeId; silently skip merging
// this operation's token but continue processing the rest of the response.
DefaultTrace.TraceWarning(...);

The comment (and PR description) say "silently skip"; the code emits a warning. Pick one — given M4 I'd keep and upgrade the trace, then reword the comment.


🟢 Minor

  • m6 — IndexOf(':') < 0 is a weak shape heuristic. ":" and "0:" slip through as "pre-assembled". A stricter check (both halves non-empty after split) or a structured assembly from the two fields is more defensible.
  • m7 — Trace-log spam risk. If the server regresses universally, every op of every transaction emits a warning. Sample, or fold into the counter from M4.
  • m8 — Test gap: multi-operation mixed-shape responses. All five new tests use a single operation. The continue path was added so the loop keeps processing surrounding operations, but no test exercises that (e.g. op 0 pre-assembled, op 1 LSN+pkRangeId, op 2 LSN-only-missing-pkRangeId → assert ops 0 and 1 still merge correctly).
  • m9 — Tests don't assert the TraceWarning. Given M3/M4 depend on that diagnostic, the trace is behavior, not incidental. Subscribe a listener and lock it down.
  • m10 — PartitionKeyRangeId becomes part of the INTERNAL-public surface. It also serializes outbound via [JsonInclude] + public getter. If this is provisional, mark it [JsonIgnore] for serialization and XML-doc it as transitional so consumers don't take a dependency.

💬 Suggestions

  • s11 — Prefer .Contains(':', StringComparison.Ordinal) over IndexOf(':') < 0.
  • s12 — Add a // TODO(issue#NNNN): Remove once the gateway emits {pkRangeId}:{lsn} deletion gate.
  • s13 — Per @kirankumarkolli's prior guidance on [Internal] DTS: Adds logic to merge session data from DTC response into SessionContainer #5705, do the assembly in DistributedTransactionOperationResult.FromJson (where both fields are right there) and produce a canonical {pkRangeId}:{lsn} SessionToken to the rest of the SDK. That keeps the transitional knowledge in one place and removes the need to expose PartitionKeyRangeId separately at all (resolving m10).

✅ Positives

  • The crash analysis on SessionContainer.SetSessionToken's tokenParts[1] is non-obvious and a real find.
  • Test-helper extension (4-tuple overload alongside the 2/3-tuple shims) keeps 30+ existing tests working unchanged — clean refactor.
  • Deliberate edge-case coverage on the new branch ("", " ", " ", and the precedence test for pkRangeId present and token already containing :).
  • Copy constructor correctly updated to propagate PartitionKeyRangeId — easy to miss.
  • Trace message includes operation index and collection rid — good diagnostic context if anyone does read the trace.

@FabianMeiswinkel FabianMeiswinkel self-requested a review May 12, 2026 23:09
Contributor

@NaluTripician NaluTripician left a comment

Following up on my earlier deep-review notes — the crash fix and overall shape look right, but I'd like these specific code asks resolved before I sign off. None are large; all are inline below.

Code asks (gating my sign-off):

  1. Narrow the new outer catch (Exception) in MergeSessionTokens to exclude OperationCanceledException so it matches CommitTransactionAsync's outer-catch contract.
  2. Null-guard result in the new catch's TraceWarning format-args path so a null array element can't NRE inside the safety net and escape the loop.
  3. Make the SessionToken / PartitionKeyRangeId null-checks in FromJson symmetric (both IsNullOrWhiteSpace).
  4. Add an "already canonical" pass-through in FromJson so the SDK is defensive against any coordinator version that still emits {pkRangeId}:{lsn} directly in sessionToken — silently nulling a valid token is a worse regression than the original crash.

Process gates (not code, but worth calling out):

  • @FabianMeiswinkel still has CHANGES_REQUESTED on file — please get an explicit re-review / dismissal rather than relying on the text reply.
  • CI on 6fb31fb is still in-progress for most dotnet-v3-ci jobs (Microsoft.Azure.Cosmos.Tests, Static Analysis, EmulatorTests suites, packaging). Want to see those green before I approve.

Once the four inline items are addressed and CI/Fabian unblock, this is good to go from my side.

Meghana-Palaparthi and others added 4 commits May 13, 2026 18:17
- catch (Exception ex) when (ex is not OperationCanceledException): mirrors
  outer CommitTransactionAsync contract so cancellations propagate
- FromJson ?? throw JsonException with element kind and raw text: null
  guard at the source rather than downstream in MergeSessionTokens
- IsNullOrWhiteSpace on SessionToken (symmetric with PartitionKeyRangeId);
  whitespace-only SessionToken normalized to null
- Already-canonical pass-through tightened: colonIndex > 0 &&
  colonIndex < Length-1 ensures both pkRangeId and LSN sides are non-empty
- Tests: OCE propagation, whitespace sessionToken (x2), canonical preserved
  (x2), edge-colon not-canonical (x2)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
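
For illustration, the tightened already-canonical check from the commit message above could look roughly like this. IsCanonical is a hypothetical name, and using the first colon as colonIndex is an assumption; the commit message only states the two bounds.

```csharp
using System;

public static class CanonicalTokenCheck
{
    // Hypothetical predicate: true only when a colon separates two non-empty sides.
    public static bool IsCanonical(string token)
    {
        if (string.IsNullOrWhiteSpace(token))
        {
            return false;
        }

        // Assumption: the first colon is the pkRangeId/LSN separator.
        int colonIndex = token.IndexOf(':');

        // colonIndex > 0 rejects ":" and ":Y"; colonIndex < Length - 1 rejects "0:".
        return colonIndex > 0 && colonIndex < token.Length - 1;
    }
}
```

Under these assumptions "3:-1#425344#1=12345" is treated as canonical, while ":", "0:", and the raw LSN vector "-1#425344#1=12345" are not.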
…sionTokens

In a committed DTC (200 OK), no individual operation result can be
404/1002; that would be a server contract violation. The check was
copied from GatewayStoreModel where it applies to single-operation
retryable flows, but is dead code in the DTC commit path.

Removed the corresponding test that asserted the now-deleted behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

@NaluTripician NaluTripician left a comment

Approve with nits ✅

Thanks for the iterations — ~80% of my prior asks are addressed, the crash fix is real, and the test coverage (parser-level + committer-level + emulator + DelegatingTraceListener for the trace contract) is genuinely thorough. Signing off; Fabian is updating his review asynchronously so I'm not gating on that here.

Nits below are non-blocking — happy to take any/all as follow-ups.

Nits

  1. result?.Index in the catch's format args. FromJson's ?? throw new JsonException(...) guard closes the realistic path, but the catch at line 224 is the last line of defense against bookkeeping bugs. A null-conditional access (result?.Index ?? -1) costs nothing and keeps the warning robust if anything upstream regresses.

  2. 404 / ReadSessionNotAvailable skip removal in 6708805 is undefended. The rationale ("unreachable" because DTC ops are writes) is plausible, but the deletion has no inline comment and the prior test CommitTransactionAsync_SkipsMerge_When404ReadSessionNotAvailable was removed rather than replaced. Either restore the skip (cheap defensive depth) or add a 2-line comment + a regression test asserting MergeSessionTokens no-ops gracefully on a per-op 404/1002 — relies on the new try/catch.

  3. Edge-colon assembly produces double-colon tokens. When pkRangeId="X" and sessionToken=":Y", the assembled output is "X::Y". The new try/catch in MergeSessionTokens turns the downstream throw into a TraceWarning, but FromResponseMessage_OperationResult_SessionToken_AssembledWhenColonIsAtEdge actively locks in the malformed output as intended behavior. Suggest only assembling when the raw token contains no colon at all — and update that test to assert the token is nulled / skipped instead.

  4. TODO deletion gates for the transitional null-pkRangeId branch are only in tests, not at the FromJson branch sites. When the coordinator update ships, the person ripping this out has to discover the three FromJson branches by hand. A // TODO(#NNNN, delete after coordinator emits pkRangeId) next to each branch would make that work mechanical. Pair with a tracking issue.

  5. TraceWarning spam risk is unbounded. If the coordinator never updates for a given account, every op of every DTS commit logs. Not urgent, but worth a sampling guard or first-N-per-process throttle before this is in customers' hands at scale.

  6. Brittle trace assertion. CommitTransactionAsync_EmitsTraceWarning_WhenPartitionKeyRangeIdIsAbsent matches on .Contains("partitionKeyRangeId") — if anyone reformats the warning and drops that exact substring the test goes silently green. Consider asserting on the operation index or a more stable token like "DTC operation index".

  7. Customer/operational diagnostic gap (M3/M4 from my prior review). DefaultTrace.TraceWarning is invisible in nearly all customer deployments and gives the Cosmos team no production telemetry to size the urgency of the coordinator fix. Even one of these as a follow-up would help materially: a SessionMergeSkipped flag on DistributedTransactionResponse, or routing the warning through CosmosDbEventSource so it's observable.
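
As one possible shape for the first-N-per-process throttle mentioned in nit 5, here is a small sketch. WarningThrottle is a hypothetical type, not something in the SDK, and the budget value would be a judgment call.

```csharp
using System;
using System.Threading;

public sealed class WarningThrottle
{
    private readonly int maxWarnings;
    private int emitted;

    public WarningThrottle(int maxWarnings)
    {
        this.maxWarnings = maxWarnings;
    }

    // Returns true only for the first maxWarnings calls, across all threads.
    public bool ShouldEmit()
    {
        // Cheap read first so the counter stops growing once the budget is spent.
        if (Volatile.Read(ref this.emitted) >= this.maxWarnings)
        {
            return false;
        }

        // Atomic increment; only values within the budget are allowed to emit.
        return Interlocked.Increment(ref this.emitted) <= this.maxWarnings;
    }
}
```

A call site would guard the trace with something like if (throttle.ShouldEmit()) { DefaultTrace.TraceWarning(...); }, using a single static throttle instance per process.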

Positives worth saying out loud

  • OperationCanceledException carve-out + the new test that locks the contract is exactly right.
  • Already-canonical pass-through branch + the colonIndex > 0 && colonIndex < length - 1 guard is a clean way to reject ":" / "0:" without breaking real tokens.
  • IsNullOrWhiteSpace symmetry + the whitespace-to-null normalization branch and the parameterized tests covering " ", " " are nice.
  • Copy constructor propagating PartitionKeyRangeId — easy to miss.
  • Moving the Operations[result.Index] access inside the try/catch is a genuine hardening fix on top of the headline change.
  • ?? throw new JsonException on FromJson's Deserialize result quietly turns "NRE somewhere later" into a boundary-level exception.

LGTM 🚢

Member

@FabianMeiswinkel FabianMeiswinkel left a comment

LGTM

@Meghana-Palaparthi Meghana-Palaparthi merged commit 74cd17b into main May 14, 2026
32 checks passed
@Meghana-Palaparthi Meghana-Palaparthi deleted the users/Meghana-Palaparthi/dtx_pkRangeId_in_response branch May 14, 2026 18:46

4 participants