
Fix CTS disposal in ProcessFetchInBackground that breaks cancellation and causes SemaphoreSlim convoy #6056

Closed
bgavrilMS wants to merge 2 commits into main from bgavrilMS/fix-cts-disposal-convoy

@bgavrilMS

Member

Summary

Fixes #6053 — Background token refresh ignores cancellation, causing thread starvation via semaphore convoy.

The Bug

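The lambda passed to ProcessFetchInBackground uses using var to create a linked CancellationTokenSource, but returns the Task without awaiting it. In a non-async method, a using var declaration is disposed when control leaves the enclosing scope, which here happens as soon as the return statement hands back the still-running Task. Disposing a linked CancellationTokenSource unregisters it from its parent token, so canceling the parent no longer cancels the linked token that the in-flight operation is waiting on.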
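A minimal sketch of the underlying .NET behavior, assuming C# 9+ top-level statements on .NET 6 or later. This is not code from the PR. It inlines what the non-async lambda effectively does: the in-flight operation captures the linked token, then the lambda returns and using var disposes the linked source. Canceling the parent afterwards no longer reaches the linked token.

```csharp
using System;
using System.Threading;

// The caller's token (hypothetical stand-in for the token passed into the request).
var parentCts = new CancellationTokenSource();

// Mirrors `using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(...)`
// inside a non-async lambda: the in-flight operation captures linkedToken, then the
// lambda returns its Task and `using var` disposes linkedCts immediately.
var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(parentCts.Token);
CancellationToken linkedToken = linkedCts.Token;
linkedCts.Dispose();

// Disposing a linked source unregisters it from the parent token,
// so canceling the parent no longer reaches linkedToken.
parentCts.Cancel();

Console.WriteLine($"parent canceled: {parentCts.IsCancellationRequested}"); // parent canceled: True
Console.WriteLine($"linked canceled: {linkedToken.IsCancellationRequested}"); // linked canceled: False
```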

Impact: SemaphoreSlim.WaitAsync(cancellationToken) in ManagedIdentityAuthRequest becomes permanently uncancellable. When the token endpoint (IMDS) is temporarily unreachable, every proactive refresh background task becomes a permanent semaphore waiter → unbounded convoy → foreground threads blocked for hours.
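The following sketch shows how the convoy forms. It assumes a SemaphoreSlim(1,1) stands in for the static semaphore described in the commit message, and that its only permit is held by a request stuck on the unreachable endpoint. BuggyRefresh is a hypothetical helper, not an MSAL API. It has the same shape as the buggy lambda: non-async, using var on the linked source, and the Task returned without await. Each refresh's parent token is canceled, yet every waiter stays queued. The sketch assumes C# 9+ top-level statements on .NET 6 or later.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical buggy shape: non-async, `using var` on the linked source, Task returned without await.
static Task BuggyRefresh(SemaphoreSlim semaphore, CancellationToken parentToken)
{
    using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(parentToken);
    // WaitAsync registers on linkedCts.Token; `using var` then disposes linkedCts as this
    // method returns, which unregisters it from parentToken.
    return semaphore.WaitAsync(linkedCts.Token);
}

// Stand-in for the static SemaphoreSlim(1, 1) gate.
var gate = new SemaphoreSlim(1, 1);

// Simulate a token request stuck on an unreachable endpoint: it holds the only permit.
await gate.WaitAsync();

const int refreshCount = 3;
var parents = new CancellationTokenSource[refreshCount];
var refreshes = new Task[refreshCount];
for (int i = 0; i < refreshCount; i++)
{
    parents[i] = new CancellationTokenSource();
    refreshes[i] = BuggyRefresh(gate, parents[i].Token);
}

// Every background refresh is canceled by its caller...
foreach (var cts in parents) cts.Cancel();
await Task.Delay(200);

// ...but none of them leaves the semaphore queue.
int stillWaiting = refreshes.Count(t => !t.IsCompleted);
Console.WriteLine($"{stillWaiting} of {refreshCount} refreshes still waiting"); // 3 of 3 refreshes still waiting
```

Each additional proactive refresh adds another uncancellable waiter, which is what makes the queue unbounded while the permit holder stays stuck.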

The Fix

Make the lambda async and await the inner call. The compiler-generated state machine then runs the using var disposal only after the awaited operation completes, so the linked CancellationTokenSource stays registered with its parent and parent cancellation propagates correctly through the linked token.
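The sketch below shows the fixed shape. It assumes a hypothetical FixedRefresh helper (not an MSAL API), a SemaphoreSlim(1,1) whose only permit is held by a stuck request, and C# 9+ top-level statements on .NET 6 or later. Because the method is async, using var disposes the linked source only after the awaited WaitAsync finishes. Canceling each parent therefore cancels its waiter and removes it from the queue. The Task.WhenAny bound keeps the sketch from hanging if propagation were still broken.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical fixed shape: async method, so `using var` disposal runs after the await completes.
static async Task FixedRefresh(SemaphoreSlim semaphore, CancellationToken parentToken)
{
    using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(parentToken);
    // linkedCts stays alive (and linked to parentToken) for the whole wait.
    await semaphore.WaitAsync(linkedCts.Token).ConfigureAwait(false);
    semaphore.Release(); // a real request would acquire a token before releasing
}

var gate = new SemaphoreSlim(1, 1);
await gate.WaitAsync(); // a stuck request holds the only permit

const int refreshCount = 3;
var parents = Enumerable.Range(0, refreshCount).Select(_ => new CancellationTokenSource()).ToArray();
var refreshes = parents.Select(p => FixedRefresh(gate, p.Token)).ToArray();

foreach (var cts in parents) cts.Cancel();

// Bounded wait so the sketch cannot hang if cancellation failed to propagate.
await Task.WhenAny(Task.WhenAll(refreshes), Task.Delay(TimeSpan.FromSeconds(5)));

int canceled = refreshes.Count(t => t.IsCanceled);
Console.WriteLine($"{canceled} of {refreshCount} refreshes left the semaphore queue");
```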

4 files fixed (the pattern was introduced in all of them by PR #4471):

  • ClientCredentialRequest.cs
  • ManagedIdentityAuthRequest.cs
  • OnBehalfOfRequest.cs (OBO)
  • CacheSilentStrategy.cs (Silent)

Regression Test

Added LinkedCancellationTokenTests.cs with two pattern tests:

  • BugPattern: non-async lambda → CTS disposed → cancellation does NOT propagate (semaphore hangs)
  • FixPattern: async lambda → CTS alive → cancellation propagates (semaphore aborts promptly)

Attribution

Fix commit authored by @jayesh-a-shah (cherry-picked from PR #6054 where ADO builds were not triggering).

Jayesh Shah and others added 2 commits June 10, 2026 19:07
… and causes SemaphoreSlim convoy

The lambda passed to ProcessFetchInBackground uses 'using var' to create a
linked CancellationTokenSource, but returns the Task without awaiting it.
This causes the linked CTS to be disposed before the async operation completes,
breaking the link to the parent cancellation token. As a result, WaitAsync on
the static SemaphoreSlim(1,1) becomes permanently uncancellable.

When the token endpoint (IMDS) is temporarily unreachable, every proactive
refresh background task becomes a permanent semaphore waiter, forming an
unbounded convoy. Foreground threads that later need a token are blocked
behind the convoy for hours.

Fix: make the lambda async and await the inner call, so 'using var' disposal
waits for the async operation to complete and parent cancellation propagates
correctly through the linked token.

Fixes all 4 affected locations introduced by PR #4471:
- ClientCredentialRequest.cs
- ManagedIdentityAuthRequest.cs
- OnBehalfOfRequest.cs
- CacheSilentStrategy.cs

Pattern tests proving the bug mechanism: non-async lambda with 'using var'
on a linked CancellationTokenSource disposes it before the async operation
completes, severing cancellation propagation through semaphore waits.

- BugPattern test: demonstrates cancellation does NOT propagate (hangs)
- FixPattern test: demonstrates cancellation DOES propagate (aborts promptly)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 10, 2026 18:11
@bgavrilMS bgavrilMS requested a review from a team as a code owner June 10, 2026 18:11

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes a proactive background refresh cancellation bug caused by disposing a linked CancellationTokenSource before the returned async work completes (non-async lambda + using var). By making the background fetch lambdas async and awaiting the inner call, cancellation now properly propagates through the linked token, preventing long-lived uncancellable waits (e.g., semaphore convoy) when downstream endpoints are slow/unreachable.

Changes:

  • Update 4 proactive-refresh call sites to use async lambdas and await so the linked CTS lifetime spans the async operation.
  • Add a regression test validating the C# language behavior difference between non-async vs async lambdas with using var linked CTS.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/client/Microsoft.Identity.Client/Internal/Requests/ClientCredentialRequest.cs | Make proactive refresh background fetch lambda async and await the inner token acquisition to preserve linked CTS lifetime. |
| src/client/Microsoft.Identity.Client/Internal/Requests/ManagedIdentityAuthRequest.cs | Same fix pattern for managed identity proactive refresh. |
| src/client/Microsoft.Identity.Client/Internal/Requests/OnBehalfOfRequest.cs | Same fix pattern for OBO proactive refresh. |
| src/client/Microsoft.Identity.Client/Internal/Requests/Silent/CacheSilentStrategy.cs | Same fix pattern for silent proactive refresh. |
| tests/Microsoft.Identity.Test.Unit/PublicApiTests/LinkedCancellationTokenTests.cs | Add regression tests demonstrating buggy vs fixed patterns for linked CTS disposal + cancellation propagation. |

Comment on lines +72 to +85
// Act
Task backgroundTask = Task.Run(buggyLambda);

// Wait for the operation to start (semaphore wait is active)
await operationStarted.Task.ConfigureAwait(false);

// Small delay to guarantee the non-async lambda has returned and disposed the linked CTS.
// With RunContinuationsAsynchronously, we resume on a different thread, but the
// Task.Run thread may not have finished the `return` + `using var` disposal yet.
await Task.Delay(100).ConfigureAwait(false);

// Cancel the parent — this SHOULD propagate to the linked token...
parentCts.Cancel();

Comment on lines +18 to +20
/// Pattern affected (4 files):
/// ClientCredentialRequest.cs, OboRequest.cs, SilentRequest.cs, ManagedIdentityAuthRequest.cs
///
bgavrilMS closed this Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Background token refresh ignores cancellation, causing threads to block for hours
