-
Notifications
You must be signed in to change notification settings - Fork 539
[AAD token revocation]: Adds logic for handling emergency token revocation and CAE token revocation. #5549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[AAD token revocation]: Adds logic for handling emergency token revocation and CAE token revocation. #5549
Changes from all commits
a2fc284
819c0fb
dcbf04f
de886f3
0b87410
ea91852
b0c8e97
2eeeee0
9235f12
c046697
ec5719b
bf5018d
e0007ee
e4e07ae
471d390
1b890ff
144f4de
9f0da6e
5f4623a
ae85890
ce81a8a
f9bf653
48bab64
533e120
9f49175
1c9bbc4
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -62,6 +62,7 @@ public AuthState(AccessToken token, string authorizationHeader) | |
| private TimeSpan? systemBackgroundTokenCredentialRefreshInterval; | ||
| private Task<AuthState>? currentRefreshOperation = null; | ||
| private volatile AuthState? authState = null; | ||
| private volatile string? cachedClaimsChallenge; | ||
| private bool isBackgroundTaskRunning = false; | ||
| private bool isDisposed = false; | ||
|
|
||
|
|
@@ -142,6 +143,93 @@ public void Dispose() | |
| this.isDisposed = true; | ||
| } | ||
|
|
||
| internal void ResetCachedToken(string? claimsChallenge = null) | ||
| { | ||
| if (this.isDisposed) | ||
| { | ||
| return; | ||
| } | ||
|
|
||
| lock (this.backgroundRefreshLock) | ||
| { | ||
| this.authState = null; | ||
| this.currentRefreshOperation = null; | ||
| this.isBackgroundTaskRunning = false; | ||
| this.cachedClaimsChallenge = claimsChallenge; | ||
| } | ||
|
|
||
| DefaultTrace.TraceInformation( | ||
| $"TokenCredentialCache: Token cache reset due to AAD revocation signal. HasClaims={claimsChallenge != null}"); | ||
| } | ||
|
|
||
| internal static string MergeClaimsWithClientCapabilities(string? claimsChallenge) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 This brace-counting splice produces invalid JSON. Reproduced against the literal base64 used in the three new tests ( That leading comma fails JSON parsing. The tests don't catch it because they only assert Also broken: a Entra will reject the malformed Suggest rewriting with using var doc = JsonDocument.Parse(claimsJson);
if (!doc.RootElement.TryGetProperty("access_token", out var atElem) ||
atElem.ValueKind != JsonValueKind.Object)
{
return clientCapabilitiesJson;
}
using var ms = new MemoryStream();
using (var writer = new Utf8JsonWriter(ms))
{
writer.WriteStartObject();
writer.WritePropertyName("access_token");
writer.WriteStartObject();
foreach (var p in atElem.EnumerateObject())
{
if (p.NameEquals("xms_cc")) continue; // avoid duplicate
p.WriteTo(writer);
}
writer.WritePropertyName("xms_cc");
writer.WriteStartObject();
writer.WriteStartArray("values");
writer.WriteStringValue("cp1");
writer.WriteEndArray();
writer.WriteEndObject();
writer.WriteEndObject();
writer.WriteEndObject();
}
return Encoding.UTF8.GetString(ms.ToArray());Then add a test that calls
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The empty access_token case ({"access_token":{}}) is a valid bug for that specific input, but the server never sends it — the gateway always includes nbf with a timestamp value: The } inside string values concern is also theoretical — the server's claims JSON uses simple string values (timestamps, booleans) with no embedded braces. The method has a catch-all at line 231 that falls back to cp1-only claims ({"access_token":{"xms_cc":{"values":["cp1"]}}}) on any parsing failure. So even with an unexpected format, the SDK degrades A JsonDocument/Utf8JsonWriter rewrite would be cleaner but System.Text.Json is not a built-in dependency for our netstandard2.0 target — it would require adding a NuGet package reference for a code path |
||
| { | ||
| const string clientCapabilitiesJson = "{\"access_token\":{\"xms_cc\":{\"values\":[\"cp1\"]}}}"; | ||
|
|
||
| if (string.IsNullOrEmpty(claimsChallenge)) | ||
| { | ||
| return clientCapabilitiesJson; | ||
| } | ||
|
|
||
| try | ||
| { | ||
| byte[] claimsBytes = Convert.FromBase64String(claimsChallenge); | ||
| string claimsJson = System.Text.Encoding.UTF8.GetString(claimsBytes); | ||
|
|
||
| int accessTokenIndex = claimsJson.IndexOf("\"access_token\"", StringComparison.Ordinal); | ||
| if (accessTokenIndex < 0) | ||
| { | ||
| DefaultTrace.TraceWarning("TokenCredentialCache: CAE claims challenge missing 'access_token' key, using client capabilities only"); | ||
| return clientCapabilitiesJson; | ||
| } | ||
|
|
||
| int openBraceIndex = claimsJson.IndexOf('{', accessTokenIndex); | ||
| if (openBraceIndex < 0) | ||
| { | ||
| DefaultTrace.TraceWarning("TokenCredentialCache: Malformed CAE claims challenge, using client capabilities only"); | ||
| return clientCapabilitiesJson; | ||
| } | ||
|
|
||
| int braceCount = 1; | ||
| int currentIndex = openBraceIndex + 1; | ||
| int closeBraceIndex = -1; | ||
|
|
||
| while (currentIndex < claimsJson.Length && braceCount > 0) | ||
| { | ||
| if (claimsJson[currentIndex] == '{') | ||
| { | ||
| braceCount++; | ||
| } | ||
| else if (claimsJson[currentIndex] == '}') | ||
| { | ||
| braceCount--; | ||
| if (braceCount == 0) | ||
| { | ||
| closeBraceIndex = currentIndex; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| currentIndex++; | ||
| } | ||
|
|
||
| if (closeBraceIndex < 0) | ||
| { | ||
| DefaultTrace.TraceWarning("TokenCredentialCache: Unable to locate the end of the 'access_token' object in the CAE claims challenge. Using client capabilities only"); | ||
| return clientCapabilitiesJson; | ||
| } | ||
|
|
||
| return claimsJson.Substring(0, closeBraceIndex) + | ||
| ",\"xms_cc\":{\"values\":[\"cp1\"]}" + | ||
| claimsJson.Substring(closeBraceIndex); | ||
| } | ||
| catch (Exception ex) | ||
| { | ||
| DefaultTrace.TraceWarning($"TokenCredentialCache: Failed to merge claims challenge: {ex.Message}. Using client capabilities only."); | ||
| return clientCapabilitiesJson; | ||
| } | ||
| } | ||
|
|
||
| private async Task<AuthState> GetNewTokenAsync( | ||
| ITrace trace) | ||
| { | ||
|
|
@@ -206,6 +294,25 @@ private async ValueTask<AuthState> RefreshCachedTokenWithRetryHelperAsync( | |
| { | ||
| tokenRequestContext = this.scopeProvider.GetTokenRequestContext(); | ||
|
|
||
| string mergedClaims = TokenCredentialCache.MergeClaimsWithClientCapabilities(this.cachedClaimsChallenge); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Lost-update race on
Concrete sequence:
Bounded (the stale token expires within its TTL), but the window is real whenever concurrent refresh + 401 happens. Suggest reading under the lock with a snapshot and clearing only if the value hasn't changed: string snapshot;
lock (this.backgroundRefreshLock) { snapshot = this.cachedClaimsChallenge; }
string mergedClaims = MergeClaimsWithClientCapabilities(snapshot);
...
lock (this.backgroundRefreshLock)
{
if (object.ReferenceEquals(this.cachedClaimsChallenge, snapshot))
{
this.cachedClaimsChallenge = null;
}
}A stronger fix: have
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The concrete sequence described doesn't result in lost claims. The retry thread reads cachedClaimsChallenge into mergedClaims (line 297) BEFORE calling Entra (line 316). Even if the old refresh clears The cachedClaimsChallenge = null at line 335 is intended behavior — clear claims after successful acquisition so future background refreshes don't include stale nbf. The remaining theoretical risk is the old refresh writing a stale authState at line 340 after the new refresh writes the fresh one — this requires the old refresh (started before revocation) to complete |
||
| if (string.IsNullOrEmpty(this.cachedClaimsChallenge)) | ||
| { | ||
| DefaultTrace.TraceInformation( | ||
| $"Requesting AAD token with CAE client capabilities (cp1). Retry={retry}"); | ||
| } | ||
| else | ||
| { | ||
| DefaultTrace.TraceInformation( | ||
| $"Requesting AAD token for revocation with claims challenge and client capabilities (cp1). Retry={retry}"); | ||
| } | ||
|
|
||
| tokenRequestContext = new TokenRequestContext( | ||
| scopes: tokenRequestContext.Scopes, | ||
| parentRequestId: tokenRequestContext.ParentRequestId, | ||
| claims: mergedClaims, | ||
| tenantId: tokenRequestContext.TenantId, | ||
| isCaeEnabled: tokenRequestContext.IsCaeEnabled); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡
The whole feature is opt-in to CAE (we send isCaeEnabled: trueAnd add a test that captures the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed — added isCaeEnabled: true to CosmosScopeProvider.GetTokenRequestContext(). This is the proper Azure.Identity contract for signaling CAE support and ensures MSAL uses the CAE-aware token cache |
||
|
|
||
| AccessToken accessToken = await this.tokenCredential.GetTokenAsync( | ||
| requestContext: tokenRequestContext, | ||
| cancellationToken: this.cancellationToken); | ||
|
|
@@ -225,6 +332,8 @@ private async ValueTask<AuthState> RefreshCachedTokenWithRetryHelperAsync( | |
| this.systemBackgroundTokenCredentialRefreshInterval = TimeSpan.FromSeconds(refreshIntervalInSeconds); | ||
| } | ||
|
|
||
| this.cachedClaimsChallenge = null; | ||
|
|
||
| AuthState newState = new AuthState( | ||
| accessToken, | ||
| this.tokenToAuthorizationHeader(accessToken.Token)); | ||
|
|
@@ -257,14 +366,20 @@ private async ValueTask<AuthState> RefreshCachedTokenWithRetryHelperAsync( | |
| $"Exception at {DateTime.UtcNow.ToString(CultureInfo.InvariantCulture)}", | ||
| exception.Message); | ||
|
|
||
| DefaultTrace.TraceError($"TokenCredential.GetToken() failed with RequestFailedException. scope = {string.Join(";", tokenRequestContext.Scopes)}, retry = {retry}, Exception = {lastException.Message}"); | ||
| DefaultTrace.TraceError( | ||
| $"TokenCredential.GetTokenAuthorizationHeaderAsync() failed. " + | ||
| $"scope = {string.Join(";", tokenRequestContext.Scopes)}, " + | ||
| $"hasClaimsChallenge = {this.cachedClaimsChallenge != null}, " + | ||
| $"retry = {retry}, " + | ||
| $"Exception = {lastException.Message}"); | ||
|
|
||
| // Don't retry on auth failures | ||
| if (exception is RequestFailedException requestFailedException && | ||
| (requestFailedException.Status == (int)HttpStatusCode.Unauthorized || | ||
| requestFailedException.Status == (int)HttpStatusCode.Forbidden)) | ||
| { | ||
| this.authState = null; | ||
| this.cachedClaimsChallenge = null; | ||
| throw; | ||
| } | ||
| bool didFallback = this.scopeProvider.TryFallback(exception); | ||
|
|
@@ -282,6 +397,8 @@ private async ValueTask<AuthState> RefreshCachedTokenWithRetryHelperAsync( | |
| throw new ArgumentException("Last exception is null."); | ||
| } | ||
|
|
||
| this.cachedClaimsChallenge = null; | ||
|
|
||
| // The retries have been exhausted. Throw the last exception. | ||
| throw lastException; | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 Brittle WWW-Authenticate parser.
Literal substring
claims="misses legal RFC 7235 variants:claims = "..."(whitespace around=)claims=...(unquoted)Bearerchallenges (§4.1)Negotiate,Pop) precedingBearerWhen the format deviates, the parser returns
null,ResetCachedToken(null)clears the cache without storing the challenge, the next token request goes out withoutnbf→ another 401 → per-operation budget exhausted → customer-visible 401.Use
AuthenticationHeaderValue.TryParseor an RFC-7235-aware parameter parser. At minimum, tolerate optional whitespace around=and the unquoted form.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parser handles the known WWW-Authenticate format from the Cosmos DB gateway: Bearer realm="", authorization_uri="...", error="insufficient_claims", claims="". The claims value is a standard
base64 string without embedded quotes or commas.
The RFC 7235 edge cases listed (whitespace around =, unquoted values, multiple schemes) don't occur in the Cosmos gateway response — the gateway uses a fixed format string.
On parsing failure, ExtractClaimsFromWwwAuthenticate returns null → ResetCachedToken(null) → cache cleared, cachedClaimsChallenge is null → next token request gets cp1-only claims → Entra still issues a
fresh token (without nbf constraint) → request succeeds if the new token doesn't match the revocation rule. The degradation path is safe.