Add GetManagedIdentityCapabilitiesAsync host capability discovery API (Phase 1) #6049
Merged
Robbie-Microsoft merged 7 commits into main from rginsburg/GetManagedIdentityCapabilitiesAsync on Jun 4, 2026
Changes from 5 commits
7 commits
bc68491 Add GetManagedIdentityCapabilitiesAsync managed identity capability API (Robbie-Microsoft)
619a20c Add KeyGuard binding-strength tier to IMDSv2 capability discovery (Robbie-Microsoft)
339fee0 Document capability-discovery key-provisioning side effect (Robbie-Microsoft)
97b4bc7 Address PR review: single-flight discovery lock, clearer errors, chan… (Robbie-Microsoft)
d93eb8f Rename MtlsBindingStrength.Bearer to None; simplify compute-metadata … (Robbie-Microsoft)
1f8d97b Remove stray scratch file and fix CHANGELOG enum value name (Robbie-Microsoft)
8af439f Merge branch 'main' into rginsburg/GetManagedIdentityCapabilitiesAsync (Robbie-Microsoft)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| ## Background | ||
|
|
||
| Spawned from #6044 (region discovery migration to IMDS `/compute`) and the follow-up discussion on Teams with @gladjohn and @neha-bhargava. Capturing the design while it's fresh. | ||
|
|
||
| ## Motivation | ||
|
|
||
|
|
||
| **IMDS is throttled aggressively.** The instance-metadata service rate-limits at ~5 requests/sec/VM ([docs](https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service#rate-limiting)). Today, every IMDS-on-VM consumer in MSAL.NET stands up its own call, cache, retry policy, and telemetry: `RegionManager`, future bound-token capability detection, future cloud-environment validation, future zone-aware routing, etc. With both `ConfidentialClientApplication` and `ManagedIdentityApplication` running in the same process and each spawning multiple consumers, the rate limit is genuinely reachable under load. That's the real pain; duplication is secondary. | ||
|
|
||
| The newly-adopted `/metadata/instance/compute` (api-version `2021-02-01`) endpoint returns a full instance document (`location`, `zone`, `azEnvironment`, `vmId`, `vmSize`, `osType`, `securityProfile.{securityType, secureBootEnabled, virtualTpmEnabled, encryptionAtHost}`, `extendedLocation`, `subscriptionId`, `resourceGroupName`, etc.). One fetch yields data useful for many consumers, provided we share it. | ||
|
|
||
| ## Proposal | ||
|
|
||
| Introduce a process-wide, lazy, single-flight `/compute` snapshot **at the PlatformProxy layer**, so it is reachable from both CCA and MIA without duplicating the IMDS pipeline: | ||
|
|
||
| - One IMDS `/compute` call per process on first need. | ||
| - Single-flight: concurrent first-time callers share the in-flight `Task`. | ||
| - Consumers read typed properties off the cached snapshot (`Location`, `AzEnvironment`, `Zone`, `VmId`, `SecurityProfile`, ...) instead of issuing their own IMDS call. | ||
| - `RegionManager` becomes the first consumer; existing region-specific cache is replaced by a thin accessor over the snapshot. | ||
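The single-flight behavior in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `SingleFlightCache<T>` is a hypothetical name, the fetch delegate stands in for the real IMDS `/compute` call, and failure caching (TTL, structural vs. transient) is deliberately left out.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not an MSAL.NET type): caches one Task so that
// concurrent first-time callers share the same in-flight fetch.
public sealed class SingleFlightCache<T>
{
    private readonly Func<Task<T>> _fetch;
    private readonly object _gate = new object();
    private Task<T> _inFlightOrCompleted;

    public SingleFlightCache(Func<Task<T>> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    public Task<T> GetAsync()
    {
        lock (_gate)
        {
            // Start a fetch only if none exists yet, or if the previous one
            // faulted or was canceled (so a later caller can try again).
            if (_inFlightOrCompleted == null ||
                _inFlightOrCompleted.IsFaulted ||
                _inFlightOrCompleted.IsCanceled)
            {
                _inFlightOrCompleted = _fetch();
            }

            // Every caller awaits the same Task instance.
            return _inFlightOrCompleted;
        }
    }
}
```

The shared fetch is not tied to any one caller's `CancellationToken` here, on the assumption that one caller giving up should not cancel the fetch for everyone else.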
|
|
||
| ### Concrete consumers (not hypothetical) | ||
|
|
||
| | Consumer | Fields needed | Notes | | ||
| |---|---|---| | ||
| | **Region discovery** | `location` | Existing #6044 consumer; first to migrate. | | ||
| | **Bound-token / mTLS PoP capability detection** | `securityProfile.securityType` (`TrustedLaunch` / `ConfidentialVM` / `Standard`), `securityProfile.secureBootEnabled`, `securityProfile.virtualTpmEnabled` | Determines whether a VM/VMSS is capable of bound tokens. Active feature area; concrete near-term consumer. | | ||
| | **Cloud-environment validation** | `azEnvironment` | Cross-check against authority/instance discovery. | | ||
| | **Zone-aware routing** | `zone`, `extendedLocation` | Future, but available "for free" once the snapshot exists. | | ||
|
|
||
| Both `ConfidentialClientApplication` (region) and `ManagedIdentityApplication` (bound-token capability) are first-class consumers; having the cache at PlatformProxy means a single fetch serves both. | ||
|
|
||
| ## Out of scope | ||
|
|
||
| - **Managed Identity host/source detection.** App Service, Functions, Azure Arc, Service Fabric, Cloud Shell, and Container Apps don't host `/compute` at all; they're detected via env vars (`IDENTITY_ENDPOINT`, `IDENTITY_HEADER`, `IMDS_ENDPOINT`, etc.) and probe completely different URLs. The existing host-detection ladder is unchanged. `/compute` is consulted only **after** the host has been identified as VM/VMSS-IMDS. | ||
| - **Token endpoints** (`/metadata/identity/oauth2/token`). Unrelated; out of scope. | ||
|
|
||
| ## Failure semantics (the load-bearing design question) | ||
|
|
||
| This is the part Neha specifically called out: *"what happens if the data is not fetched the first time due to IMDS endpoint not being available?"* Today's `RegionManager` answer ("cache failure for process lifetime, never retry") is fine when region is the only consumer because the fallback is harmless (global ESTS). With multiple shared consumers, that policy is too coarse: a transient blip would permanently blind every consumer. | ||
|
|
||
| Proposed policy: | ||
|
|
||
| | Failure class | Examples | Cache duration | Rationale | | ||
| |---|---|---|---| | ||
| | **Structural** | Not on a VM (connection refused / 404 / DNS fail to `169.254.169.254`) | Process lifetime | Will not change for the life of the process; cheap to remember. | | ||
| | **Transient** | `5xx`, `408`, `429`, network timeout | Short TTL (e.g., 30 s) | A blip should not permanently blind every consumer. Bounded re-fetch protects against thundering herd against IMDS while allowing recovery. | | ||
| | **API-version drift** | `400` with `newest-versions` | Existing api-version probe behavior: retry within the same call | Already covered by `RegionManager.GetImdsUriApiVersionAsync` pattern. | | ||
| | **Success** | `200` with parseable `location` | Process lifetime | Compute metadata is effectively immutable for the process. | | ||
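The policy table above can be read as a small classification function. The sketch below is one hedged interpretation: `ImdsFailureClass`, `ImdsFailurePolicy`, and the `timedOut` flag are hypothetical names introduced here, the 30 s TTL is the example value from the table, and treating any unlisted status code as transient is an assumption, not something the table specifies. It also ignores the body checks the table mentions (parseable `location`, `newest-versions`).

```csharp
using System;

// Hypothetical names; they mirror the rows of the policy table.
public enum ImdsFailureClass { Success, Structural, Transient, ApiVersionDrift }

public static class ImdsFailurePolicy
{
    // Example TTL from the table ("e.g., 30 s").
    public static readonly TimeSpan TransientTtl = TimeSpan.FromSeconds(30);

    // statusCode is null when no HTTP response was received at all.
    public static ImdsFailureClass Classify(int? statusCode, bool timedOut)
    {
        if (statusCode == null)
        {
            // Timeout is transient; connection refused / DNS failure is structural.
            return timedOut ? ImdsFailureClass.Transient : ImdsFailureClass.Structural;
        }

        int code = statusCode.Value;
        if (code == 200) return ImdsFailureClass.Success;
        if (code == 400) return ImdsFailureClass.ApiVersionDrift;
        if (code == 404) return ImdsFailureClass.Structural;

        // 408, 429 and 5xx per the table. Assumption: any other status is
        // also treated as transient so that it can recover after the TTL.
        return ImdsFailureClass.Transient;
    }

    // Returns null for "cache for the process lifetime".
    public static TimeSpan? CacheDuration(ImdsFailureClass failureClass)
    {
        switch (failureClass)
        {
            case ImdsFailureClass.Transient: return TransientTtl;
            case ImdsFailureClass.ApiVersionDrift: return TimeSpan.Zero; // retried within the same call
            default: return null; // Success and Structural
        }
    }
}
```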
|
|
||
| Snapshot API exposes `TryGet` / `IsAvailable` rather than throwing. **Per-consumer fallback policy stays with the consumer**: | ||
|
|
||
| - Region discovery → silent fallback to global ESTS (unchanged). | ||
| - Bound-token capability → fallback to bearer token (or whatever MIA's degraded mode is). | ||
| - Cloud-environment validation → log + skip cross-check. | ||
|
|
||
| This keeps the snapshot neutral and lets each consumer pick its own degraded mode. | ||
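To illustrate how a consumer might own its degraded mode, here is a minimal sketch for the region case. `ComputeResult` is a cut-down, hypothetical stand-in for the result type sketched in this document, and returning `null` to mean "use the global endpoint" is an assumption made for the example.

```csharp
using System;

public enum ImdsComputeStatus { Available, NotAvailable, TransientError }

// Hypothetical stand-in for the snapshot result; only Location is modeled.
public sealed class ComputeResult
{
    public ComputeResult(ImdsComputeStatus status, string location)
    {
        Status = status;
        Location = location;
    }

    public ImdsComputeStatus Status { get; }
    public string Location { get; }
}

public static class RegionConsumer
{
    // Returns the region to use, or null to signal "fall back to global".
    // The snapshot never throws; the consumer decides what a miss means.
    public static string ResolveRegion(ComputeResult snapshot)
    {
        if (snapshot == null || snapshot.Status != ImdsComputeStatus.Available)
        {
            return null;
        }

        return string.IsNullOrWhiteSpace(snapshot.Location) ? null : snapshot.Location;
    }
}
```

A bound-token consumer would follow the same shape but pick a different fallback, which is the point of keeping the policy out of the snapshot.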
|
|
||
| ### Throttling considerations | ||
|
|
||
| - One process-wide fetch puts the snapshot's IMDS load at O(1) for the lifetime of the process on the success path. | ||
| - Transient-failure re-fetch is bounded by TTL → at most `1 / TTL` calls/sec per process under sustained IMDS unavailability. With a 30 s TTL that's well below the 5 req/s/VM ceiling even with multiple processes. | ||
| - Single-flight on first fetch prevents thundering-herd across consumers. | ||
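As a rough check of the TTL bound, assuming each process re-fetches independently, at most once per TTL, and that nothing else on the VM is calling IMDS:

```latex
\[
r_{\text{process}} \le \frac{1}{\mathrm{TTL}} = \frac{1}{30\ \text{s}} \approx 0.033\ \text{req/s},
\qquad
N_{\max} = \frac{5\ \text{req/s}}{1/30\ \text{req/s}} = 150\ \text{processes}.
\]
```

Under those assumptions it would take on the order of 150 processes on one VM, all seeing sustained transient failures, for snapshot re-fetches alone to reach the documented ceiling.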
|
|
||
| ## API shape (sketch, non-normative) | ||
|
|
||
| ```csharp | ||
| using System.Threading; | ||
| using System.Threading.Tasks; | ||
|
|
||
| internal interface IImdsComputeSnapshot | ||
| { | ||
|     Task<ImdsComputeResult> GetAsync(CancellationToken cancellationToken); | ||
| } | ||
|
|
||
| internal enum ImdsComputeStatus { Available, NotAvailable, TransientError } | ||
|
|
||
| internal sealed class ImdsComputeResult | ||
| { | ||
|     public ImdsComputeStatus Status { get; } // Available, NotAvailable, TransientError | ||
|     public ImdsComputeMetadata Metadata { get; } // null unless Available | ||
|     public string FailureReason { get; } // null unless not Available | ||
| } | ||
|
|
||
| internal sealed class ImdsComputeMetadata | ||
| { | ||
|     public string Location { get; } | ||
|     public string AzEnvironment { get; } | ||
|     public string Zone { get; } | ||
|     public string VmId { get; } | ||
|     public string VmSize { get; } | ||
|     public string OsType { get; } | ||
|     public string SubscriptionId { get; } | ||
|     public string ResourceGroupName { get; } | ||
|     public ImdsSecurityProfile SecurityProfile { get; } | ||
|     // ... additional fields as consumers need them | ||
| } | ||
|
|
||
| internal enum ImdsSecurityType { Unknown, Standard, TrustedLaunch, ConfidentialVm } | ||
|
|
||
| internal sealed class ImdsSecurityProfile | ||
| { | ||
|     public ImdsSecurityType SecurityType { get; } // Standard / TrustedLaunch / ConfidentialVm / Unknown | ||
|     public bool SecureBootEnabled { get; } | ||
|     public bool VirtualTpmEnabled { get; } | ||
|     public bool EncryptionAtHostEnabled { get; } | ||
| } | ||
| ``` | ||
|
|
||
| Concrete shape (interface name, public vs. internal, normalization of IMDS string-bools to `bool`, enum vs. string for `securityType`) is for the spec PR. Above is a sketch only. | ||
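One possible take on the normalization questions listed above (string-bools to `bool`, string vs. enum for `securityType`) is sketched here. All names are hypothetical, and mapping a missing or unrecognized value to `Unknown` (rather than `Standard`) is an assumption that the spec PR would need to settle.

```csharp
using System;

public enum ImdsSecurityType { Unknown, Standard, TrustedLaunch, ConfidentialVm }

// Hypothetical helper; not part of MSAL.NET.
public static class ImdsNormalization
{
    // IMDS reports some flags as the strings "true"/"false".
    // Anything missing or unparseable is treated as false.
    public static bool ParseBool(string raw)
    {
        return bool.TryParse(raw, out bool value) && value;
    }

    // Case-insensitive mapping of the wire value to the enum.
    // Assumption: null, empty, or unrecognized values map to Unknown.
    public static ImdsSecurityType ParseSecurityType(string raw)
    {
        string value = raw?.Trim();
        if (string.IsNullOrEmpty(value)) return ImdsSecurityType.Unknown;

        if (value.Equals("Standard", StringComparison.OrdinalIgnoreCase))
            return ImdsSecurityType.Standard;
        if (value.Equals("TrustedLaunch", StringComparison.OrdinalIgnoreCase))
            return ImdsSecurityType.TrustedLaunch;
        if (value.Equals("ConfidentialVM", StringComparison.OrdinalIgnoreCase))
            return ImdsSecurityType.ConfidentialVm;

        return ImdsSecurityType.Unknown;
    }
}
```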
|
|
||
| ## Layering | ||
|
|
||
| The snapshot lives at the **PlatformProxy** layer (or a new sibling helper that PlatformProxy exposes), so: | ||
|
|
||
| - Both `ConfidentialClientApplicationBuilder` and `ManagedIdentityApplicationBuilder` paths can reach it via the existing `IPlatformProxy` injection. | ||
| - Tests can swap it in via the existing `IPlatformProxy` mocking pattern. | ||
| - Its lifetime is process scope; it is **not** reset per `CCA` / `MIA` instance, since IMDS data is host-machine state. | ||
|
|
||
| ## Telemetry | ||
|
|
||
| - Add `ImdsSnapshotFetchOutcome` (`Success` / `StructuralFailure` / `TransientFailure` / `ApiVersionRetry`) to the existing ApiEvent surface, emitted on first fetch only. | ||
| - Existing `RegionAutodetectionSource` / `RegionOutcome` / `RegionDiscoveryFailureReason` semantics are preserved verbatim; region telemetry is layered on top of the snapshot. | ||
| - Per-consumer telemetry (e.g., a future `BoundTokenCapability`) decoupled from snapshot telemetry. | ||
|
|
||
| ## Migration plan | ||
|
|
||
| 1. **Phase 0 (done).** #6044 swaps `RegionManager` to `/compute` with no caching changes. | ||
| 2. **Phase 1 (this issue).** Introduce `IImdsComputeSnapshot` at PlatformProxy. `RegionManager` re-routed through it. Behavior-identical for region. Adds the `securityProfile` / `azEnvironment` / `zone` fields to the DTO so subsequent consumers don't need another DTO change. | ||
| 3. **Phase 2.** First non-region consumer (likely bound-token capability detection) reads from the snapshot. | ||
|
|
||
| ## Test plan (skeleton) | ||
|
|
||
| - First call → IMDS hit → `Available`; `location` and other fields populated. | ||
| - Concurrent first-call → single IMDS hit shared across awaiters. | ||
| - Repeat call after success → no IMDS hit; same result. | ||
| - `404` / connection-refused → `NotAvailable`; subsequent calls cached for process lifetime. | ||
| - `500` / `429` / timeout → `TransientError`; re-fetched after TTL elapses; not before. | ||
| - `400` → existing api-version probe → success on retry; treated as success. | ||
| - `200` with malformed JSON → `TransientError` (since IMDS returning malformed JSON is itself transient/anomalous). | ||
| - `200` with missing/empty `location` → `Available` for non-region fields; region-specific consumers get `FailedAutoDiscovery` (existing behavior). | ||
| - `RegionManager` tests after re-routing: no behavior change observable. | ||
|
|
||
| ## Open questions | ||
|
|
||
| - Visibility: does the snapshot need a public read API for hosts that want to reuse the data, or stay strictly internal? (Lean: internal until a clear public consumer emerges.) | ||
| - Forced refresh: do any consumers ever need to bypass the cache? (Lean: no; process-lifetime caching is correct for compute metadata.) | ||
| - Multi-process / multi-AppDomain: out of scope; caching is per `AppDomain` like today's region cache. | ||
|
|
||
| ## References | ||
|
|
||
| - #6039: original request to migrate region discovery to `/compute`. | ||
| - #6044: region discovery migration spec (Phase 0). Future Work section in that spec links back here. | ||
| - `src/client/Microsoft.Identity.Client/Instance/Region/RegionManager.cs`: first consumer. | ||
| - `src/client/Microsoft.Identity.Client/PlatformsCommon/Interfaces/IPlatformProxy.cs`: proposed home for the snapshot service. | ||
| - IMDS docs: | ||
| - Compute schema: https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service | ||
| - Rate limiting: https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service#rate-limiting | ||
|
|
||
41 changes: 41 additions & 0 deletions
src/client/Microsoft.Identity.Client/AppConfig/MtlsBindingStrength.cs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| // Copyright (c) Microsoft Corporation. All rights reserved. | ||
| // Licensed under the MIT License. | ||
|
|
||
| namespace Microsoft.Identity.Client.AppConfig | ||
| { | ||
| /// <summary> | ||
| /// Describes the strength with which a token can be bound to a cryptographic key on the | ||
| /// current host. Higher values indicate stronger binding. The value reflects what the host | ||
| /// is capable of producing, not what a particular request used. | ||
| /// </summary> | ||
| /// <remarks> | ||
| /// This type is shared by managed identity and confidential client mTLS Proof-of-Possession | ||
| /// scenarios. A value greater than <see cref="None"/> means the host can bind a token to a | ||
| /// key; it does <b>not</b> by itself imply hardware attestation. Attestation corresponds to | ||
| /// the <see cref="KeyGuard"/> tier specifically. | ||
| /// </remarks> | ||
| public enum MtlsBindingStrength | ||
| { | ||
| /// <summary> | ||
| /// No key binding is available, so the host cannot perform mTLS Proof-of-Possession. This | ||
| /// is the floor of the range (for example, on .NET Framework 4.6.2, which does not support | ||
| /// PoP). | ||
| /// </summary> | ||
| None = 0, | ||
|
|
||
| /// <summary> | ||
| /// The token can be bound to a software-backed key (for example, a persisted CNG key on | ||
| /// Windows, or a software RSA key elsewhere). The key is not hardware-isolated. | ||
| /// </summary> | ||
| Software = 1, | ||
|
|
||
|
|
||
| // 2 is reserved for a future tier (for example, TPM-backed keys). | ||
|
|
||
|
|
||
| /// <summary> | ||
| /// The token can be bound to a key isolated by Virtualization-based Security (VBS), such | ||
| /// as KeyGuard on a Trusted Launch (TVM) or Confidential (CVM) virtual machine. This is | ||
| /// the only tier that implies hardware-backed attestation. | ||
| /// </summary> | ||
| KeyGuard = 3 | ||
| } | ||
| } | ||
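A caller-side reading of the enum's documented semantics might look like the sketch below. The enum is copied here only so the example is self-contained, and `BindingStrengthChecks` is a hypothetical helper, not an API added by this PR. Attestation is tested with equality rather than `>=` because the remarks tie it to the `KeyGuard` tier specifically and the value 2 is reserved.

```csharp
using System;

// Local copy of the enum from the diff, for a self-contained example.
public enum MtlsBindingStrength
{
    None = 0,
    Software = 1,
    // 2 is reserved for a future tier.
    KeyGuard = 3
}

// Hypothetical helper illustrating how the tiers are meant to be read.
public static class BindingStrengthChecks
{
    // Any value above None means the host can bind a token to a key.
    public static bool CanBindToken(MtlsBindingStrength strength)
    {
        return strength > MtlsBindingStrength.None;
    }

    // Only the KeyGuard tier implies hardware-backed attestation.
    public static bool ImpliesAttestation(MtlsBindingStrength strength)
    {
        return strength == MtlsBindingStrength.KeyGuard;
    }
}
```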
39 changes: 39 additions & 0 deletions
src/client/Microsoft.Identity.Client/ManagedIdentity/ComputeMetadataResponse.cs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| // Copyright (c) Microsoft Corporation. All rights reserved. | ||
| // Licensed under the MIT License. | ||
|
|
||
| using Microsoft.Identity.Client.Platforms.net; | ||
| using JsonProperty = System.Text.Json.Serialization.JsonPropertyNameAttribute; | ||
|
|
||
| namespace Microsoft.Identity.Client.ManagedIdentity | ||
| { | ||
| /// <summary> | ||
| /// Represents compute metadata retrieved from the Azure Instance Metadata Service (IMDS). | ||
| /// </summary> | ||
| [JsonObject] | ||
| [Preserve(AllMembers = true)] | ||
| internal class ComputeMetadataResponse | ||
| { | ||
| /// <summary>Operating system type (e.g., Windows, Linux).</summary> | ||
| [JsonProperty("osType")] | ||
| public string OsType { get; set; } | ||
|
|
||
| /// <summary> | ||
| /// Security profile indicating platform security posture. May be null when IMDS | ||
| /// does not return security profile information for the current VM. | ||
| /// </summary> | ||
| [JsonProperty("securityProfile")] | ||
| public ComputeSecurityProfile SecurityProfile { get; set; } | ||
| } | ||
|
|
||
| /// <summary> | ||
| /// Represents the security profile of an Azure VM from IMDS compute metadata. | ||
| /// </summary> | ||
| [JsonObject] | ||
| [Preserve(AllMembers = true)] | ||
| internal class ComputeSecurityProfile | ||
| { | ||
| /// <summary>Security type of the VM (e.g., TrustedLaunch, ConfidentialVM).</summary> | ||
| [JsonProperty("securityType")] | ||
| public string SecurityType { get; set; } | ||
| } | ||
| } |
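For readers unfamiliar with the shape of the payload, the sketch below deserializes a trimmed `/compute` document into equivalent POCOs using plain `System.Text.Json`. It is an illustration under assumptions: the `*Sketch` type names and `ComputeMetadataParser` are hypothetical, the library's own `[JsonObject]` / `[Preserve]` attributes and serializer plumbing are omitted, and the JSON in the usage note is a hand-written sample rather than captured IMDS output.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical stand-ins for ComputeSecurityProfile / ComputeMetadataResponse.
public sealed class ComputeSecurityProfileSketch
{
    [JsonPropertyName("securityType")]
    public string SecurityType { get; set; }
}

public sealed class ComputeMetadataSketch
{
    [JsonPropertyName("osType")]
    public string OsType { get; set; }

    // Stays null when the payload has no securityProfile object.
    [JsonPropertyName("securityProfile")]
    public ComputeSecurityProfileSketch SecurityProfile { get; set; }
}

public static class ComputeMetadataParser
{
    public static ComputeMetadataSketch Parse(string json)
    {
        return JsonSerializer.Deserialize<ComputeMetadataSketch>(json);
    }

    // Null-safe accessor: returns null if the profile or the value is absent.
    public static string GetSecurityTypeOrNull(string json)
    {
        return Parse(json)?.SecurityProfile?.SecurityType;
    }
}
```

For example, `ComputeMetadataParser.GetSecurityTypeOrNull("{\"osType\":\"Linux\",\"securityProfile\":{\"securityType\":\"TrustedLaunch\"}}")` yields `TrustedLaunch`, while a payload without `securityProfile` yields `null`, matching the "may be null" note in the DTO's documentation.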