1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@
======

### New Features
- Added `ManagedIdentityApplication.GetManagedIdentityCapabilitiesAsync(CancellationToken)` returning a `ManagedIdentityCapabilities` object that reports the detected managed identity `Source`, the host's `MaxSupportedBindingStrength` (new `MtlsBindingStrength` enum: `None`, `Software`, `KeyGuard`), and a derived `IsMtlsPopSupportedByHost`. Replaces `GetManagedIdentitySourceAsync()`/`ManagedIdentitySourceResult`. The public `ManagedIdentitySource.ImdsV2` value is folded into `Imds` (v1/v2 routing remains internal). [#6049](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/pull/6049)
- Added `CacheOptions.DisableInternalCacheOptions` static property and `CacheOptions.IsInternalCacheDisabled` to allow disabling MSAL's internal token cache. Added `CacheRefreshReason.CacheDisabled` and `MsalError.InternalCacheDisabled` to support this scenario. [#5947](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/pull/5947)
- Added `AuthenticationResultExtensions.GetRefreshToken()` extension method for accessing refresh tokens from `AuthenticationResult`. [#5947](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/pull/5947)
- Added `WithAttributeTokens` and `WithExtraBodyParameters` extension methods on `AbstractConfidentialClientAcquireTokenParameterBuilder` for enhanced extensibility. [#5888](https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/pull/5888)
150 changes: 150 additions & 0 deletions issue6046_check.txt
@@ -0,0 +1,150 @@
## Background

Spawned from #6044 (region discovery migration to IMDS `/compute`) and the follow-up discussion on Teams with @gladjohn and @neha-bhargava. Capturing the design while it's fresh.

## Motivation

**IMDS is throttled aggressively.** The instance-metadata service rate-limits at ~5 requests/sec/VM ([docs](https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service#rate-limiting)). Today, every IMDS-on-VM consumer in MSAL.NET stands up its own call, cache, retry policy, and telemetry: `RegionManager`, future bound-token capability detection, future cloud-environment validation, future zone-aware routing, etc. With both `ConfidentialClientApplication` and `ManagedIdentityApplication` running in the same process and each spawning multiple consumers, the rate limit is genuinely reachable under load. That's the real pain; duplication is secondary.

The newly adopted `/metadata/instance/compute` (api-version `2021-02-01`) endpoint returns a full instance document (`location`, `zone`, `azEnvironment`, `vmId`, `vmSize`, `osType`, `securityProfile.{securityType, secureBootEnabled, virtualTpmEnabled, encryptionAtHost}`, `extendedLocation`, `subscriptionId`, `resourceGroupName`, etc.). One fetch yields data useful for many consumers, provided we share it.

## Proposal

Introduce a process-wide, lazy, single-flight `/compute` snapshot **at the PlatformProxy layer**, so it is reachable from both CCA and MIA without duplicating the IMDS pipeline:

- One IMDS `/compute` call per process on first need.
- Single-flight: concurrent first-time callers share the in-flight `Task`.
- Consumers read typed properties off the cached snapshot (`Location`, `AzEnvironment`, `Zone`, `VmId`, `SecurityProfile`, ...) instead of issuing their own IMDS call.
- `RegionManager` becomes the first consumer; existing region-specific cache is replaced by a thin accessor over the snapshot.
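
The single-flight behavior in the list above could be sketched as follows. This is a minimal illustration, not the proposed implementation: `SingleFlightSnapshot<T>` and its `fetch` delegate are hypothetical names, and the sketch deliberately leaves out the transient-failure TTL and per-caller cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of MSAL.NET): the first caller starts the fetch,
// and every later or concurrent caller receives the same Task.
internal sealed class SingleFlightSnapshot<T>
{
    private readonly Lazy<Task<T>> _lazy;

    public SingleFlightSnapshot(Func<Task<T>> fetch)
    {
        if (fetch == null)
        {
            throw new ArgumentNullException(nameof(fetch));
        }

        // ExecutionAndPublication guarantees the factory runs at most once,
        // even when several threads read Value at the same time.
        _lazy = new Lazy<Task<T>>(fetch, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Returns the shared in-flight (or already completed) Task.
    public Task<T> GetAsync() => _lazy.Value;
}
```

Because `Lazy<T>` keeps whatever `Task` the factory returned, a faulted fetch would also be remembered for the process lifetime. A real implementation would need to replace the cached task once a transient failure's TTL elapses.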

### Concrete consumers (not hypothetical)

| Consumer | Fields needed | Notes |
|---|---|---|
| **Region discovery** | `location` | Existing #6044 consumer; first to migrate. |
| **Bound-token / mTLS PoP capability detection** | `securityProfile.securityType` (`TrustedLaunch` / `ConfidentialVM` / `Standard`), `securityProfile.secureBootEnabled`, `securityProfile.virtualTpmEnabled` | Determines whether a VM/VMSS is capable of bound tokens. Active feature area; concrete near-term consumer. |
| **Cloud-environment validation** | `azEnvironment` | Cross-check against authority/instance discovery. |
| **Zone-aware routing** | `zone`, `extendedLocation` | Future; but available "for free" once the snapshot exists. |

Both `ConfidentialClientApplication` (region) and `ManagedIdentityApplication` (bound-token capability) are first-class consumers, so having the cache at PlatformProxy means a single fetch serves both.

## Out of scope

- **Managed Identity host/source detection.** App Service, Functions, Azure Arc, Service Fabric, Cloud Shell, and Container Apps don't host `/compute` at all; they're detected via env vars (`IDENTITY_ENDPOINT`, `IDENTITY_HEADER`, `IMDS_ENDPOINT`, etc.) and probe completely different URLs. The existing host-detection ladder is unchanged. `/compute` is consulted only **after** the host has been identified as VM/VMSS-IMDS.
- **Token endpoints** (`/metadata/identity/oauth2/token`). Unrelated; out of scope.

## Failure semantics (the load-bearing design question)

This is the part Neha specifically called out: *"what happens if the data is not fetched the first time due to IMDS endpoint not being available?"* Today's `RegionManager` answer ("cache failure for process lifetime, never retry") is fine when region is the only consumer because the fallback is harmless (global ESTS). With multiple shared consumers, that policy is too coarse: a transient blip would permanently blind every consumer.

Proposed policy:

| Failure class | Examples | Cache duration | Rationale |
|---|---|---|---|
| **Structural** | Not on a VM (connection refused / 404 / DNS fail to `169.254.169.254`) | Process lifetime | Will not change for the life of the process; cheap to remember. |
| **Transient** | `5xx`, `408`, `429`, network timeout | Short TTL (e.g., 30 s) | A blip should not permanently blind every consumer. Bounded re-fetch avoids a thundering herd against IMDS while still allowing recovery. |
| **API-version drift** | `400` with `newest-versions` | Existing api-version probe behavior: retry within the same call | Already covered by `RegionManager.GetImdsUriApiVersionAsync` pattern. |
| **Success** | `200` with parseable `location` | Process lifetime | Compute metadata is effectively immutable for the process. |

Snapshot API exposes `TryGet` / `IsAvailable` rather than throwing. **Per-consumer fallback policy stays with the consumer**:

- Region discovery → silent fallback to global ESTS (unchanged).
- Bound-token capability → fallback to bearer token (or whatever MIA's degraded mode is).
- Cloud-environment validation → log + skip cross-check.

This keeps the snapshot neutral and lets each consumer pick its own degraded mode.
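
As a rough illustration of the policy table, the classification and cache-duration rules could be expressed as two pure functions. This is a sketch under assumptions: `ImdsFailurePolicy`, `Classify`, and `GetExpiry` are hypothetical names, the 30 s TTL is only the example value from the table, and treating any status code not listed in the table as transient is an assumption made here, not something the proposal states. The `400` api-version probe and JSON parsing are assumed to happen before this step.

```csharp
using System;

internal enum ImdsComputeStatus { Available, NotAvailable, TransientError }

// Hypothetical helper (not part of MSAL.NET) mirroring the failure-class table.
internal static class ImdsFailurePolicy
{
    // Example value only; the proposal says "e.g., 30 s".
    public static readonly TimeSpan TransientTtl = TimeSpan.FromSeconds(30);

    // httpStatus is null when no HTTP response was received at all.
    public static ImdsComputeStatus Classify(int? httpStatus, bool timedOut)
    {
        if (httpStatus == null)
        {
            // Timeout is transient; connection refused / DNS failure is structural.
            return timedOut ? ImdsComputeStatus.TransientError : ImdsComputeStatus.NotAvailable;
        }

        int code = httpStatus.Value;
        if (code == 200) { return ImdsComputeStatus.Available; }
        if (code == 404) { return ImdsComputeStatus.NotAvailable; }

        // 5xx, 408 and 429 per the table; any other code is treated as
        // transient here (assumption) so that it is retried after the TTL.
        return ImdsComputeStatus.TransientError;
    }

    // Returns null for "process lifetime", otherwise the time the entry expires.
    public static DateTimeOffset? GetExpiry(ImdsComputeStatus status, DateTimeOffset now)
    {
        return status == ImdsComputeStatus.TransientError
            ? now + TransientTtl
            : (DateTimeOffset?)null;
    }
}
```

Keeping the classification free of I/O makes each row of the table directly unit-testable, independent of how the snapshot performs the HTTP call.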

### Throttling considerations

- One process-wide fetch puts the snapshot's IMDS load at O(1) for the lifetime of the process on the success path.
- Transient-failure re-fetch is bounded by TTL → at most `1 / TTL` calls/sec per process under sustained IMDS unavailability. With a 30 s TTL that is about 0.033 calls/sec per process, so roughly 150 processes on one VM would all have to be retrying before the snapshot alone reached the 5 req/s/VM ceiling.
- Single-flight on first fetch prevents thundering-herd across consumers.

## API shape (sketch, non-normative)

```csharp
internal interface IImdsComputeSnapshot
{
    Task<ImdsComputeResult> GetAsync(CancellationToken cancellationToken);
}

internal sealed class ImdsComputeResult
{
    public ImdsComputeStatus Status { get; }      // Available, NotAvailable, TransientError
    public ImdsComputeMetadata Metadata { get; }  // null unless Available
    public string FailureReason { get; }          // null unless not Available
}

internal sealed class ImdsComputeMetadata
{
    public string Location { get; }
    public string AzEnvironment { get; }
    public string Zone { get; }
    public string VmId { get; }
    public string VmSize { get; }
    public string OsType { get; }
    public string SubscriptionId { get; }
    public string ResourceGroupName { get; }
    public ImdsSecurityProfile SecurityProfile { get; }
    // ... additional fields as consumers need them
}

internal sealed class ImdsSecurityProfile
{
    public ImdsSecurityType SecurityType { get; } // Standard / TrustedLaunch / ConfidentialVm / Unknown
    public bool SecureBootEnabled { get; }
    public bool VirtualTpmEnabled { get; }
    public bool EncryptionAtHostEnabled { get; }
}
```

Concrete shape (interface name, public vs. internal, normalization of IMDS string-bools to `bool`, enum vs. string for `securityType`) is for the spec PR. Above is a sketch only.
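
To make the normalization question concrete, one possible approach is sketched below. It is an illustration only: `ImdsNormalization`, `ParseBool`, and `ParseSecurityType` are hypothetical names, and mapping missing, empty, or unrecognized `securityType` values to `Unknown` is an assumption that the spec PR would need to confirm.

```csharp
using System;

internal enum ImdsSecurityType { Unknown, Standard, TrustedLaunch, ConfidentialVm }

// Hypothetical helper (not part of MSAL.NET) for turning raw IMDS strings into typed values.
internal static class ImdsNormalization
{
    // Converts string-bools such as "true" / "false" to bool.
    // Null, empty, or unparseable input is treated as false.
    public static bool ParseBool(string value)
    {
        return bool.TryParse(value, out bool parsed) && parsed;
    }

    // Maps the securityType string to an enum, ignoring case.
    // Anything not recognized maps to Unknown (assumption).
    public static ImdsSecurityType ParseSecurityType(string value)
    {
        if (string.Equals(value, "Standard", StringComparison.OrdinalIgnoreCase))
        {
            return ImdsSecurityType.Standard;
        }

        if (string.Equals(value, "TrustedLaunch", StringComparison.OrdinalIgnoreCase))
        {
            return ImdsSecurityType.TrustedLaunch;
        }

        if (string.Equals(value, "ConfidentialVM", StringComparison.OrdinalIgnoreCase))
        {
            return ImdsSecurityType.ConfidentialVm;
        }

        return ImdsSecurityType.Unknown;
    }
}
```

Defaulting to `false` and `Unknown` keeps the degraded-mode decision with the consumer, in line with the failure semantics above: a bound-token consumer that sees `Unknown` can fall back rather than fail.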

## Layering

The snapshot lives at the **PlatformProxy** layer (or a new sibling helper that PlatformProxy exposes), so:

- Both `ConfidentialClientApplicationBuilder` and `ManagedIdentityApplicationBuilder` paths can reach it via the existing `IPlatformProxy` injection.
- Tests can swap it in via the existing `IPlatformProxy` mocking pattern.
- Its lifetime is process scope: it is **not** created or reset per `CCA` / `MIA` instance, since IMDS data is host-machine state.

## Telemetry

- Add `ImdsSnapshotFetchOutcome` (`Success` / `StructuralFailure` / `TransientFailure` / `ApiVersionRetry`) to the existing ApiEvent surface, emitted on first fetch only.
- Existing `RegionAutodetectionSource` / `RegionOutcome` / `RegionDiscoveryFailureReason` semantics preserved verbatim; region telemetry is layered on top of the snapshot.
- Per-consumer telemetry (e.g., a future `BoundTokenCapability`) decoupled from snapshot telemetry.

## Migration plan

1. **Phase 0: done.** #6044 swaps `RegionManager` to `/compute` with no caching changes.
2. **Phase 1: this issue.** Introduce `IImdsComputeSnapshot` at PlatformProxy. `RegionManager` is re-routed through it. Behavior-identical for region. Adds the `securityProfile` / `azEnvironment` / `zone` fields to the DTO so subsequent consumers don't need another DTO change.
3. **Phase 2.** First non-region consumer (likely bound-token capability detection) reads from the snapshot.

## Test plan (skeleton)

- First call → IMDS hit → `Available`; `location` and other fields populated.
- Concurrent first-call → single IMDS hit shared across awaiters.
- Repeat call after success → no IMDS hit; same result.
- `404` / connection-refused → `NotAvailable`; subsequent calls cached for process lifetime.
- `500` / `429` / timeout → `TransientError`; re-fetched after TTL elapses; not before.
- `400` → existing api-version probe → success on retry; treated as success.
- `200` with malformed JSON → `TransientError` (since IMDS returning malformed JSON is itself transient/anomalous).
- `200` with missing/empty `location` → `Available` for non-region fields; region-specific consumers get `FailedAutoDiscovery` (existing behavior).
- `RegionManager` tests after re-routing: no observable behavior change.

## Open questions

- Visibility: does the snapshot need a public read API for hosts that want to reuse the data, or stay strictly internal? (Lean: internal until a clear public consumer emerges.)
- Forced refresh: do any consumers ever need to bypass the cache? (Lean: no; process-lifetime caching is correct for compute metadata.)
- Multi-process / multi-AppDomain: out of scope; caching is per `AppDomain` like today's region cache.

## References

- #6039: original request to migrate region discovery to `/compute`.
- #6044: region discovery migration spec (Phase 0). Future Work section in that spec links back here.
- `src/client/Microsoft.Identity.Client/Instance/Region/RegionManager.cs`: first consumer.
- `src/client/Microsoft.Identity.Client/PlatformsCommon/Interfaces/IPlatformProxy.cs`: proposed home for the snapshot service.
- IMDS docs:
  - Compute schema: https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service
  - Rate limiting: https://learn.microsoft.com/azure/virtual-machines/instance-metadata-service#rate-limiting

@@ -0,0 +1,41 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

namespace Microsoft.Identity.Client.AppConfig
{
/// <summary>
/// Describes the strength with which a token can be bound to a cryptographic key on the
/// current host. Higher values indicate stronger binding. The value reflects what the host
/// is capable of producing, not what a particular request used.
/// </summary>
/// <remarks>
/// This type is shared by managed identity and confidential client mTLS Proof-of-Possession
/// scenarios. A value greater than <see cref="None"/> means the host can bind a token to a
/// key; it does <b>not</b> by itself imply hardware attestation. Attestation corresponds to
/// the <see cref="KeyGuard"/> tier specifically.
/// </remarks>
public enum MtlsBindingStrength
{
/// <summary>
/// No key binding is available, so the host cannot perform mTLS Proof-of-Possession. This
/// is the floor of the range (for example, on .NET Framework 4.6.2, which does not support
/// PoP).
/// </summary>
None = 0,

/// <summary>
/// The token can be bound to a software-backed key (for example, a persisted CNG key on
/// Windows, or a software RSA key elsewhere). The key is not hardware-isolated.
/// </summary>
Software = 1,

// 2 is reserved for a future tier (for example, TPM-backed keys).

/// <summary>
/// The token can be bound to a key isolated by Virtualization-based Security (VBS), such
/// as KeyGuard on a Trusted Launch (TVM) or Confidential (CVM) virtual machine. This is
/// the only tier that implies hardware-backed attestation.
/// </summary>
KeyGuard = 3
}
}
@@ -38,6 +38,11 @@ protected AbstractManagedIdentity(RequestContext requestContext, ManagedIdentity
_sourceType = sourceType;
}

// True only for the IMDSv1 source. IMDSv1 and IMDSv2 both report
// <see cref="ManagedIdentitySource.Imds"/> publicly, so this flag preserves the
// v1-specific MSIv1 claims validation without relying on the (folded) source label.
protected virtual bool RequiresMsiV1ClaimsValidation => false;

private const string XmsAzNwperimid = "xms_az_nwperimid";

public virtual async Task<ManagedIdentityResponse> AuthenticateAsync(
@@ -65,16 +70,16 @@ public virtual async Task<ManagedIdentityResponse> AuthenticateAsync(
             // ignoring the value and polluting the cache with keys the endpoint never saw.
             if (!string.IsNullOrEmpty(parameters.ClientClaims))
             {
-                if (_sourceType != ManagedIdentitySource.Imds && _sourceType != ManagedIdentitySource.ImdsV2)
+                if (_sourceType != ManagedIdentitySource.Imds)
                 {
                     throw new MsalClientException(
                         MsalError.InvalidRequest,
                         $"WithClaimsFromClient is only supported for IMDS-based managed identity sources. " +
                         $"The detected source is {_sourceType}. " +
-                        "Only ManagedIdentitySource.Imds and ManagedIdentitySource.ImdsV2 support the 'claims' parameter.");
+                        "Only ManagedIdentitySource.Imds supports the 'claims' parameter.");
                 }
 
-                if (_sourceType == ManagedIdentitySource.Imds)
+                if (RequiresMsiV1ClaimsValidation)
                 {
                     ValidateMsiv1Claims(parameters.ClientClaims);
                 }
@@ -0,0 +1,39 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

using Microsoft.Identity.Client.Platforms.net;
using JsonProperty = System.Text.Json.Serialization.JsonPropertyNameAttribute;

namespace Microsoft.Identity.Client.ManagedIdentity
{
/// <summary>
/// Represents compute metadata retrieved from the Azure Instance Metadata Service (IMDS).
/// </summary>
[JsonObject]
[Preserve(AllMembers = true)]
internal class ComputeMetadataResponse
{
/// <summary>Operating system type (e.g., Windows, Linux).</summary>
[JsonProperty("osType")]
public string OsType { get; set; }

/// <summary>
/// Security profile indicating platform security posture. May be null when IMDS
/// does not return security profile information for the current VM.
/// </summary>
[JsonProperty("securityProfile")]
public ComputeSecurityProfile SecurityProfile { get; set; }
}

/// <summary>
/// Represents the security profile of an Azure VM from IMDS compute metadata.
/// </summary>
[JsonObject]
[Preserve(AllMembers = true)]
internal class ComputeSecurityProfile
{
/// <summary>Security type of the VM (e.g., TrustedLaunch, ConfidentialVM).</summary>
[JsonProperty("securityType")]
public string SecurityType { get; set; }
}
}