Fix KnownInstanceMetadataIsUpToDateAsync: guard HTTP status codes on discovery endpoints #6048

Merged
gladjohn merged 2 commits into main from copilot/fix-instance-discovery-integration-test
Jun 2, 2026

Copilot AI commented Jun 2, 2026

KnownInstanceMetadataIsUpToDateAsync was throwing a cryptic System.Text.Json.JsonException: 'F' is an invalid start of a value in CI because login.windows-ppe.net (an internal/corp-only endpoint) returns 403 Forbidden from restricted agents — and the test was feeding that non-JSON body directly into JsonHelper.DeserializeFromJson without checking the HTTP status first.

Changes proposed in this request

  • HTTP status guards on both discovery calls — read the response body first, then check IsSuccessStatusCode. On failure, surface a clear message with the status code, reason phrase, and a truncated body snippet instead of letting the JSON parser choke.
  • PPE failure → Assert.Inconclusive: login.windows-ppe.net is an environmental precondition (corp-only), not a product bug. On non-success, the test is now marked inconclusive with a descriptive message and exits before the assertion logic; the public endpoint's request and status check have already run by that point.
  • Public endpoint failure → Assert.Fail: login.microsoftonline.com must be reachable; non-success there is a hard failure with the same diagnostic message format.
  • Truncate helper — private static Truncate(string s, int maxLength = 500) handles null/empty safely; used in both error messages to keep them readable.

if (!discoveryResponse.IsSuccessStatusCode)
{
    Assert.Fail(
        $"Public discovery endpoint returned {(int)discoveryResponse.StatusCode} " +
        $"{discoveryResponse.ReasonPhrase}. Body: {Truncate(discoveryJson)}");
}
// ...
if (!ppeDiscoveryResponse.IsSuccessStatusCode)
{
    Assert.Inconclusive(
        $"PPE discovery endpoint {validPpeDiscoveryUri} returned " +
        $"{(int)ppeDiscoveryResponse.StatusCode} {ppeDiscoveryResponse.ReasonPhrase}. " +
        $"This is typically an environmental issue (login.windows-ppe.net is restricted). " +
        $"Body: {Truncate(ppeDiscoveryJson)}");
}
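
The description gives only the signature of the Truncate helper and says it handles null/empty input. A minimal sketch of what such a helper could look like follows; the class name DiagnosticText, the public visibility, and the choice to return an empty string for null are assumptions for illustration, not the merged code.

```csharp
using System;

static class DiagnosticText
{
    // Hypothetical implementation: the PR states the signature and the
    // null/empty handling, but not the body.
    public static string Truncate(string s, int maxLength = 500)
    {
        // Assumption: null or empty input becomes an empty string.
        if (string.IsNullOrEmpty(s))
        {
            return string.Empty;
        }

        // Keep at most maxLength characters, never reading past the end.
        return s.Substring(0, Math.Min(maxLength, s.Length));
    }
}
```

With this shape, a long HTML error page is cut to its first 500 characters, while a short body such as "Forbidden" passes through unchanged.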

No product code changed. The sovereign-cloud filter, InstanceDiscoveryMetadataEntryComparer, and KnownMetadataProvider are untouched. When both endpoints succeed, test behavior is identical to before.

Testing

Existing test logic is preserved end-to-end when both endpoints are reachable. The new paths (inconclusive/fail) are exercised when the respective endpoint returns non-2xx.
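
One way to exercise the non-2xx path without depending on a restricted network is a stub HttpMessageHandler that always answers with a fixed status. The sketch below is illustrative only: FixedResponseHandler, DiscoveryProbe, and DescribeFailureAsync are hypothetical names that do not appear in the PR, and the URI is a placeholder.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stub: always answers with a fixed status and plain-text body.
sealed class FixedResponseHandler : HttpMessageHandler
{
    private readonly HttpStatusCode _status;
    private readonly string _body;

    public FixedResponseHandler(HttpStatusCode status, string body)
    {
        _status = status;
        _body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(_status)
        {
            Content = new StringContent(_body),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}

static class DiscoveryProbe
{
    // Returns null on a success status, otherwise a diagnostic string
    // shaped like the messages in this PR (status code, reason phrase, body).
    public static async Task<string> DescribeFailureAsync(HttpClient httpClient, string uri)
    {
        using HttpResponseMessage response = await httpClient.SendAsync(
            new HttpRequestMessage(HttpMethod.Get, uri)).ConfigureAwait(false);

        // Read the body first so it is available for the diagnostic.
        string body = await response.Content.ReadAsStringAsync().ConfigureAwait(false);

        if (response.IsSuccessStatusCode)
        {
            return null;
        }

        return $"Endpoint {uri} returned {(int)response.StatusCode} " +
               $"{response.ReasonPhrase}. Body: {body}";
    }
}
```

Passing `new HttpClient(new FixedResponseHandler(HttpStatusCode.Forbidden, "Forbidden"))` to DescribeFailureAsync yields a diagnostic string, and a stub returning HttpStatusCode.OK yields null.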

Performance impact

None.

Documentation

  • All relevant documentation is updated.
Original prompt

Problem

The integration test Microsoft.Identity.Test.Integration.HeadlessTests.InstanceDiscoveryIntegrationTests.KnownInstanceMetadataIsUpToDateAsync is failing with a confusing error:

System.Text.Json.JsonException: 'F' is an invalid start of a value. Path: $ | LineNumber: 0 | BytePositionInLine: 0.
---> System.Text.Json.JsonReaderException: 'F' is an invalid start of a value. LineNumber: 0 | BytePositionInLine: 0.
   at System.Text.Json.ThrowHelper.ThrowJsonReaderException(...)

Root cause

The test located at tests/Microsoft.Identity.Test.Integration.netcore/HeadlessTests/InstanceDiscoveryIntegrationTests.cs makes two raw HTTP GET calls and feeds the response body straight into JsonHelper.DeserializeFromJson<InstanceDiscoveryResponse> without checking the HTTP status code:

  1. https://login.microsoftonline.com/common/discovery/instance?... (public)
  2. https://login.windows-ppe.net/common/discovery/instance?... (PPE / pre-production)

The PPE endpoint login.windows-ppe.net is internal/restricted. From CI agents that aren't on the corp network / allowlist, it routinely returns a non-JSON error response (e.g. HTTP 403 with a body starting with Forbidden). That body's first byte F is what System.Text.Json is choking on at LineNumber: 0 | BytePositionInLine: 0.
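
The parser behavior described above can be reproduced in isolation. The sketch below uses JsonDocument.Parse as a stand-in for the test's JsonHelper.DeserializeFromJson call, which is an assumption made so the example is self-contained; ForbiddenBodyDemo and TryParse are hypothetical names.

```csharp
using System;
using System.Text.Json;

static class ForbiddenBodyDemo
{
    // Returns the parser's error message for a non-JSON body,
    // or null if the body parsed as JSON.
    public static string TryParse(string body)
    {
        try
        {
            using JsonDocument doc = JsonDocument.Parse(body);
            return null;
        }
        catch (JsonException ex)
        {
            // For a plain-text body the parser rejects the very first byte.
            return ex.Message;
        }
    }
}
```

Calling `ForbiddenBodyDemo.TryParse("Forbidden")` returns an error message rather than null, whereas a JSON body such as `{}` parses cleanly. This is why the status code has to be checked before the body reaches the deserializer.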

This is environmental, not a regression in the recent sovereign-cloud filtering (Bleu/Delos/GovSG) — that code runs strictly after the deserialize call that threw.

Required changes

Edit tests/Microsoft.Identity.Test.Integration.netcore/HeadlessTests/InstanceDiscoveryIntegrationTests.cs, method KnownInstanceMetadataIsUpToDateAsync:

  1. Guard each HTTP call with a status-code check. If a discovery endpoint returns a non-success status, surface a clear message including the status code, reason phrase, and a snippet of the body, instead of letting System.Text.Json throw a cryptic 'F' is an invalid start of a value error.

  2. Treat PPE unreachability as inconclusive, not a failure. Because login.windows-ppe.net is an environmental precondition (not a product bug), use Assert.Inconclusive(...) when the PPE call fails. Keep failures on the public login.microsoftonline.com endpoint as a hard Assert.Fail (or just let it fail), since that endpoint must be reachable.

  3. Decouple PPE from the public-cloud assertion. If only PPE is unreachable, still verify the public metadata. Concretely: only concat actualPpeMetadata into the comparison if the PPE call succeeded; otherwise compare against just the public metadata (and note in the inconclusive message that PPE was skipped), OR split into a clean early-return after the inconclusive call. The simpler approach is: on PPE failure, call Assert.Inconclusive (which short-circuits the test), so the public-cloud network call has still run but the test does not false-fail. Pick whichever is cleanest while keeping the existing assertion logic intact when both endpoints succeed.

Suggested shape (adapt as needed for code style in the file):

HttpResponseMessage discoveryResponse = await httpClient.SendAsync(
    new HttpRequestMessage(HttpMethod.Get, validDiscoveryUri)).ConfigureAwait(false);
string discoveryJson = await discoveryResponse.Content.ReadAsStringAsync().ConfigureAwait(false);
if (!discoveryResponse.IsSuccessStatusCode)
{
    Assert.Fail(
        $"Public discovery endpoint returned {(int)discoveryResponse.StatusCode} " +
        $"{discoveryResponse.ReasonPhrase}. Body: {Truncate(discoveryJson)}");
}

HttpResponseMessage ppeDiscoveryResponse = await httpClient.SendAsync(
    new HttpRequestMessage(HttpMethod.Get, validPpeDiscoveryUri)).ConfigureAwait(false);
string ppeDiscoveryJson = await ppeDiscoveryResponse.Content.ReadAsStringAsync().ConfigureAwait(false);
if (!ppeDiscoveryResponse.IsSuccessStatusCode)
{
    Assert.Inconclusive(
        $"PPE discovery endpoint {validPpeDiscoveryUri} returned " +
        $"{(int)ppeDiscoveryResponse.StatusCode} {ppeDiscoveryResponse.ReasonPhrase}. " +
        $"This is typically an environmental issue (login.windows-ppe.net is restricted). " +
        $"Body: {Truncate(ppeDiscoveryJson)}");
}

Where Truncate is a small local helper (or inline s.Substring(0, Math.Min(500, s.Length))) so the assertion message stays readable.

  4. Do not change the existing sovereignCloudsNotInDiscovery filter logic, the InstanceDiscoveryMetadataEntryComparer, or KnownMetadataProvider. The fix is scoped strictly to the HTTP/deserialization handling at the top of the test method.

  5. Properly dispose of the HttpClient and HttpResponseMessage instances (for example with using) if the surrounding code style does so; otherwise leave allocation patterns as-is to minimize diff.
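
The first option in the decoupling step (concatenate the PPE metadata only when the PPE call succeeded) could be sketched roughly as follows. This is not the approach the PR description says was merged, which exits via Assert.Inconclusive instead. MetadataMerge and BuildExpected are hypothetical names, and plain strings stand in for the real metadata entries.

```csharp
using System.Collections.Generic;
using System.Linq;

static class MetadataMerge
{
    // Hypothetical helper: strings stand in for discovery metadata entries.
    public static List<string> BuildExpected(
        IEnumerable<string> publicEntries,
        bool ppeSucceeded,
        IEnumerable<string> ppeEntries)
    {
        IEnumerable<string> combined = publicEntries;

        // Only fold PPE entries into the comparison set when the PPE call succeeded.
        if (ppeSucceeded)
        {
            combined = combined.Concat(ppeEntries);
        }

        return combined.ToList();
    }
}
```

The trade-off is that this variant keeps verifying the public metadata when PPE is unreachable, at the cost of touching the comparison setup that the early-exit approach leaves alone.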

Acceptance criteria

  • When both discovery endpoints return JSON successfully, the test behaves identically to today (same comparison, same pass/fail outcome).
  • When login.windows-ppe.net returns a non-success status (e.g. 403), the...

This pull request was created from Copilot chat.

Copilot AI self-assigned this Jun 2, 2026
Copilot AI review requested due to automatic review settings June 2, 2026 18:05
Copilot AI requested review from Copilot and removed request for Copilot June 2, 2026 18:09
Copilot AI changed the title from "[WIP] Fix instance discovery integration test for status code check" to "Fix KnownInstanceMetadataIsUpToDateAsync: guard HTTP status codes on discovery endpoints" Jun 2, 2026
Copilot AI requested a review from gladjohn June 2, 2026 18:10
@gladjohn gladjohn marked this pull request as ready for review June 2, 2026 18:21
Copilot AI review requested due to automatic review settings June 2, 2026 18:22
@gladjohn gladjohn requested a review from a team as a code owner June 2, 2026 18:22

Copilot AI left a comment

Pull request overview

This PR hardens the KnownInstanceMetadataIsUpToDateAsync integration test by adding HTTP status-code guards before attempting to JSON-deserialize discovery responses, preventing cryptic JSON parsing failures when an endpoint returns non-JSON error content (notably login.windows-ppe.net in restricted environments).

Changes:

  • Read discovery response bodies and check IsSuccessStatusCode before deserialization, surfacing clear failure diagnostics (status code, reason phrase, truncated body).
  • Treat PPE discovery endpoint failures as environmental by marking the test Assert.Inconclusive, while keeping the public endpoint as a hard failure via Assert.Fail.
  • Add a small Truncate helper to keep diagnostic output readable.
