Skip to content

fix(telemetry): remove duplicate HTTP client spans#5668

Merged
thomhurst merged 9 commits intomainfrom
fix/duplicate-http-client-spans
Apr 22, 2026
Merged

fix(telemetry): remove duplicate HTTP client spans#5668
thomhurst merged 9 commits intomainfrom
fix/duplicate-http-client-spans

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • HTTP propagation handlers (TUnitBaggagePropagationHandler for Aspire, ActivityPropagationHandler for ASP.NET Core) are now pure header propagators — no synthesized client span. Both delegate to a shared HttpActivityPropagator.Inject(Activity?, HttpRequestHeaders) in TUnit.Core.
  • TUnitTestCorrelationProcessor tags the tunit.test.id on both OnStart and OnEnd so ASP.NET Core server spans (remote-context parent, baggage populated by the propagator after StartActivity returns) still get tagged.
  • AspireHttpSourceName (unreleased, added in feat(aspire): emit client spans for HTTP #5666) removed; AspNetCoreHttpSourceName marked [Obsolete] — kept for binary compatibility since it shipped in v1.36.0.

Context

Originally reported via a trace timeline showing two identical POST spans per HTTP call under Aspire:

POST 324ms              ← TUnit.Aspire.Http   (synthesized in #5666)
  POST 324ms            ← System.Net.Http     (runtime, always there)
    HTTP wait_for_connection
  POST /api/products/   ← ASP.NET Core server

AspireFixture.CreateHttpClient wraps a real SocketsHttpHandler, so the runtime already emits a client span via System.Net.Http. The synthesized span was a second one.

The same cleanup applies to the ASP.NET Core side even though it didn't have a visible duplicate: ActivityPropagationHandler is also injected into every SUT-side IHttpClientFactory pipeline via TUnitHttpClientFilter, and that path does end in SocketsHttpHandler — so a SUT with HTTP instrumentation would also have seen duplicates there.

Could we have reused the runtime's DiagnosticsHandler? No — it's internal sealed, can't instantiate without reflection (violates AOT rule), and its tag shape drifts across .NET versions. Since the WAF in-memory case has no client span at all (neither runtime nor ours), the cleanest answer is to let the ASP.NET Core server span be a direct child of the ambient test Activity. Trace topology becomes:

test body
  POST /api/products/  ← ASP.NET Core server span, parent = test activity

Trace-correlation tag fix

Moving tagging to OnEnd fixes a latent issue: the processor previously read Activity.Current.GetBaggageItem(TagTestId) at OnStart, but ASP.NET Core Hosting populates baggage via Activity.AddBaggage(...) after ActivitySource.StartActivity returns. The synthesized client span used to mask this by being tagged itself and passing the test's "some span has the tag" assertion. With the synthesized span gone, the server span must be the tagged one — hence OnEnd.

Test plan

  • TUnit.AspNetCore.Tests — 35/35 pass (including AutoWires_TagsAspNetCoreSpans_WithTestId which now polls briefly for server-activity end)
  • TUnit.Aspire.Tests/BaggagePropagationHandlerTests — 10/10 pass (rewritten to assert on outgoing headers rather than synthesized spans)
  • TUnit.OpenTelemetry.Tests — 29/29 pass (new Processor_TagsOnEnd_WhenBaggageAddedAfterStart covers the new path)
  • TUnit.PublicAPI snapshots — 4/4 pass (reflects removed const + new OnEnd override + [Obsolete] attribute)
  • Verify in a real Aspire dashboard trace that the duplicate POST rows disappear

Strip span synthesis from both HTTP propagation handlers. .NET's
System.Net.Http ActivitySource already emits a properly-shaped client
span for real-socket traffic (Aspire), and the ASP.NET Core server span
carries HTTP semconv tags for in-memory WAF traffic — synthesizing a
second client span produced duplicate rows in trace timelines.

Both handlers collapse to a single call to a shared
HttpActivityPropagator that only injects traceparent + baggage so the
SUT can correlate requests to the originating test.

TUnitTestCorrelationProcessor now tags in OnEnd as well as OnStart:
spans with a remote-context parent (ASP.NET Core server spans created
from extracted traceparent) receive baggage via the propagator after
StartActivity returns, so OnStart alone couldn't see it — the previous
topology masked this by tagging the synthesized client span instead.

Public-API impact: AspireHttpSourceName removed (shipped only in the
unreleased #5666); AspNetCoreHttpSourceName marked [Obsolete] for
binary compatibility.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix(telemetry): remove duplicate HTTP client spans

Overall: Excellent cleanup. This PR correctly identifies and resolves a real architectural problem — the synthetic client spans were doubling up traces — and the resulting code is dramatically simpler and more correct. The diagnosis in the PR description is thorough and accurate.


Architecture: Strong improvement

The core change is sound: both ActivityPropagationHandler and TUnitBaggagePropagationHandler shed ~144 lines of span-management logic in favour of a single 3-line delegation to the new HttpActivityPropagator.Inject. The shared utility in TUnit.Core is the right level of abstraction — it avoids duplicating the DistributedContextPropagator.Current.Inject + W3C-baggage fallback logic across two handlers.

The trace topology is now correct:

test body
  POST /api/products/  ← ASP.NET Core server span, parent = test activity
    System.Net.Http client span (runtime-emitted, real sockets only)

Observation: TryTag fallback + OnEnd interaction

TUnitTestCorrelationProcessor.TryTag has a subtle execution path worth considering:

var testId = activity.GetBaggageItem(TUnitActivitySource.TagTestId);
if (testId is null && !ReferenceEquals(Activity.Current, activity))
{
    testId = Activity.Current?.GetBaggageItem(TUnitActivitySource.TagTestId);
}

When OnEnd calls TryTag, the span being ended is typically still Activity.Current (it reverts to parent only after Stop()). The !ReferenceEquals guard prevents the Activity.Current fallback from firing on self, which is correct. For the ASP.NET Core server-span case being fixed, baggage is available via activity.GetBaggageItem() directly at OnEnd time — the Activity.Current fallback isn't exercised there.

There is one theoretical edge: if OnStart runs and neither the activity's own baggage chain nor Activity.Current's baggage contains the test ID, OnStart skips tagging. Later, OnEnd fires — but Activity.Current may now be a different test's activity in a concurrent-test scenario. The idempotency guard at the top of TryTag won't help because nothing was tagged in OnStart. This could cause a span to be incorrectly attributed to another test.

This is a pre-existing risk with OnStart too, and the new test Processor_TagsOnEnd_WhenBaggageAddedAfterStart correctly covers the intended scenario. If concurrent-test cross-contamination is a concern, explicitly binding the fallback check to the activity's trace ID (rather than Activity.Current) could make it more robust.


Minor: Polling loop CancellationToken edge case

In AutoConfigureOpenTelemetryTests.cs:

while (!cts.IsCancellationRequested)
{
    taggedSpan = _exported.FirstOrDefault(...);
    if (taggedSpan is not null) break;
    await Task.Delay(20, cts.Token);  // ← throws TaskCanceledException
}
await Assert.That(taggedSpan).IsNotNull();

If the 2-second timeout fires exactly while Task.Delay(20, cts.Token) is awaited, a TaskCanceledException propagates and bypasses the Assert.That assertion — the test surface-fails on cancellation rather than surfacing a clean "span not found" message. In practice this is a very narrow race, but wrapping the delay in a try/catch (OperationCanceledException) { break; } would make the failure mode clearer. Alternatively, Task.Delay(20).WaitAsync(cts.Token) doesn't throw on external cancellation.


AspireHttpSourceName removal vs AspNetCoreHttpSourceName deprecation

The asymmetric treatment (hard removal vs [Obsolete]) is justified by the PR's claim that AspireHttpSourceName shipped in #5666 but not in a release. If that's accurate, the hard removal avoids perpetuating a public constant for a source that never emits. Worth confirming the release history before merge.


Test quality: clear improvement

The reworked unit tests are strictly better — they verify observable header semantics rather than internal span-creation mechanics. SendAsync_ExistingBaggageHeader_IsPreserved is a good addition covering the "don't overwrite" invariant.

Removing ActivityListenerScope + RecordedActivity from both test files is the right call; those were testing implementation details of the now-deleted span creation logic.


Summary

This is a well-motivated, well-executed fix. The primary concern is the Activity.Current fallback in TryTag when called from OnEnd under concurrent tests — worth considering whether the current guard is sufficient. The polling edge case in the integration test is minor. Ship confidence is high.

@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented Apr 22, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics -25 complexity

Metric Results
Complexity -25

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes. Give us feedback

- TUnitTestCorrelationProcessor: replace Activity.Current fallback with
  TraceRegistry.GetContextId keyed on the activity's own TraceId. The
  previous ambient-current fallback could cross-attribute a span to a
  different concurrent test if the span was stopped on a thread whose
  Activity.Current had swung to another test's context. Trace-id lookup
  is bound to the span itself and can't mis-attribute.
- AutoConfigureOpenTelemetryTests: polling loop now uses a monotonic
  deadline instead of CancellationToken threaded into Task.Delay, so
  timeout surfaces as "no tagged span found" rather than a
  TaskCanceledException.
- Added Processor_FallsBackToTraceRegistry_WhenActivityHasNoBaggage
  covering the new fallback path.
@thomhurst
Copy link
Copy Markdown
Owner Author

Thanks for the review — addressed in a516d9e.

TryTag fallback concurrency concern: Good catch. Switched the fallback from Activity.Current?.GetBaggageItem to TraceRegistry.GetContextId(activity.TraceId.ToString()). TraceRegistry is already populated in TestExecutor.cs:155 with traceId → contextId when each test activity starts, so the lookup is bound to the span's own trace ID rather than whatever ambient context happens to be current at OnEnd. That eliminates the cross-attribution risk under concurrent tests when a span is stopped on a thread whose Activity.Current has swung to another test.

Added Processor_FallsBackToTraceRegistry_WhenActivityHasNoBaggage to cover the new path.

Polling loop CancellationToken race: Switched to a monotonic Environment.TickCount64 deadline so Task.Delay no longer takes the token. A timeout now produces the clean "no tagged span found" assertion failure rather than a TaskCanceledException.

AspireHttpSourceName release history: Confirmed via git log -S. Added in commit ecb01f89b (the PR feature branch for #5666) and merged to main as 572118965 — no intermediate release tag. First appearance in the public-API verified snapshots is also only in that commit. Safe to hard-remove.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: fix(telemetry): remove duplicate HTTP client spans

Both concerns raised in any previous review have been correctly addressed. This is a clean, well-structured PR.


Resolved concerns

1. TryTag concurrent cross-attribution risk

The fix is correct and robust: TraceRegistry.GetContextId(activity.TraceId.ToString()) binds the lookup to the span's own trace ID, which is immutable and unique per test. No ambient-context dependency remains in TUnitTestCorrelationProcessor.TryTag. The new Processor_FallsBackToTraceRegistry_WhenActivityHasNoBaggage test covers this path cleanly.

2. Polling loop TaskCanceledException race

The CancellationTokenSource-based loop in AutoConfigureOpenTelemetryTests.cs is replaced with a monotonic Environment.TickCount64 deadline and uncancelled Task.Delay(20). A timeout now falls through to the Assert.That(taggedSpan).IsNotNull() assertion and produces a clear failure message rather than an unrelated TaskCanceledException. Good fix.


Remaining observations (informational, not blocking)

HttpActivityPropagator is #if NET-gated and internal

TUnit.Core/HttpActivityPropagator.cs is wrapped in #if NET. This is consistent with the rest of the telemetry code and is fine for current use cases. Worth noting that if a future consumer needs this utility on netstandard it will need to be refactored, but that's a bridge to cross if/when needed.

ActivityPropagationHandler.SendAsync uses fully-qualified System.Diagnostics.Activity.Current

HttpActivityPropagator.Inject(System.Diagnostics.Activity.Current, request.Headers);

The TUnitBaggagePropagationHandler equivalent uses using System.Diagnostics; and just writes Activity.Current. The fully-qualified form is harmless but slightly inconsistent — a using System.Diagnostics; import would make both files read uniformly. Minor nit.

TryBuildBaggageHeader double-emit edge case

HttpActivityPropagator.Inject calls DistributedContextPropagator.Current.Inject first, then checks headers.Contains(TUnitActivitySource.BaggageHeader) and calls TryBuildBaggageHeader only if absent. This is the correct order. One theoretical edge: if the configured propagator emits an empty baggage header, the Contains check returns true and TryBuildBaggageHeader is skipped, meaning TUnit's own baggage items won't be in the header. Unlikely configuration, but worth being aware of.

TraceRegistry.Register in unit tests writes to a process-global dictionary

Processor_FallsBackToTraceRegistry_WhenActivityHasNoBaggage calls TraceRegistry.Register(traceId, ...) with a randomly-generated child.TraceId. The random TraceId makes collision essentially impossible, and the registration persists for the process lifetime as a minor leak — but since TraceRegistry is already designed as a process-global store, this is acceptable in a test context.


Summary

The approach is architecturally sound — pure header propagation (no synthetic spans), TraceRegistry-anchored fallback for concurrent test safety, and clean public-API deprecation with [Obsolete]. The core design avoids the ambient-context pitfalls that plagued the previous iteration. Ready to merge.

Minor consistency nit surfaced in review — match the import style used
by the Aspire handler.
@thomhurst
Copy link
Copy Markdown
Owner Author

Thanks for the follow-up review. Picked up the import-style nit in da533fbSystem.Diagnostics is now imported in ActivityPropagationHandler.cs to match the Aspire handler.

Skipping the other two observations as non-blocking:

  • Empty baggage header edge case: no propagator in the BCL (W3C, Legacy, Pass-through) emits an empty baggage header — W3C simply omits it when there's nothing to emit. Guarding against a hypothetical custom propagator that emits baggage: (empty) would add complexity for a contract violation on the propagator's side. Skipped.
  • HttpActivityPropagator #if NET gating: consistent with TUnitActivitySource and the handlers themselves; netstandard2.0 doesn't carry the DistributedContextPropagator API anyway.
  • TraceRegistry.Register leak in test: reviewer already flagged as acceptable given process-global design.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: fix(telemetry): remove duplicate HTTP client spans

This is a clean, well-motivated refactor. The root cause analysis in the PR body is thorough and the solution is proportionate — removing the synthesized spans entirely rather than trying to conditionally suppress them is the right architectural call. The 629-line deletion speaks for itself.


✅ What's done well

Architecture decision is correct. Removing synthetic client spans and delegating to the runtime's System.Net.Http ActivitySource (for Aspire real-socket traffic) and directly parenting the ASP.NET Core server span to the test activity (for in-memory WebApplicationFactory traffic) is exactly the right model. No need to re-implement what the runtime already does.

HttpActivityPropagator extraction is the right abstraction. Pulling the shared header-injection logic into TUnit.Core eliminates the duplication between ActivityPropagationHandler and TUnitBaggagePropagationHandler. Both handlers are now trivially thin — a one-liner each — which is exactly where they should be.

OnEnd tagging fix is correct and well-explained. The ASP.NET Core hosting model populates baggage after StartActivity returns when propagating from a remote traceparent, so reading baggage in OnStart was always unreliable for server spans. Calling TryTag from both OnStart (fast path for in-process spans) and OnEnd (deferred path for remote-context spans) handles both cases cleanly. The GetTagItem idempotency guard ensures double-tagging never occurs.

Binary compatibility is handled correctly. [Obsolete] on AspNetCoreHttpSourceName for a shipped API, hard removal for the never-released AspireHttpSourceName. Test coverage (including new processor tests) is thorough.


Observations / Questions

1. HttpActivityPropagator.Inject method visibility

The class is internal but the method is public:

internal static class HttpActivityPropagator
{
    public static void Inject(Activity? activity, HttpRequestHeaders headers) // ← should be internal

In .NET convention, members of internal types are scoped to the assembly anyway, but using internal on the method itself makes the intent explicit and matches how other internal helpers in TUnit are styled. Minor, but worth making consistent.

2. Test polling loop is acceptable but worth a comment on the fragility

// AutoConfigureOpenTelemetryTests.cs
var deadline = Environment.TickCount64 + 2_000;
while (Environment.TickCount64 < deadline)
{
    taggedSpan = _exported.FirstOrDefault(...);
    if (taggedSpan is not null) break;
    await Task.Delay(20);
}

The comment explains why polling is needed (server activity stops on a continuation that outlives the response), which is good. The 2-second budget is generous for CI. One thing to be aware of: the test relies on the exporter batch flush completing within that window. If the exporter has a long flush interval (e.g. a slow OTLP endpoint), the exported collection might not be populated in time in integration test runs against a real backend. For the unit-test-with-in-memory-exporter case used here this is fine.

3. OnEnd mutation is safe, but the reasoning is non-obvious

Calling activity.SetTag(...) in OnEnd works because the OTel SDK processes all BaseProcessor.OnEnd callbacks synchronously inside Activity.Stop(), before export. But this ordering guarantee is not obvious to a future reader — they might worry that the exporter has already captured the activity state at OnStart. A brief comment in TryTag or in the class doc noting "tag mutations in OnEnd are visible to exporters because export runs after all processor callbacks complete" would help.

4. Confirm AspireHttpSourceName was never shipped

The PR body says it was added in #5666 and is "unreleased." If there's any chance it made it into a nuget package push between those two PRs, removal is a breaking change for any consumers who wired it up. The [Obsolete] path for AspNetCoreHttpSourceName was explicitly justified because it "shipped in v1.36.0." Worth a quick check of release tags/CHANGELOG to confirm the Aspire source name never appeared in a published package before hard-removing it. If uncertain, [Obsolete] costs nothing and keeps the option open.

5. Exception propagation behavior change (intended, but worth noting in changelog)

The old handlers caught exceptions from the inner handler to annotate the (now-removed) client span with error.type, exception event, and ActivityStatusCode.Error. That annotation no longer happens — exceptions from the inner handler now propagate transparently. This is correct (no span to annotate), but users who relied on the span-level exception recording in backends like Jaeger would no longer see it there. Probably worth a brief changelog note so users aren't surprised if their error traces change shape.


Summary

Excellent PR. The architectural simplification is large and justified. The deferred-baggage tagging fix (OnEnd) is subtle but correctly implemented and tested. The main asks are: make Inject internal on the method, confirm AspireHttpSourceName was truly unreleased, and consider a changelog note about the exception-annotation behavior change. None of these block the merge.

- HttpActivityPropagator.Inject: public -> internal. Class is internal,
  so the method visibility was redundant; explicit internal matches
  other internal helpers in TUnit.
- TUnitTestCorrelationProcessor: add a remarks paragraph explaining
  that OnEnd tag writes are visible to deferred-serialization exporters
  (BatchExportProcessor, InMemoryExporter). Synchronous exporters need
  this processor to be registered before them to observe the tag.
@thomhurst
Copy link
Copy Markdown
Owner Author

Addressed the two actionable observations in 84b195e:

  1. HttpActivityPropagator.Inject method visibilityinternal to match the enclosing class and the rest of TUnit's internal helper style.
  2. OnEnd mutation visibility reasoning → added a remarks paragraph to TUnitTestCorrelationProcessor noting that tag writes during OnEnd are visible to deferred-serialization exporters (BatchExportProcessor, InMemoryExporter) and that synchronous exporters need this processor registered before them.

Not taking:

  • Polling loop exporter-flush caveat: the test uses AddInMemoryExporter which captures by reference, so no flush-window fragility. The comment already captures why polling is needed.
  • AspireHttpSourceName release confirmation: confirmed earlier via git log -S "AspireHttpSourceName" — only commits are ecb01f89b (feature branch) and 572118965 (merge to main, unreleased). Also verified first appearance in public-API snapshots is only on those commits.
  • Exception-annotation behavior changelog note: the PR body already states "span synthesis is removed" which implicitly removes the span-level error annotations. The errors are still recorded on the runtime's System.Net.Http client span (Aspire) and ASP.NET Core server span (WAF) where they naturally belong — no one should be filtering on TUnit-synthesized spans specifically.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review #4

Previous reviews (1–3) raised several concerns. Here's the status:

All prior concerns addressed ✅

  • Activity.Current cross-test attribution → fixed via TraceRegistry.GetContextId(activity.TraceId.ToString()) (immutable trace ID, no ambient context)
  • CancellationToken timeout surfacing as a confusing exception → fixed with Environment.TickCount64 deadline + uncancelled Task.Delay(20)
  • Inject method visibility (public on internal class) → now correctly internal static void Inject
  • OnEnd mutation ordering guarantee → documented in the class-level <remarks> on TUnitTestCorrelationProcessor

Remaining concerns

1. Behavior change: exception span annotation silently removed (non-blocking, changelog-worthy)

The old handlers caught exceptions from base.SendAsync and annotated the synthesized client span with error.type and an exception event. The new pure propagators let exceptions propagate without any span annotation:

// Old behavior (removed):
catch (Exception ex) {
    TUnitActivitySource.RecordException(activity, ex);
    throw;
}

For Aspire's real-socket path this is fine — .NET's System.Net.Http ActivitySource emits the exception annotation on the runtime-owned client span. But for the WebApplicationFactory in-memory path there is now no client span at all, so transport errors become completely invisible in traces. The new SendAsync_PropagatesInnerHandlerException test correctly asserts the exception propagates, but nothing verifies the absence of an error annotation (which is intentional but silent).

Recommendation: Add a CHANGELOG entry noting this behavior change. Users who currently see error.type or exception events on TUnit.AspNetCore.Http spans will silently lose that data after upgrading.


2. AspireHttpSourceName hard-removed without [Obsolete] (needs confirmation)

AspNetCoreHttpSourceName is marked [Obsolete] for binary compat (it shipped in v1.36.0). AspireHttpSourceName is simply deleted. The PR body says "never shipped publicly" — if that holds, the hard removal is fine. But if any preview or pre-release NuGet included it, consumers referencing TUnitActivitySource.AspireHttpSourceName will get a compile error with no deprecation guidance.

Recommendation: Verify against the NuGet release history (e.g., check https://www.nuget.org/packages/TUnit.Aspire for any published versions that included this constant). If any published package shipped it, apply the same [Obsolete] treatment as AspNetCoreHttpSourceName.


3. SimpleExportProcessor ordering constraint not surfaced in user-facing docs (minor, informational)

The TUnitTestCorrelationProcessor class remarks correctly document:

Synchronous pipelines (e.g. SimpleExportProcessor) that serialize inside their own OnEnd only observe the tag if this processor is registered before them.

However, this constraint doesn't appear in docs/docs/examples/opentelemetry.md's custom pipeline example. A user who wires up their own SimpleExportProcessor before TUnitTestCorrelationProcessor will miss the tunit.test.id tag silently.

Recommendation: Add a one-line ordering note to the custom pipeline example in opentelemetry.md. Something like: "Register TUnitTestCorrelationProcessor before any synchronous (Simple) exporters to ensure OnEnd tagging is visible."


What's well done

  • The architecture is clean: pure propagators for header injection, tagging fully delegated to TUnitTestCorrelationProcessor. Single-responsibility is clearly honored.
  • TryTag's idempotency guard (GetTagItem check) is efficient and correct.
  • SendAsync on both handlers is now synchronous (no async state machine overhead for pure header manipulation).
  • The OnStart+OnEnd dual-hook correctly handles the ASP.NET Core hosting deferred-baggage case.
  • Test suite restructuring is thorough — old span-lifecycle assertions correctly replaced with header-semantic assertions.
  • Public API snapshots are consistent across all three TFMs.

Overall this is a solid, well-reasoned simplification. The two items above (behavior change changelog note, AspireHttpSourceName confirmation) are the only things worth resolving before shipping.

…agnosticsHandler

The Aspire handler injected traceparent before SocketsHttpHandler's internal
DiagnosticsHandler got to do it, so outgoing requests carried the test body's
span ID instead of the runtime-emitted client span ID. SUT server spans then
parented to test body as siblings of the client span rather than as children,
breaking the standard OTel client/server waterfall.

The fix is to stop injecting. AspireFixture.CreateHttpClient now returns
new HttpClient(new SocketsHttpHandler { SslOptions = ... }) and lets the
runtime handle everything:

- DiagnosticsHandler creates the client Activity
- DistributedContextPropagator.Current.Inject emits traceparent+baggage
  against that client Activity's span ID (W3C)
- baggage walks the parent chain so tunit.test.id flows to the SUT
- SUT server span parents correctly under the client span

Also subscribes the test-runner's TracerProvider to System.Net.Http so the
runtime-emitted client span is actually exported — without this the span is
created but not visible on dashboards, leaving server spans with orphan
parents in cross-process traces.

Users with TUNIT_KEEP_LEGACY_PROPAGATOR=1 no longer get the W3C baggage
belt-and-braces emission that the handler did; the runtime uses whatever
propagator is configured, which is the correct respect-the-opt-out behavior.

Verified end-to-end against Jaeger: the Aspire trace-demo test now produces
a clean 11-span waterfall with proper client/server pairing at every HTTP
boundary.
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #5668: fix(telemetry): remove duplicate HTTP client spans

This is a thorough and well-motivated architectural simplification. The root cause analysis is accurate: both handlers were synthesizing spans that the runtime's DiagnosticsHandler (Aspire) and ASP.NET Core's built-in diagnostics (WAF) were already emitting, causing duplicate rows in Aspire dashboard traces. The fix is correct and proportionate.


Prior Review Comments — Status

All concerns from rounds 1–4 have been addressed:

  • Activity.Current cross-attribution risk → fixed via TraceRegistry.GetContextId(activity.TraceId.ToString())
  • CancellationToken race in polling loop → fixed via Environment.TickCount64 deadline
  • HttpActivityPropagator.Inject visibility → correctly internal static void
  • OnEnd mutation ordering → now documented in class <remarks>
  • AspireHttpSourceName release confirmation → confirmed via git log (never shipped)

Architecture — Correct and Well-Executed

The decision to remove span synthesis from both handlers is right. Delegating handlers that create their own spans were violating single-responsibility: a propagation handler should only inject headers. Now both handlers do exactly one thing.

The extraction to HttpActivityPropagator.Inject in TUnit.Core is the right level of abstraction — identical logic existed in two places, now lives in one. The #if NET gate is consistent with the rest of TUnitActivitySource. The async/await removal from both SendAsync implementations is a genuine performance win (no state machine allocation per request in the propagation path).

The TraceRegistry-anchored fallback in TryTag is the correct long-term design:

var testId = activity.GetBaggageItem(TUnitActivitySource.TagTestId)
    ?? TraceRegistry.GetContextId(activity.TraceId.ToString());

Using activity.TraceId (immutable, not ambient) eliminates cross-test attribution entirely under async concurrency. This is the right invariant.


Remaining Concerns

1. Aspire: propagation silently breaks without System.Net.Http subscriber

The old TUnitBaggagePropagationHandler always injected traceparent/baggage headers into outgoing requests from AspireFixture, regardless of whether any OTel listener was active. The new path uses bare SocketsHttpHandler, which relies on the runtime's DiagnosticsHandlerbut DiagnosticsHandler only fires when something is subscribed to System.Net.Http.

AutoStart.cs correctly adds .AddSource("System.Net.Http"), so users going through TUnit.OpenTelemetry are fine. But a user who uses TUnit.Aspire without TUnit.OpenTelemetry (or without manually subscribing to "System.Net.Http") will silently get no traceparent/baggage propagation on outgoing Aspire client requests. The old behavior was unconditional.

This is a behavioral regression for that specific (arguably niche) configuration. At minimum, a note in the migration docs or CHANGELOG would set expectations.

2. Exception annotation silently removed for WAF in-memory path

The old ActivityPropagationHandler caught exceptions from base.SendAsync and annotated the synthetic client span with activity.SetStatus(ActivityStatusCode.Error) and activity.RecordException(ex). For the real-socket/Aspire path, the runtime's System.Net.Http source covers this. For the WebApplicationFactory in-memory path, there is now no client span at all.

Transport-level errors that don't reach ASP.NET Core middleware (e.g., HttpRequestException thrown by the WAF host crashing mid-connection) will no longer appear in traces. The server span records unhandled exceptions that propagate through the middleware pipeline, but not errors below that. This is a visible behavior change for WAF-based integration test telemetry and is worth a CHANGELOG entry — the current PR description and release notes are silent on it.

3. SimpleExportProcessor ordering constraint — docs gap

The class-level <remarks> on TUnitTestCorrelationProcessor correctly documents the constraint:

SimpleExportProcessor calls OnEnd synchronously and inline, so TUnitTestCorrelationProcessor must be added before any SimpleExportProcessor-based exporter

But the custom pipeline example in opentelemetry.md doesn't reflect this. A user writing their own pipeline who places a SimpleExportProcessor first will silently lose tunit.test.id tagging. A one-line code comment or note in the docs example would prevent this footgun.


Test Quality — Improved

The new tests are strictly better than the deleted ones. Testing the header contract (what goes over the wire) rather than internal span lifecycle is the correct philosophy for a delegating handler. SendAsync_ExistingBaggageHeader_IsPreserved is particularly valuable — it pins the "pre-existing header wins" invariant directly.

The Processor_FallsBackToTraceRegistry_WhenActivityHasNoBaggage test is sound. Using activity.TraceId as the TraceRegistry key avoids cross-test collision. The TraceRegistry.Register call leaks per test, but TraceRegistry is already process-global by design, so this is acceptable.

The Environment.TickCount64 polling in AutoConfigureOpenTelemetryTests is the right fix for the TaskCanceledException race. 2-second budget is generous.


What's Done Well

  • 608-line net deletion is justified — every removed line was doing something the runtime already does
  • Single-responsibility restored to both handlers
  • async removed from SendAsync — genuine hot-path allocation win
  • Documentation updated consistently across all three doc files and all public API snapshots
  • [Obsolete] on AspNetCoreHttpSourceName with clear message is the right call for a shipped constant
  • Public API snapshots consistent across all 3 TFMs

Summary

Ship confidence is high. The architecture is correct, the concurrency hazards from prior rounds are resolved, and the test suite is cleaner than before. The three remaining items above are non-blocking but worth addressing before or alongside the release:

  1. Aspire propagation regression without System.Net.Http subscriber — CHANGELOG note or mitigation
  2. Exception annotation removal from WAF in-memory path — CHANGELOG note
  3. SimpleExportProcessor ordering — one-line addition to the custom pipeline docs example

Great work on addressing the previous review rounds thoroughly.

…type

Drop the WHAT explanation of what the runtime's DiagnosticsHandler does
in AspireFixture.CreateHttpClient — the behavior is already documented
on the method's XML summary and is self-evident from the code.

Drop the WHAT sentence at AutoStart's AddSource("System.Net.Http"),
keep only the WHY (orphan-parent server spans without it).

Narrow `_httpHandler` field from `HttpMessageHandler?` to
`SocketsHttpHandler?` — accurate to what's actually stored, makes
SslOptions/PooledConnectionLifetime accessible without casts if we
need them later.
This was referenced Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant