
Fix thread safety issues in WorkerNodeTelemetryData #13413

Merged
AR-May merged 3 commits into dotnet:main from AR-May:make-telemetry-thread-safe on Mar 20, 2026


AR-May commented Mar 18, 2026

Related to #12867

Context

Fixes two bugs in the telemetry infrastructure that occur when building in /m /mt (in-proc multithreaded) mode:

Thread-safety crash: All in-proc nodes share a single TelemetryForwarderProvider singleton. Multiple RequestBuilder instances, each running on its own dedicated thread, call AddTask/AddTarget concurrently, and those calls write to the same WorkerNodeTelemetryData dictionary fields. This causes race conditions and dictionary corruption.

Nx telemetry duplication: In /m /mt mode, N BuildRequestEngine instances share one TelemetryForwarderProvider singleton. Each engine calls FinalizeProcessing on shutdown, and each call sends the entire accumulated data. The InternalTelemetryConsumingLogger then merges all N copies, inflating every counter N times.
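The duplication can be reproduced in a minimal Python model. It assumes a single shared accumulator whose finalize step sends a copy of everything collected so far without clearing it. BuggySharedForwarder, finalize_processing, and consume are hypothetical stand-ins for TelemetryForwarderProvider, FinalizeProcessing, and the consuming logger's merge, not the actual MSBuild code. With 4 engines finalizing, 3 real executions are reported as 12:

```python
from collections import Counter


class BuggySharedForwarder:
    """Hypothetical model of the pre-fix behavior (one shared accumulator per process)."""

    def __init__(self):
        self.task_counts = Counter()

    def add_task(self, name):
        self.task_counts[name] += 1

    def finalize_processing(self, sink):
        # Sends a copy of the whole buffer and never clears it,
        # so N callers send N copies of the same data.
        sink.append(Counter(self.task_counts))


def consume(sent_batches):
    """Merge every received batch, as the consuming logger would."""
    total = Counter()
    for batch in sent_batches:
        total.update(batch)
    return total


forwarder = BuggySharedForwarder()
for _ in range(3):
    forwarder.add_task("Csc")

sent = []
engine_count = 4
for _ in range(engine_count):  # each BuildRequestEngine finalizes once on shutdown
    forwarder.finalize_processing(sent)

print(consume(sent)["Csc"])  # 12: 3 real executions inflated 4x
```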

Reproduction: 20+ non-SDK .NET Framework library projects plus one exe project that references all of them, built with MSBuild.exe Repro.sln /m /mt.

Changes Made

Fix

Batch-then-merge in RequestBuilder: Each RequestBuilder now accumulates task/target telemetry into a local WorkerNodeTelemetryData instance (zero contention, because only its own thread touches it), then merges it into the shared state once via TelemetryForwarder.MergeWorkerData().
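The batch-then-merge pattern can be sketched as follows, assuming telemetry can be modeled as per-task counters. update_statistics_post_build and SharedForwarder are hypothetical Python stand-ins for RequestBuilder.UpdateStatisticsPostBuild and the shared forwarder, not the actual C# implementation. The accumulation loop touches only thread-local state, and the shared state is entered exactly once per builder:

```python
import threading
from collections import Counter


class SharedForwarder:
    """Hypothetical stand-in for the shared forwarder; only the merge entry point is modeled."""

    def __init__(self):
        self._lock = threading.Lock()
        self.task_counts = Counter()
        self.merge_calls = 0

    def merge_worker_data(self, local_counts):
        # One short critical section per request builder.
        with self._lock:
            self.task_counts.update(local_counts)
            self.merge_calls += 1


def update_statistics_post_build(executed_tasks, forwarder):
    """Hypothetical analogue of RequestBuilder.UpdateStatisticsPostBuild."""
    local = Counter()  # private to the calling thread: no lock, no contention
    for task_name in executed_tasks:
        local[task_name] += 1
    forwarder.merge_worker_data(local)  # single hand-off to shared state


forwarder = SharedForwarder()
update_statistics_post_build(["Csc", "Copy", "Copy"], forwarder)
update_statistics_post_build(["Copy"], forwarder)
print(forwarder.task_counts["Copy"], forwarder.merge_calls)  # 3 2
```

The trade-off is that per-item AddTask/AddTarget calls would need one synchronized write per task and target. Batching reduces that to a single short critical section per request.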

Thread-safe TelemetryForwarder: Added an internal lock protecting both MergeWorkerData and FinalizeProcessing. The forwarder is a singleton shared across BuildRequestEngine instances in /m /mt mode, so concurrent access is expected.
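The following sketch shows why the lock matters. LockedForwarder is hypothetical and written in Python rather than the PR's C#. Eight threads merge batches concurrently into one shared instance. Counter.update does a read-modify-write per key, so unsynchronized concurrent merges could lose increments. With every access going through the same lock, the totals come out exact:

```python
import threading
from collections import Counter


class LockedForwarder:
    """Hypothetical forwarder in which every access to the shared data takes one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = Counter()

    def merge_worker_data(self, batch):
        with self._lock:
            self._data.update(batch)

    def snapshot(self):
        # Readers take the same lock, so they never observe a half-applied merge.
        with self._lock:
            return Counter(self._data)


forwarder = LockedForwarder()


def request_builder_thread():
    # Each simulated RequestBuilder merges many small batches.
    for _ in range(1000):
        forwarder.merge_worker_data({"Csc": 1, "Copy": 2})


threads = [threading.Thread(target=request_builder_thread) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(forwarder.snapshot()["Csc"], forwarder.snapshot()["Copy"])  # 8000 16000
```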

Swap-and-send in FinalizeProcessing: Instead of sending the same accumulated data on every call, FinalizeProcessing atomically swaps the internal data with a fresh empty instance under the lock, then sends only if non-empty. This ensures:

  • First engine to finalize sends all data accumulated so far
  • Subsequent engines find empty data and skip sending (no duplication)
  • Late merges from other engines go into the new instance and are sent by the next FinalizeProcessing call (no data loss)
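The three guarantees above can be illustrated with a minimal swap-and-send sketch. SwapAndSendForwarder and its methods are hypothetical Python analogues of the PR's MergeWorkerData and FinalizeProcessing, and the telemetry payload is simplified to a Counter. The sequence shows the first finalize sending everything, the second sending nothing, and a late merge being picked up by the next finalize:

```python
import threading
from collections import Counter


class SwapAndSendForwarder:
    """Hypothetical swap-and-send forwarder; names mirror the PR, bodies are assumed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = Counter()

    def merge_worker_data(self, batch):
        with self._lock:
            self._data.update(batch)

    def finalize_processing(self, sink):
        # Atomically detach the accumulated data and install a fresh, empty instance.
        with self._lock:
            to_send, self._data = self._data, Counter()
        # Send only if something was accumulated since the previous finalize.
        if to_send:
            sink.append(to_send)


sent = []
fwd = SwapAndSendForwarder()
fwd.merge_worker_data({"Csc": 3})
fwd.finalize_processing(sent)       # first engine: sends everything so far
fwd.finalize_processing(sent)       # second engine: finds empty data, skips sending
fwd.merge_worker_data({"Copy": 1})  # late merge from another engine
fwd.finalize_processing(sent)       # next finalize sends only the late data

total = Counter()
for batch in sent:
    total.update(batch)
print(len(sent), total["Csc"], total["Copy"])  # 2 3 1
```

This sketch performs the send after releasing the lock. That is safe here because each swapped-out batch is owned by exactly one caller. Sending while still holding the lock would also be correct, but it lengthens the critical section.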

Testing

  • Tested locally on the repro project: the issue no longer occurs.
  • Added unit tests.

Copilot AI review requested due to automatic review settings March 18, 2026 17:28
AR-May marked this pull request as draft March 18, 2026 17:31

Copilot AI left a comment

Pull request overview

This PR addresses a concurrency bug in MSBuild’s internal worker-node telemetry aggregation when running in in-proc multi-threaded mode (/m /mt), where multiple RequestBuilder instances can concurrently mutate shared telemetry dictionaries and corrupt their state.

Changes:

  • Switch telemetry reporting in RequestBuilder.UpdateStatisticsPostBuild from per-target/per-task updates to batching into a local WorkerNodeTelemetryData and merging once.
  • Replace ITelemetryForwarder.AddTask/AddTarget with a single MergeWorkerData API and update the provider implementations accordingly.
  • Add explicit locking around aggregation in the internal telemetry-consuming logger.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/Framework/Telemetry/WorkerNodeTelemetryData.cs: Adds method documentation and a minor refactor while preserving merge/aggregation behavior.
  • src/Build/TelemetryInfra/TelemetryForwarderProvider.cs: Replaces per-item update APIs with MergeWorkerData and exposes a key creation helper for batching.
  • src/Build/TelemetryInfra/InternalTelemetryConsumingLogger.cs: Adds a lock around worker telemetry aggregation.
  • src/Build/TelemetryInfra/ITelemetryForwarder.cs: Updates the forwarder contract to support batched merging.
  • src/Build/BackEnd/Components/RequestBuilder/RequestBuilder.cs: Implements local accumulation plus a single merge under a lock to prevent concurrent dictionary writes.

AR-May force-pushed the make-telemetry-thread-safe branch 2 times, most recently from f2e0591 to 3e1fd80, March 19, 2026 13:25
AR-May force-pushed the make-telemetry-thread-safe branch from 3e1fd80 to 1a19fa3, March 19, 2026 13:30
AR-May marked this pull request as ready for review March 19, 2026 13:32

AR-May commented Mar 19, 2026

/azp run

azure-pipelines commented:

Azure Pipelines successfully started running 1 pipeline(s).

AR-May requested a review from Copilot March 19, 2026 15:42

Copilot AI left a comment

Pull request overview

Fixes telemetry thread-safety and counter inflation in in-proc multithreaded (/m /mt) builds by batching telemetry per RequestBuilder, merging into a shared forwarder under a lock, and preventing repeated “send the whole buffer” behavior during engine shutdown.

Changes:

  • Accumulate task/target telemetry in a per-RequestBuilder WorkerNodeTelemetryData and merge once into the shared forwarder.
  • Make TelemetryForwarder thread-safe and change finalization to “swap-and-send” to avoid Nx duplication across multiple BuildRequestEngine finalizers.
  • Add unit tests for WorkerNodeTelemetryData.IsEmpty and forwarder reset behavior after FinalizeProcessing.
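To make the tested behavior concrete, here is a hypothetical Python model of WorkerNodeTelemetryData with an is_empty check. The field names and merge rules are assumptions, not the actual C# type: task statistics are summed, and a target counts as executed if any contributor executed it. The swap-and-send logic relies on this emptiness check to skip redundant sends:

```python
from dataclasses import dataclass, field


@dataclass
class TaskStats:
    # Assumed per-task statistics; the real type may track more fields.
    cumulative_ms: float = 0.0
    executions: int = 0


@dataclass
class WorkerNodeTelemetryData:
    """Hypothetical model; field names and merge rules are assumptions."""

    tasks: dict = field(default_factory=dict)    # task key -> TaskStats
    targets: dict = field(default_factory=dict)  # target key -> was_executed

    def add_task(self, key, elapsed_ms, executions=1):
        stats = self.tasks.setdefault(key, TaskStats())
        stats.cumulative_ms += elapsed_ms
        stats.executions += executions

    def add_target(self, key, was_executed):
        # Assumed rule: a target is "executed" if any contributor executed it.
        self.targets[key] = self.targets.get(key, False) or was_executed

    def merge(self, other):
        for key, stats in other.tasks.items():
            self.add_task(key, stats.cumulative_ms, stats.executions)
        for key, was_executed in other.targets.items():
            self.add_target(key, was_executed)

    def is_empty(self):
        return not self.tasks and not self.targets


a = WorkerNodeTelemetryData()
a.add_task("Csc", 120.0)
a.add_target("Build", True)

b = WorkerNodeTelemetryData()
b.add_task("Csc", 80.0)
b.add_target("Build", False)

a.merge(b)
print(a.tasks["Csc"].executions, a.tasks["Csc"].cumulative_ms, a.targets["Build"])  # 2 200.0 True
print(WorkerNodeTelemetryData().is_empty(), a.is_empty())  # True False
```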

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/Framework/Telemetry/WorkerNodeTelemetryData.cs: Adds IsEmpty and improves the clarity of the merge/add methods used by the forwarder's swap-and-send logic.
  • src/Framework/Telemetry/TaskOrTargetTelemetryKey.cs: Introduces a helper factory (Create) that centralizes the key construction used by RequestBuilder.
  • src/Build/TelemetryInfra/TelemetryForwarderProvider.cs: Adds locking, a batch merge entry point, and swap-and-send finalization to prevent races and duplicate sends.
  • src/Build/TelemetryInfra/ITelemetryForwarder.cs: Replaces per-item APIs with a batch merge API (MergeWorkerData).
  • src/Build/BackEnd/Components/RequestBuilder/RequestBuilder.cs: Switches to batch-then-merge telemetry collection to remove dictionary contention.
  • src/Build.UnitTests/Telemetry/Telemetry_Tests.cs: Adds tests for the new reset/empty behavior and the forwarder finalization semantics.

AR-May enabled auto-merge (squash) March 20, 2026 13:48
AR-May merged commit dfe5370 into dotnet:main Mar 20, 2026
10 checks passed
AR-May self-assigned this Mar 20, 2026

3 participants