Pull Request Overview
This PR adds diagnostic metrics support to MSBuild using System.Diagnostics.Metrics.Meter, enabling real-time monitoring of build operations through dotnet-counters. The implementation tracks scheduler states, node counts, and configuration assignments to help analyze build performance and scheduling behavior.
Key Changes:
- Added System.Diagnostics.Metrics-based gauges throughout the scheduler, node manager, and configuration cache to expose operational metrics
- Set `DOTNET_EnableDiagnostics=0` environment variable for spawned processes to prevent diagnostic port conflicts that cause hangs
- Updated bootstrap SDK version to RC2
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/Shared/CommunicationsUtilities.cs | Added DOTNET_EnableDiagnostics=0 to environment variables for spawned processes |
| src/Build/BackEnd/Components/Scheduler/SchedulingData.cs | Added observable gauges for tracking request counts by state and node configurations |
| src/Build/BackEnd/Components/Scheduler/Scheduler.cs | Added observable gauges for node counts and total request count |
| src/Build/BackEnd/Components/Communications/NodeManager.cs | Added gauge metric for active node counts with improved disposal pattern |
| src/Build/BackEnd/Components/Communications/NodeLauncher.cs | Set DOTNET_EnableDiagnostics environment variable in process startup |
| src/Build/BackEnd/Components/Caching/ConfigCache.cs | Added observable gauge for configurations per project |
| eng/Versions.props | Updated bootstrap SDK version from RC1 to RC2 |
Comments suppressed due to low confidence (1)
src/Build/BackEnd/Components/Caching/ConfigCache.cs:1
- The `CreateObservableGauge` call creates a gauge but doesn't assign it to the `_configurationsPerProjectGauge` field. This means the gauge instance is created but not referenced, which could lead to it being garbage collected and not functioning as intended. Assign the result to the field: `_configurationsPerProjectGauge = _configurationMetrics.CreateObservableGauge(...)`
// Licensed to the .NET Foundation under one or more agreements.
Also, note that all of this data is only for the central node - the process that the user is actually communicating with. The child worker nodes do not send counters back in this way (yet?). That's mostly why the counters I've made so far are focused on scheduling - since that happens on the central node. This makes these counters mostly-useless for multiproc mode, though they could be useful for multiarch mode.
568e4d3 to b52ea9f
This pull request has been automatically closed because it has been open for more than 180 days with no recent activity. If you believe this work is still relevant, please feel free to reopen or create a new pull request. Thank you for your contribution!
This is a draft PR to explore exporting some System.Diagnostics.Metrics.Meter-based counters for the operational metrics of the build. The aim would be to make it easy to collect operational data cross-platform for comparisons when trying things like scheduling algorithm changes, or to quickly see in real-world scenarios how the scheduler is assigning configurations to nodes, etc.
To use:
- `dotnet-counters` is used to view the counters, and like all framework-dependent tools it gets confused when using bootstrap/local SDK installs
- `dotnet tool install -g dotnet-counters`
- `dotnet-counters monitor --refresh-interval 1 --counters Microsoft.Build -- dotnet build rest_of_args`

You'll get output like this:

This method makes use of the dotnet diagnostics pipe - so when spawning processes we need to make sure that other dotnet processes don't inherit the diagnostics behaviors. For this reason I injected the "disable diagnostics" env var into node creation and TaskHost environments. Without this any process spawning will hang.
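As a rough illustration of the env var injection described above, a child process could be started along these lines. This is a minimal sketch, not the code from this PR: `CreateChildStartInfo` is a hypothetical helper, and the `dotnet --version` command line is only a placeholder.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: builds start info for a child process that should not
// pick up the parent's diagnostics behavior.
static ProcessStartInfo CreateChildStartInfo(string fileName, string arguments)
{
    var startInfo = new ProcessStartInfo(fileName, arguments)
    {
        // Environment overrides only apply when the shell is not used to launch.
        UseShellExecute = false,
    };

    // Disable the runtime diagnostics features in the child process.
    startInfo.Environment["DOTNET_EnableDiagnostics"] = "0";
    return startInfo;
}

ProcessStartInfo info = CreateChildStartInfo("dotnet", "--version");
Console.WriteLine(info.Environment["DOTNET_EnableDiagnostics"]); // prints "0"
```

The start info is only constructed here, not launched; passing it to `Process.Start` would start the child with the variable set.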
Alternative usage
Instead of needing to hack a global SDK install you can spawn a bootstrapped `dotnet build` invocation with the `DOTNET_DiagnosticPorts=my_diag_port1` and `DOTNET_EnableDiagnostics=1` env vars, and then run `dotnet counters` with the `--diagnostic-port` option pointing to that same port name.

Counter documentation
Metrics Overview
All metrics use the `System.Diagnostics.Metrics` API and are exposed under the `Microsoft.Build` meter name. They can be collected using standard tools like `dotnet-counters`, OpenTelemetry, or Prometheus exporters.

Logging Metrics
`msbuild_forwarded_log_messages`
Type: ObservableCounter (monotonically increasing)
Unit: messages
Description: Total number of log messages forwarded from worker nodes to the central node during distributed builds.
Tags:
- `source_node` (int): The node ID that sent the log message

Use Cases:
Scheduler Metrics
`msbuild_scheduler_node_count`
Type: ObservableGauge
Unit: nodes
Description: Current count of active nodes in the scheduler.
Tags:
- `node.type`: Node provider type
  - `"outofproc"` - Out-of-process worker nodes
  - `"inproc"` - In-process node

Use Cases:
`msbuild_request_blocked_events`
Type: ObservableCounter (monotonically increasing)
Unit: events
Description: Count of request blocking events, categorized by the type of blocker that caused the wait.
Tags:
- `blocker_type`: The reason a request was blocked
  - `"yield"` - Request explicitly yielded execution
  - `"results_transfer"` - Blocked waiting for result transfer
  - `"in_progress_target"` - Blocked waiting for an in-progress target
  - `"new_requests"` - Blocked by new child requests

Use Cases:
`msbuild_circular_dependency_errors`
Type: ObservableCounter (monotonically increasing)
Unit: errors
Description: Count of circular dependency detections during scheduling.
Tags:
- `error_type`: The context where the circular dependency was detected
  - `"in_progress_target"` - Circular dependency involving in-progress targets
  - `"new_requests"` - Circular dependency in new request chain

Use Cases:
`msbuild_cores_in_use`
Type: ObservableGauge
Unit: cores
Description: Number of CPU cores currently allocated to build requests via `IBuildEngine9.RequestCores`.
Tags: None
Use Cases:
`msbuild_pending_core_requests`
Type: ObservableGauge
Unit: requests
Description: Number of build requests currently waiting for core allocation.
Tags: None
Use Cases:
Scheduling Data Metrics
`msbuild_scheduler_request_count`
Type: ObservableGauge
Unit: requests
Description: Current count of requests in the scheduler by state.
Tags:
- `request.type`: The state of the requests being counted
  - `"executing"` - Currently executing requests
  - `"ready"` - Requests ready to execute
  - `"blocked"` - Requests blocked on dependencies
  - `"yielding"` - Requests that have yielded
  - `"unscheduled"` - Requests not yet scheduled to nodes

Use Cases:
`msbuild_scheduler_node_configuration_count`
Type: ObservableGauge
Unit: configurations
Description: Current count of build configurations assigned to each node.
Tags:
- `node.id` (int): The node ID

Use Cases:
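For illustration, a per-node gauge like this one could be sketched as an observable gauge whose callback yields one tagged measurement per node. This is a sketch under assumptions, not this PR's implementation: `configurationsByNode` is a hypothetical stand-in for the scheduler's real state, and the `MeterListener` is only there to show what a collector would see.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical stand-in for scheduler state: node ID -> configuration IDs assigned to it.
var configurationsByNode = new Dictionary<int, HashSet<int>>
{
    [1] = new HashSet<int> { 10, 11, 12 },
    [2] = new HashSet<int> { 13 },
};

using var meter = new Meter("Microsoft.Build");

// The callback runs every time a listener collects; it yields one measurement per node.
ObservableGauge<int> gauge = meter.CreateObservableGauge<int>(
    "msbuild_scheduler_node_configuration_count",
    () => ObserveConfigurationCounts(),
    unit: "configurations",
    description: "Current count of build configurations assigned to each node.");

IEnumerable<Measurement<int>> ObserveConfigurationCounts()
{
    foreach (KeyValuePair<int, HashSet<int>> entry in configurationsByNode)
    {
        yield return new Measurement<int>(
            entry.Value.Count,
            new KeyValuePair<string, object?>("node.id", entry.Key));
    }
}

// In-process listener that plays the role of a collector such as dotnet-counters.
var observed = new Dictionary<int, int>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Microsoft.Build")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) =>
{
    foreach (KeyValuePair<string, object?> tag in tags)
    {
        if (tag.Key == "node.id")
        {
            observed[(int)tag.Value!] = value;
        }
    }
});
listener.Start();

// Ask all enabled observable instruments to report their current values.
listener.RecordObservableInstruments();

foreach (KeyValuePair<int, int> entry in observed)
{
    Console.WriteLine($"node.id={entry.Key}: {entry.Value} configurations");
}
```

Because the gauge is observable, nothing is computed until a collector asks for values, so the cost on the scheduler is limited to the collection interval.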
`msbuild_scheduler_build_event_count`
Type: ObservableGauge
Unit: events
Description: Total count of build events that have occurred during the current build.
Tags: None
Use Cases:
`msbuild_node_idle_time`
Type: ObservableCounter (monotonically increasing)
Unit: ms (milliseconds)
Description: Total time each node has spent idle (not executing requests).
Tags:
- `node_id` (int): The node ID

Use Cases:
Build Request Engine Metrics
`build_request_engine_requests`
Type: ObservableGauge
Unit: requests
Description: Number of active build requests in the BuildRequestEngine by state.
Tags:
- `nodeId` (int): The node ID where the engine is running
- `state`: The state of the request (determined by callback implementation)

Use Cases:
`build_request_engine_work_queue_length`
Type: ObservableGauge
Unit: items
Description: Number of work items pending in the BuildRequestEngine's work queue.
Tags:
- `nodeId` (int): The node ID where the engine is running

Use Cases:
`build_request_engine_status`
Type: ObservableGauge
Unit: status
Description: Current status of the BuildRequestEngine as an integer enum value.
Tags:
- `nodeId` (int): The node ID where the engine is running

Use Cases:
`msbuild_configuration_resolution_duration`
Type: Histogram
Unit: ms (milliseconds)
Description: Time taken to resolve build configurations (round-trip from configuration request to response).
Tags: None
Use Cases:
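A histogram such as this one is recorded at the point where the measured operation completes. The sketch below assumes a hypothetical `SimulateConfigurationRoundTrip` in place of the real configuration request/response exchange, and uses a `MeterListener` only to show that the value reaches a collector; it is not the code from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading;

using var meter = new Meter("Microsoft.Build");

Histogram<double> resolutionDuration = meter.CreateHistogram<double>(
    "msbuild_configuration_resolution_duration",
    unit: "ms",
    description: "Time taken to resolve build configurations.");

// Listener used only to show what a collector would receive.
var recorded = new List<double>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Microsoft.Build")
    {
        l.EnableMeasurementEvents(instrument);
    }
};
listener.SetMeasurementEventCallback<double>(
    (instrument, value, tags, state) => recorded.Add(value));
listener.Start();

// Hypothetical stand-in for the configuration request/response round trip.
void SimulateConfigurationRoundTrip() => Thread.Sleep(5);

// Time the round trip and record the elapsed milliseconds.
Stopwatch stopwatch = Stopwatch.StartNew();
SimulateConfigurationRoundTrip();
stopwatch.Stop();
resolutionDuration.Record(stopwatch.Elapsed.TotalMilliseconds);

Console.WriteLine($"Recorded {recorded.Count} measurement(s)");
```

Unlike the observable instruments, a histogram is pushed to on every event, so collectors can compute percentiles over the individual durations.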
`msbuild_build_request_state_transitions`
Type: ObservableCounter (monotonically increasing)
Unit: transitions
Description: Count of build request state transitions.
Tags:
- `transition`: The state transition in format "FromState->ToState"
  - `"Active->Waiting"`, `"Waiting->Ready"`, `"Ready->Active"`, `"Active->Complete"`

Use Cases:
`msbuild_request_wait_time`
Type: Histogram
Unit: ms (milliseconds)
Description: Time requests spend in the Waiting state, categorized by the reason for blocking.
Tags:
- `reason`: The reason the request was waiting
  - `"blocking_target"` - Waiting for another target to complete
  - `"unresolved_configuration"` - Waiting for configuration resolution
  - `"child_requests"` - Waiting for child build requests

Use Cases:
Metric Collection Examples
Using dotnet-counters
Using OpenTelemetry
Configure OpenTelemetry to collect metrics from the `Microsoft.Build` meter:

Common Analysis Scenarios
Identifying Parallelism Issues
- `msbuild_node_idle_time` - High idle time suggests poor work distribution
- `msbuild_request_blocked_events` - Excessive blocking reduces parallelism
- `msbuild_request_wait_time` histogram - Shows where requests spend time waiting
- `msbuild_scheduler_request_count` with `request.type=blocked` - Track blocked request count

Detecting Resource Contention
- `msbuild_pending_core_requests` - Non-zero indicates CPU contention
- `msbuild_cores_in_use` vs available cores - Shows resource utilization
- `build_request_engine_work_queue_length` - Backlog indicates processing bottleneck

Analyzing Build Performance
- `msbuild_configuration_resolution_duration` percentiles - Identify slow configs
- `msbuild_request_wait_time` by reason - Find most impactful wait causes
- `msbuild_build_request_state_transitions` - Understand request flow patterns
- `msbuild_forwarded_log_messages` - Excessive logging can slow builds

Troubleshooting Build Issues
- `msbuild_circular_dependency_errors` - Indicates dependency graph problems
- `msbuild_build_request_state_transitions` - Find abnormal state patterns
- `build_request_engine_status` per node - Detect failed or stuck engines
- `msbuild_scheduler_node_configuration_count` distribution - Uneven suggests issues

Implementation Notes