Improve cross-platform node discovery for reuse with NodeMode filtering #13256

Merged
baronfel merged 34 commits into main from copilot/improve-node-discovery-for-msbuild
Feb 25, 2026

Copilot AI commented Feb 12, 2026

Summary

Improves MSBuild node discovery to enable better node reuse by filtering candidate processes by NodeMode type. This ensures worker nodes only reuse worker nodes and task host nodes only reuse task host nodes, preventing incompatible node reuse.

Customer Impact

Positive Impact:

  • More efficient node reuse through NodeMode filtering (behind ChangeWaves.Wave18_6 feature flag)
  • Cross-platform support for discovering dotnet-hosted MSBuild processes on Windows, Linux, macOS, and BSD
  • Better build performance through accurate node type matching
  • Comprehensive diagnostic trace logging for troubleshooting node discovery issues
  • Optimized memory usage with ArrayPool in macOS/BSD command line retrieval

Opt-out Mechanism:

  • Feature is gated behind ChangeWaves.Wave18_6, allowing customers to disable via MSBUILDDISABLEFEATURESFROMVERSION=18.6 if issues arise
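
The opt-out gate described above can be sketched roughly as follows. This is an illustrative Python model, not the PR's C# code: `is_feature_enabled` and `parse_version` are hypothetical names, and the handling of an unset or malformed variable (features stay on) is an assumption. Only the environment variable name and the 18.6 wave come from the description.

```python
import os

WAVE_18_6 = (18, 6)  # stands in for ChangeWaves.Wave18_6 (illustrative)


def parse_version(text):
    """Parse 'major.minor' into a tuple, or return None if malformed."""
    parts = text.strip().split(".")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return (int(parts[0]), int(parts[1]))


def is_feature_enabled(wave, env=None):
    """A wave is on unless the opt-out variable names it or an earlier wave."""
    env = os.environ if env is None else env
    disabled_from = parse_version(env.get("MSBUILDDISABLEFEATURESFROMVERSION", ""))
    if disabled_from is None:
        return True  # assumption: unset or unparseable leaves features enabled
    return wave < disabled_from
```

With this model, setting MSBUILDDISABLEFEATURESFROMVERSION=18.6 turns the NodeMode filtering off, while a higher value such as 18.7 leaves it on.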

Regression?

  • Yes
  • No

This is new functionality for improved node discovery. The feature is enabled by default behind a change wave, can be opted out of, and includes defensive error handling:

  • All WMI COM interop and P/Invoke calls wrapped in try/catch with logging
  • Falls back gracefully to existing behavior if command line retrieval fails
  • BSD and other Unix variants supported via sysctl (same code path as macOS)
  • TOCTOU issues fixed in file reading
  • Simplified HasExited check prevents race conditions
  • Comprehensive trace logging at every filtering decision point for debugging

Testing

Unit Tests:

  • Cross-platform ProcessExtensions.TryGetCommandLine() tests covering Windows, Linux, and macOS
  • BSD uses the same sysctl code path as macOS
  • Tests validate command line retrieval for running processes on all platforms
  • Tests verify graceful handling when command line unavailable
  • KillTree test validates process tree termination functionality
  • All tests pass on Linux

Code Quality:

  • Regex source generator pattern for .NET 10+ (optimal performance)
  • Static compiled regex for .NET Framework (good performance)
  • Regex pattern extracted to constant following EditorConfigFile.cs pattern
  • Uses ValueSpan on .NET to avoid string allocation when parsing NodeMode from command lines
  • Optimized macOS/BSD implementation:
    • Uses ArrayPool<byte>.Shared for buffer management (no native allocations)
    • Uses ArrayPool<char>.Shared for UTF-8 decoding (avoids intermediate string allocations)
    • Span-based slicing with MemoryMarshal.Read<int> for reading argc
    • Converts null bytes to spaces in decoded chars for safety
    • Proper try/finally blocks ensure ArrayPool buffers are returned
    • Links to runtime#101837 (Environment.ProcessPath / Environment.GetCommandLineArgs()[0] discrepancies), which documents the missing runtime functionality
  • Nullable reference types enabled in all new code
  • Comprehensive diagnostic logging:
    • Logs when filtering begins with candidate count
    • Logs each process skipped (unable to get command line, not hosting MSBuild.dll, NodeMode mismatch)
    • Logs each process included with matching NodeMode
    • Logs final filtered count
    • Logs command line retrieval failures with exception details
  • Used MacOSOnlyFactAttribute from Microsoft.DotNet.XUnitExtensions
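
The macOS/BSD buffer handling listed above (read argc, skip the executable path and its padding nulls, then collect the arguments) can be illustrated with a small parser. This is a Python sketch, not the PR's C# implementation; it assumes the commonly documented KERN_PROCARGS2 layout (a native-endian 4-byte argc, the executable path, padding NULs, then argc NUL-terminated arguments followed by the environment). `parse_procargs2` is a hypothetical name.

```python
import struct


def parse_procargs2(buf):
    """Return the argv list from a KERN_PROCARGS2-style buffer, or None."""
    if len(buf) < 4:
        return None
    (argc,) = struct.unpack_from("=i", buf, 0)  # native byte order, 4 bytes
    if argc <= 0:
        return None
    pos = buf.find(b"\0", 4)  # terminator of the executable path
    if pos < 0:
        return None
    while pos < len(buf) and buf[pos] == 0:  # skip terminator and padding NULs
        pos += 1
    args = []
    while len(args) < argc and pos < len(buf):
        end = buf.find(b"\0", pos)
        if end < 0:
            end = len(buf)
        args.append(buf[pos:end].decode("utf-8", errors="replace"))
        pos = end + 1
    # Stopping after argc entries keeps environment variables out of the result.
    return args if len(args) == argc else None
```

Joining the result with spaces gives a single command-line string, which mirrors the "null bytes to spaces" conversion mentioned in the list.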

Implementation Details:

Process command line retrieval (ProcessExtensions.cs)

  • Windows: WMI via COM interop for both .NET Framework and .NET Core (robust, well-tested approach)
  • Linux: Parse /proc/{pid}/cmdline with UTF-8 encoding, fixed TOCTOU issue
  • macOS/BSD: sysctl with (CTL_KERN, KERN_PROCARGS2, pid), uses ArrayPool for efficient memory management, properly handles padding nulls between executable path and arguments
  • Returns false from TryGetCommandLine to allow fallback behavior on any failure
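
As a rough illustration of the Linux path, the sketch below reads /proc/{pid}/cmdline and reports failure instead of raising. It is Python rather than the PR's C#, and the function names are hypothetical. Opening the file directly, with no prior existence check, is one plausible reading of the TOCTOU fix; the PR text does not spell out the details.

```python
def decode_proc_cmdline(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/cmdline into one string."""
    trimmed = raw.rstrip(b"\0")
    if not trimmed:
        return None  # e.g. kernel threads expose an empty cmdline
    return " ".join(a.decode("utf-8", errors="replace") for a in trimmed.split(b"\0"))


def try_get_command_line(pid):
    """Return (True, command_line) on success or (False, None) on any failure."""
    try:
        # Read in one step; checking for the file first would leave a window
        # in which the process could exit between the check and the read.
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        return (False, None)
    line = decode_proc_cmdline(raw)
    return (line is not None, line)
```

A caller treats a False result as "exclude this process or fall back", matching the last bullet above.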

Enhanced node discovery (NodeProviderOutOfProcBase.cs)

  • Searches only for processes matching expectedProcessName (maintains framework vs core separation)
  • When expectedProcessName is dotnet, filters by MSBuild.dll presence in command line
  • Extracts and validates /nodemode:<value> parameter using NodeModeHelper.ExtractFromCommandLine() (consistent with DebugUtils)
  • Filters candidates by NodeMode when expectedNodeMode is provided
  • Uses Constants.DotnetProcessName for cross-platform dotnet detection
  • Excludes processes with unparseable command lines
  • Comprehensive trace logging with CommunicationsUtilities.Trace() at every decision point for debugging
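
The filtering decisions listed above can be modeled in a few lines. This is a simplified Python sketch, not the code in NodeProviderOutOfProcBase.cs: `filter_candidates` and its parameters are hypothetical, the log messages are invented for illustration, and only the integer form of /nodemode is handled here.

```python
import re

NODEMODE_RE = re.compile(r"/nodemode:(\d+)", re.IGNORECASE)


def filter_candidates(candidates, expected_process_name, expected_node_mode, log=print):
    """candidates is a list of (pid, command_line_or_None); returns kept pids."""
    log(f"Filtering {len(candidates)} candidate(s)")
    kept = []
    for pid, command_line in candidates:
        if command_line is None:
            log(f"Skip {pid}: unable to get command line")
            continue
        if expected_process_name == "dotnet" and "msbuild.dll" not in command_line.lower():
            log(f"Skip {pid}: not hosting MSBuild.dll")
            continue
        if expected_node_mode is not None:
            match = NODEMODE_RE.search(command_line)
            if match is None or int(match.group(1)) != expected_node_mode:
                log(f"Skip {pid}: NodeMode mismatch")
                continue
        log(f"Include {pid}")
        kept.append(pid)
    log(f"{len(kept)} candidate(s) remain")
    return kept
```

Every branch logs before it continues, which is the property the trace-logging bullet describes: each exclusion leaves a record of why it happened.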

NodeMode helpers (NodeMode.cs)

  • ToCommandLineArgument(): Format as /nodemode:{value}
  • TryParse(): Parse integer or enum name (case-insensitive), supports both string and ReadOnlySpan<char> overloads
  • ExtractFromCommandLine(): Extract NodeMode from command line using regex (with source generator for .NET 10+, static compiled regex for .NET Framework)
  • Regex pattern extracted to constant for consistency between source generator and runtime construction
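
A compact model of the three helpers, written in Python for illustration only. The numeric enum values, the choice to format the argument as an integer, and the exact regex are assumptions. The real definitions live in NodeMode.cs and may differ.

```python
import re
from enum import IntEnum


class NodeMode(IntEnum):
    # Numeric values are assumed for this sketch.
    OutOfProcNode = 1
    OutOfProcTaskHostNode = 2


# One shared pattern constant, so every regex construction path agrees.
NODEMODE_PATTERN = r"/nodemode:(?P<mode>[A-Za-z0-9]+)"
NODEMODE_RE = re.compile(NODEMODE_PATTERN, re.IGNORECASE)


def to_command_line_argument(mode):
    """Format a mode as a /nodemode switch (integer form assumed)."""
    return f"/nodemode:{int(mode)}"


def try_parse(text):
    """Accept an integer or a case-insensitive enum name; None if unknown."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        try:
            return NodeMode(int(text))
        except ValueError:
            return None
    for member in NodeMode:
        if member.name.lower() == text.lower():
            return member
    return None


def extract_from_command_line(command_line):
    """Find the /nodemode switch in a command line and parse its value."""
    match = NODEMODE_RE.search(command_line)
    return try_parse(match.group("mode")) if match else None
```

Formatting and extraction round-trip: a value written by `to_command_line_argument` is recovered by `extract_from_command_line`, and an unknown value yields None so the caller can exclude the process.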

Risk

  • Low
  • Medium
  • High

Risk Mitigation:

  • Feature gated behind ChangeWaves.Wave18_6 - customers can opt out via environment variable
  • Comprehensive error handling prevents crashes from COM interop and P/Invoke edge cases
  • Falls back gracefully to existing behavior if new logic fails
  • BSD and other Unix variants supported
  • Defensive coding: simplified null checks, TOCTOU fixes, proper resource handling with ArrayPool
  • All COM interop and P/Invoke calls isolated and wrapped in try/catch blocks
  • Regex uses source generator for .NET 10+ (compile-time validation)
  • Performance optimizations: uses ValueSpan on .NET to avoid string allocation during parsing, ArrayPool for buffer management
  • Extensive trace logging enables quick diagnosis of any filtering issues in production
Original prompt

Implement cross-platform MSBuild node discovery improvements for node reuse.

Background:
MSBuild currently discovers possible reusable nodes by deriving an expected process name and calling Process.GetProcessesByName(expectedProcessName). This can miss nodes launched under dotnet (dotnet exec MSBuild.dll) and cannot distinguish node types.

Requested changes:

  1. Expand discovery so candidate processes include:

    • msbuild.exe processes
    • dotnet processes that are hosting MSBuild.dll (either dotnet <path-to-MSBuild.dll> ... or dotnet exec <path-to-MSBuild.dll> ...).
  2. For all candidates, retrieve and parse their full command lines and categorize by node type using the /nodemode:<value> CLI parameter.

    • Use the parsed NodeMode to determine eligibility for reuse (not just diagnostics).
    • Exclude processes when command line cannot be retrieved or /nodemode cannot be parsed to a known NodeMode.
  3. The implementation MUST be cross-platform from the beginning:

    • Windows: retrieve command line via OS mechanisms (e.g., WMI / Win32_Process) or other reliable technique.
    • Linux: retrieve command line via /proc/<pid>/cmdline.
    • macOS: retrieve command line via an appropriate native API (sysctl/proc APIs) with interop.
    • If platform retrieval fails for any reason, exclude the process.
  4. Ensure each node provider only reuses nodes of the appropriate mode:

    • Worker node provider should reuse only NodeMode.OutOfProcNode.
    • Task host provider should reuse only NodeMode.OutOfProcTaskHostNode.
    • Do not allow server or RAR nodes to be reused as worker nodes.
  5. Add/update unit tests as appropriate:

    • Test command line parsing for nodemode.
    • Test dotnet MSBuild.dll command line detection for both dotnet MSBuild.dll and dotnet exec MSBuild.dll forms.
    • Where OS command-line retrieval cannot be reliably tested, structure code so parsing/detection logic is testable without requiring real processes.
  6. Keep behavior safe:

    • When command line retrieval fails, exclude the process from reuse candidates.
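
The two dotnet invocation forms named in the request can be recognized with a token-level check, sketched here in Python. `hosts_msbuild_dll` is a hypothetical helper and not part of the PR, which describes a check for MSBuild.dll presence in the command line instead. The sketch also assumes the DLL path directly follows `dotnet` or `dotnet exec`, so host options placed before the path are not handled.

```python
def hosts_msbuild_dll(args):
    """args is a tokenized command line; args[0] is the dotnet executable."""
    rest = args[1:]
    if rest and rest[0].lower() == "exec":
        rest = rest[1:]  # 'dotnet exec <path-to-MSBuild.dll> ...' form
    if not rest:
        return False
    # Take the file name, accepting both Windows and Unix separators.
    file_name = rest[0].replace("\\", "/").rsplit("/", 1)[-1]
    return file_name.lower() == "msbuild.dll"
```

Because it operates on an already tokenized argument list, this kind of check can be unit tested without starting real processes, which is what item 5 of the request asks for.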

Key locations discovered:

  • Node reuse process discovery is in src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs, method GetPossibleRunningNodes.
  • NodeMode and parsing helper exist in src/Framework/NodeMode.cs (NodeModeHelper.TryParse).
  • There is existing regex scan example in src/Shared/Debugging/DebugUtils.cs for /nodemode:.

Deliverables:

  • Code changes implementing cross-platform command line retrieval and enhanced filtering.
  • Updated node discovery and reuse eligibility logic.
  • Tests.

Notes:

  • Exclude processes if command line retrieval fails.
  • Handle dotnet invocation forms dotnet <msbuild.dll> and dotnet exec <msbuild.dll>.

This pull request was created from Copilot chat.



Copilot AI changed the title from "[WIP] Improve cross-platform MSBuild node discovery for reuse" to "Improve cross-platform node discovery for reuse with NodeMode filtering" Feb 12, 2026
Copilot AI requested a review from baronfel February 12, 2026 22:53
@baronfel baronfel marked this pull request as ready for review February 13, 2026 17:56
Copilot AI review requested due to automatic review settings February 13, 2026 17:56
Copilot AI left a comment

Pull request overview

This pull request implements cross-platform MSBuild node discovery improvements to enable better node reuse. The key change is extending node discovery beyond just Process.GetProcessesByName() to also inspect dotnet processes hosting MSBuild.dll and filter candidates by NodeMode type, ensuring worker nodes only reuse worker nodes and task host nodes only reuse task host nodes.

Changes:

  • Added cross-platform ProcessExtensions.GetCommandLine() to retrieve process command lines via WMI (Windows/.NET Framework), native Windows APIs (Windows/.NET Core+), /proc filesystem (Linux), and sysctl (macOS)
  • Enhanced GetPossibleRunningNodes() to discover both msbuild.exe and dotnet processes, filter by MSBuild.dll presence, and validate/filter by NodeMode using regex extraction
  • Added NodeMode helper methods ToCommandLineArgument() and TryParse() for consistent command line formatting and parsing

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

File: Description

  • src/Shared/ProcessExtensions.cs: Implements cross-platform command line retrieval with platform-specific implementations for Windows (WMI + native APIs), Linux (/proc), and macOS (sysctl)
  • src/Utilities/Microsoft.Build.Utilities.csproj: Adds System.Management reference for .NET Framework to support WMI-based command line retrieval
  • src/Build/Microsoft.Build.csproj: Adds System.Management reference for .NET Framework in Build project
  • src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs: Enhances node discovery to search both msbuild.exe and dotnet processes, extracts NodeMode via regex, and filters candidates by matching NodeMode values
  • src/Framework.UnitTests/NodeMode_Tests.cs: Adds comprehensive unit tests for NodeMode command line formatting and parsing (integers, enum names, invalid values)
  • src/Utilities.UnitTests/ProcessExtensions_Tests.cs: Adds tests for GetCommandLine() including null process, running process, exited process, and Unix-specific scenarios

Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
auto-merge was automatically disabled February 23, 2026 15:04

Head branch was pushed to by a user without write access

Copilot AI commented Feb 23, 2026

@copilot the Windows tests are failing to discover command lines consistently - please iterate on that and see if you can find the root cause.

I can see that WMI COM interop was recently added for Windows command line retrieval (commit e29eb81). WMI can be unreliable in test environments due to COM initialization issues, security contexts, and concurrency.

Could you provide more details about the test failures? Specifically:

  1. Is it the TryGetCommandLine_RunningProcess_ContainsExpectedExecutable or TryGetCommandLine_RunningProcess_ContainsArguments test that's failing?
  2. What's the error message? (e.g., timeout, COM error, null command line, assertion failure)
  3. How often does it fail? (every time, ~50%, rare)

The tests pass consistently on Linux, so this is Windows-specific. I suspect the WMI COM approach may need additional error handling or we might need to consider an alternative approach.

…n usage

Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Copilot AI requested a review from baronfel February 23, 2026 19:14
@baronfel baronfel dismissed their stale review February 23, 2026 21:42

Changes were made as requested.

Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>

Labels

Area: Engine Issues impacting the core execution of targets and tasks. User Experience
