Add MSBuild Coordinator for fair-share node allocation#13653
Add MSBuild Coordinator for fair-share node allocation#13653DustinCampbell wants to merge 52 commits into
Conversation
4a70919 to
29e2aa6
Compare
- Add MSBuild.Coodinator command-line application project (currently empty) - Ensure MSBuild.Coordinator files are copied to the MSBuild core and net472 bootstrap layouts - Update VSSetup to include MSBuild.Coordinator.exe and related files for x86, x64 and arm64.
Microsoft.Build.Framework.csproj was missing a package reference to Microsoft.IO.Redist. Without this, Microsoft.Build.Framework acquires its Microsoft.IO.Redist reference through the "UseFrozenMaintenancePackageVersions" machinery.
- Add MSBuild.Coordinator.UnitTests project configured as a test project. - Add project references to MSBuild.Coordinator and Microsoft.Build.UnitTests.Shared. - Ensure MSBuild.Coordinator.UnitTests has InternalsVisibleTo access to Microsoft.Build.Framework and MSBuild.Coordinator.
Introduce the shared protocol for communication between MSBuild and the build coordinator via named pipes. The coordinator manages a system-wide node budget to prevent parallel builds from oversubscribing resources. Protocol types (in Microsoft.Build.Framework.Coordinator): - ClientMessage/ServerMessage base records with Read/Write support - RequestNodes, ReleaseNodes, Heartbeat (client → coordinator) - NodeGrant, Wait, Error (coordinator → client) - Protocol: constants, env var names, pipe name helper Binary format: [version:byte][messageType:byte][payload...] Version validation on read with InternalErrorException on mismatch.
Implement the MSBuild coordinator server that manages a system-wide node budget to prevent parallel builds from oversubscribing resources. Server components: - BuildGrant: tracks per-build grant state (PID, requested/granted nodes, heartbeat timestamp) - NodeBudgetManager: fair-share allocation policy with wait queue, minimum grant of 1 node, and FIFO drain on release - ClientConnection: owns the named pipe stream and reader/writer for a single client alongside its BuildGrant - CoordinatorServer: named pipe listener, per-client async handler, heartbeat monitor with PID-based dead process detection, and auto-shutdown after configurable idle timeout - Program: entry point with named mutex for single-instance enforcement, environment variable configuration Tests: - NodeBudgetManager unit tests covering grant, queue, fair-share, release, drain, starvation prevention, and accounting consistency - CoordinatorServer integration tests over real named pipes covering single/multiple clients, fair-share with multiple waiters, client disconnect, waiting client disconnect, concurrent connections, heartbeat, invalid protocol, auto-shutdown, and PID reconnection
Introduce CoordinatorClient, which connects to the coordinator server over a named pipe, requests a node grant, maintains heartbeats, and releases the grant on dispose. If no coordinator is running, it attempts to launch one from the adjacent MSBuild.Coordinator binary. Wire the client into BuildManager.BeginBuild/EndBuild: when the MSBUILDUSECOORDINATOR environment variable is set, the client requests a grant before node infrastructure is initialized and caps MaxNodeCount to the granted value. The grant is released in EndBuild's finally block. Add a TryConnectToServer internal overload that accepts a pipe name and process ID for testing without launching a real coordinator. Add InternalsVisibleTo for MSBuild.Coordinator.UnitTests and a project reference to Microsoft.Build from the test project. Add CoordinatorClient_Tests covering grant negotiation, budget capping, no-server fallback, dispose/release flow, deferred grant after wait, double-dispose safety, and sequential fair-share allocation.
Introduce ICoordinatorLogger with an interpolated string handler to
avoid allocations when tracing is disabled. Both sides are gated on
MSBUILDDEBUGCOMM (Traits.Instance.DebugNodeCommunication).
Client-side: CoordinatorClient.DefaultLogger delegates to
CommunicationsUtilities.Trace so coordinator diagnostics are
interleaved with existing node communication traces in
MSBuild_CommTrace_PID_{pid}.txt.
Server-side: CoordinatorServer.DefaultLogger writes to a dedicated
MSBuild_CoordinatorTrace_PID_{pid}.txt since the coordinator process
only references Microsoft.Build.Framework.
Heartbeat messages are intentionally not traced to avoid flooding
the log files.
- Move src/Shared/NamedPipeUtil.cs to src/Framework/NamedPipeUtil.cs - Move NamedPipeUtil to the Microsoft.Build.BackEnd namespace - Update Protocol.GetPipeName() to use NamedPipUtil.
Add CoordinatorIntegration_Tests with two end-to-end tests that launch real bootstrap MSBuild processes with the coordinator enabled: - ParallelBuilds_BothSucceedWithCoordinator: runs two parallel builds each requesting 8 nodes against a budget of 4, verifying both succeed and the coordinator doesn't break anything. - SingleBuild_CoordinatorCapsMaxNodeCount: runs a single build requesting 16 nodes against a budget of 2, verifying $(MSBuildNodeCount) reports the capped value. Tests use MSBUILDCOORDINATORPIPENAME to isolate each run with a unique pipe name, preventing interference with other coordinator instances. Add PipeNameEnvironmentVariable to Protocol and update GetPipeName() to honor the override, routing through NamedPipeUtil.GetPlatformSpecificPipeName for Unix-safe pipe paths.
Route pipe names in CoordinatorServer_Tests and CoordinatorClient_Tests through NamedPipeUtil.GetPlatformSpecificPipeName so that tests use /tmp-based paths on Unix, avoiding Unix Domain Socket path length limits.
Move wire format logic into ClientMessage and ServerMessage base records using the Template Method pattern: non-virtual WriteTo handles the version + message type prefix and flush, while virtual WritePayload provides the extension point for subclass data. Add ReadFrom static factory methods on each base for deserialization, and introduce BinaryReader/BinaryWriter extension methods in Extensions.cs (ReadClientMessage, ReadServerMessage, Write) so call sites read naturally as reader.ReadClientMessage() and writer.Write(message). Update CoordinatorServer, CoordinatorClient, and all tests to use the new extension methods.
Replace direct InternalErrorException throws in ClientMessage.ReadFrom and ServerMessage.ReadFrom with FrameworkErrorUtilities.ThrowInternalError, consistent with how internal errors are raised elsewhere in Framework.
Create a unified CoordinatorSettings type in src/Framework/Coordinator that centralizes all coordinator client and server configuration, replacing scattered environment variable reads and hardcoded defaults throughout the codebase.
Changes:
1. Add CoordinatorSettings sealed record class (src/Framework/Coordinator/CoordinatorSettings.cs)
- Provides single source of truth for heartbeat interval, node budget, pipe name, shutdown timeout, connection timeout, and process id
- Exposes CoordinatorSettings.Default singleton with standard defaults
- Implements FromEnvironment() factory for environment-variable-based configuration
- Uses init-only properties with lazy evaluation and normalization
- Enables ergonomic test customization via "Default with { ... }" syntax
2. Consolidate defaults from Protocol to CoordinatorSettings
- Move PipeNameBase, DefaultHeartbeatIntervalMs, DefaultMissedHeartbeatsThreshold
- Keep Protocol focused on wire format (Version) and env var names only
- Cleaner separation: Protocol = wire contract, CoordinatorSettings = runtime config
3. Refactor coordinator client and server to consume CoordinatorSettings
- CoordinatorClient.TryConnect() accepts optional CoordinatorSettings for defaults
- CoordinatorClient.TryConnectToServer() simplified to settings-driven signature
- CoordinatorServer primary constructor now takes CoordinatorSettings
- Remove duplicated GetEnvironmentInt() parsing logic
- All connection parameters (pipe name, heartbeat, process id, timeout) flow through settings instead of separate arguments
4. Update coordinator program entrypoint (Program.cs)
- Calls CoordinatorSettings.FromEnvironment() once at startup
- Passes settings object to CoordinatorServer constructor
- Eliminates repeated env var parsing
5. Improve test infrastructure
- Migrate all coordinator unit test setup to "Default with { ... }" pattern
- Add CreateServerSettings() and CreateClientSettings() helpers to reduce duplication
- Ensure tests can construct custom CoordinatorSettings to avoid environment reads
- Split Protocol_Tests into CoordinatorSettings_Tests and Protocol_Tests classes
- Add explicit coverage for environment-override behavior and custom settings
- New test: TryConnect_CustomSettings_UsesSettingsPipeName()
Benefits:
- Single configuration entry point makes coordinator setup self-documenting
- Tests can easily override individual settings without env var pollution
- Easier to add new coordinator options (just add a property to CoordinatorSettings)
- Cleaner separation of concerns between protocol and runtime configuration
- All 48 coordinator unit tests pass with no regressions
Fixes unobserved exception warnings when HandleClientAsync tasks log after test teardown. Track spawned client tasks and wait for completion before RunAsync exits. Use continuation to remove completed tasks, preventing unbounded growth.
Prevent SendHeartbeat timer callback from running after Dispose() has freed resources. The race condition occurred because: - Timer.Dispose() only stops scheduling new callbacks; it doesn't wait for in-flight ones - SendHeartbeat checks _disposed flag, sees false, and starts to write to _writer - Meanwhile Dispose() sets _disposed and disposes _writer, _reader, _pipeStream - SendHeartbeat then crashes trying to use disposed resources The fix: - Make _disposed volatile to ensure thread visibility - Use Timer.Dispose(WaitHandle) overload, which blocks until any running callback completes - This guarantees: once _disposed is true and timer disposal returns, no callback can be executing - Safe to dispose resources without broad locking This is a targeted synchronization fix that relies on Timer's built-in coordination rather than adding lock contention to the hot path (SendHeartbeat runs every N milliseconds).
- Use ArgumentOutOfRangeException.ThrowIfNegativeOrZero(...) - Use RefArrayBuilder<BuildGrant> to avoid allocating an ImmutableArray<BuildGrant>.Builder.
Fix a race condition where a client reconnecting with the same PID could cause ReleaseConnection() to remove the wrong connection from the _connectionsByProcessId dictionary. Changes: - Add ReaderWriterLockSlim _connectionLock for coordinating connection access - Replace ConcurrentDictionary with Dictionary<int, ClientConnection> since all access is now protected by explicit locking - Add identity verification in ReleaseConnection() before removal to prevent removal of new connections with the same PID - Use write lock when storing new connections in HandleClientAsync() - Use read lock for connection lookups in ReleaseConnection() and CheckHeartbeats() - Refactor CheckHeartbeats() to capture a snapshot of connections upfront, eliminating enumerator invalidation risk and simplifying lock management - Use DisposableReadLock and DisposableWriteLock helpers for cleaner code The read-heavy/write-infrequent access pattern (frequent heartbeat checks, infrequent connect/disconnect) makes ReaderWriterLockSlim more efficient than a simple lock. Snapshot-based heartbeat checking avoids iterator complications and ensures clean separation of concerns.
Keep Protocol focused on wire-format concerns by removing coordinator environment variable name constants from Protocol and centralizing them in Traits, where environment toggles are already defined.
Introduce string polyfill helpers in src/Framework/Polyfills/StringExtensions.cs, including FastAllocateString and non-NET Create support for span-based initialization. Update src/Framework/EscapingUtilities.cs to use string.FastAllocateString in the non-NET encoding path. This keeps the hot path allocation strategy consistent across targets without depending on newer runtime-only APIs.
Coordinator mutex naming previously combined a Windows Global\\ prefix with a Unix pipe path, yielding invalid names like Global\\/tmp/... on Unix. Keep Global\\ naming on Windows and generate a stable SHA-256-based identifier for Unix to avoid path characters in mutex names. This prevents coordinator process startup failures and allows node-budget coordination behavior to apply on Unix builds.
29e2aa6 to
204371e
Compare
rainersigwald
left a comment
There was a problem hiding this comment.
Doc looks good now, thanks.
|
@JanProvaznik -- Feel free to reach out over Teams if you'd like to discuss anything in detail. |
|
Thanks a bunch for this @DustinCampbell! I think we're onto something here. 🚀 Some things that would be great to align around:
Btw. for experimentation it's great how self-encapsulated the implementation is here :-) |
Yeah, this is currently undone, but I agree that it's really important. I'll go ahead and log relevant messages to the terminal output/logger for the initial release. Edit: Logging messages added. |
Move coordinator connection in BuildManager.BeginBuild to after logging service initialization so that important messages are visible in the terminal logger. Log "waiting for resources" at High importance when the build is blocked, and "granted N nodes" at Normal importance on success. Rename ICoordinatorLogger to ICoordinatorOutput (and related types) to distinguish the debug trace output from MSBuild's ILoggingService.
- Extract TryConnectToServer into CoordinatorClient.TestAccessor partial class and update test call site - Add XML doc comments (<param>, <returns>) to CoordinatorClient, CoordinatorServer, NodeBudgetManager, BuildGrant, and ClientConnection - Reorder members in CoordinatorClient and CoordinatorServer to follow convention: fields → constructor → Dispose → public API → private methods in call-order
For sure! I think the important thing to get right is the tuning.
I'm definitely interested in design discussion or feedback around some of those issues.
Theoretically, I expect the coordinator to "just work" with server, multi-threaded, and static build graph with small adjustments. The coordinator sits outside of all builds -- server, multi-threaded, or otherwise -- and adjusts a build's "MaxNodeCount" parameter based on the resources available (i.e. processor cores). If no resources are available (because other builds have claimed them), that build waits until the coordinator is able to grant it resources. You can imagine coordinator as a sort of crude "global throttle" on build node counts. |
When multiple builds race to start the coordinator, each would launch a separate process. The server-side instance mutex prevents dual service, but the losing client could fail its retry if timing is unlucky. Add a client-side launch mutex (distinct from the server instance mutex) so only one client launches the coordinator while others wait, then connect to the now-running instance. - Extract TryLaunchAndConnect with mutex acquire → retry → launch → retry - Move mutex name generation into CoordinatorSettings (ServerMutexName, LaunchMutexName) using SHA256 hashing for Unix compatibility - Simplify Program.cs to use settings.ServerMutexName - Add CoordinatorLaunchTimedOut and CoordinatorFailedToConnect messages so the user knows when coordinator connection fails completely
jankratochvilcz
left a comment
There was a problem hiding this comment.
(in progress review) I didn't have enough time today to review the whole thing yet, will hopefully finish tomorrow. Checking in comments so far.
| // Send heartbeats while waiting so the server doesn't consider us stale. | ||
| using (Timer heartbeatPump = CreateHeartbeatPump(writer, settings.HeartbeatIntervalMs)) | ||
| { | ||
| ServerMessage grantAfterWait = reader.ReadServerMessage(); |
There was a problem hiding this comment.
This is the core blocking call that waits for the server to give the go-ahead, right? Can you please share what is the mechanism to make sure that the client is not stuck indefinitely when the server goes dark (but doesn't necessarily crash)?
There was a problem hiding this comment.
Yes, you're right that this is the core blocking call, but there isn't a mechanism to fail gracefully. If the server gets into some sort of infinite loop and never closes the pipe or crashes, the client will never move on. Although, that's by design. If the server's stuck, there isn't really much the client can do. A timeout isn't appropriate because the client could realistically be waiting for minutes if there a long-running builds ahead of it.
There was a problem hiding this comment.
This is a tough one - let's skip for now since you added the logging messages which should give some indication that the build is blocked on the coordinator. I think for a wider release we need to re-visit this one tho' as it could make the UX frustrating in some cases. Feel free to close the comment for now!
- Add success message when coordinator is launched and connection succeeds
- Make failure message more specific ("Failed to connect to newly launched coordinator")
- Remove redundant "Retrying connection after launch" message
- Move launch mutex comment to TryLaunchAndConnect where the mutex lives
Start a Stopwatch when a WaitMessage is received and include the elapsed time in diagnostic output when the deferred grant (or an unexpected response) arrives.
Add CoordinatorWaitDurationMs to BuildTelemetry to track how long builds spend waiting for a deferred node grant from the coordinator. The value is set from CoordinatorClient.WaitDuration after connecting and emitted with the rest of the per-build telemetry.
jankratochvilcz
left a comment
There was a problem hiding this comment.
Thanks for this @DustinCampbell, excited to merge this! I left some more feedback. The main part for me is to have enough telemetry to assess how this performs in the wild. Once we have that sorted happy to approve. @AR-May will be a second required reviewer to help with any de-risking and also onboard her into this workstream, we discussed it today with her. Will open a DM to discuss the collaboration model for this and future PRs.
For the fair distribution algorithm I'm quite curious how this also performs when working with it longer-term, but happy to suspend disbelief on that bit; it's not blocking :-)
| /// <summary> | ||
| /// Name of environment variable that overrides coordinator pipe name. | ||
| /// </summary> | ||
| public const string CoordinatorPipeNameEnvVarName = "MSBUILDCOORDINATORPIPENAME"; |
There was a problem hiding this comment.
What is a scenario you have in mind where you'd leverage this configurability?
| init => _missedHeartbeatsThreshold = value > 0 ? value : DefaultMissedHeartbeatsThreshold; | ||
| } | ||
|
|
||
| public int TotalNodeBudget |
There was a problem hiding this comment.
Imagining this rolling out - it seems like the default number here is something that we may want to play with based on the perf characteristics we see. I'm wondering if it makes sense to consider expressing this value as a multiple of current machine processor count (does not need to be in this first cut)? If we e.g., see that utilization numbers are insufficient, it seems like we could converge to e.g., 1.5x processor count defaults and if we have a relative number here we could fully encapsulate that default change without needing additional work from our callers like the SDK.
| init => _missedHeartbeatsThreshold = value > 0 ? value : DefaultMissedHeartbeatsThreshold; | ||
| } | ||
|
|
||
| public int TotalNodeBudget |
There was a problem hiding this comment.
Just trying to understand - is it fair to say that in the vast majority of SDK usage cases, MSBuild today requests node count == processor count and since that's also the default value here, in practical terms we are serializing builds, right? I'm trying to figure out what are the scenarios we will see where we have parallel builds on unequal node counts - this could give us some super interesting insights.
| // Fair share: divide available budget among this build and all waiting builds. | ||
| // This ensures a new arrival doesn't consume everything while others wait. | ||
| int contenders = _waitQueue.Count + 1; // +1 for the new arrival | ||
| int fairShare = Math.Max(1, available / contenders); |
There was a problem hiding this comment.
Ah, this is interesting. This means that we're not actually aiming to serialize here and instead share cores across parallel builds.
I'm a bit surprised about this one - my intuition would be that it's preferable to move builds serially at a quick pace compared to moving them in parallel at a slow pace. If I imagine e.g., the messy workflow of multiple agents without work-trees that are iterating, what could matter is that the first agent gets to the failure quickly to fix it instead everyone waiting for a reported failed build, especially if it's in a shared dir (I know for worktrees it's different). From a human point of view, I could also personally see preferring the serialized builds as they can feel more deterministic (i.e., I can see the position in queue and how it progresses) and I can adopt my workflow to it better compared to e.g., getting my builds meaningfully faster or slower because I'm competing with e.g., an agent in a worktree for cores.
What is your take on this?
Btw. I think this is a bit theoretical so we need to see the real-world usage pattern, but good to understand the intention here.
| /// Creates a new budget manager with the specified total node capacity. | ||
| /// </summary> | ||
| /// <param name="totalBudget">The maximum number of nodes available across all builds.</param> | ||
| public NodeBudgetManager(int totalBudget) |
There was a problem hiding this comment.
Can we produce telemetry here? I think here emitting the queue depth + processor allocations requested vs. granted could also give us quite good insights into what is the actual usage pattern like in the wild. Based on the separate executable design, I don't see the wiring that would enable us to trace the server-only signals; is it worth adding?
| // Send heartbeats while waiting so the server doesn't consider us stale. | ||
| using (Timer heartbeatPump = CreateHeartbeatPump(writer, settings.HeartbeatIntervalMs)) | ||
| { | ||
| ServerMessage grantAfterWait = reader.ReadServerMessage(); |
There was a problem hiding this comment.
This is a tough one - let's skip for now since you added the logging messages which should give some indication that the build is blocked on the coordinator. I think for a wider release we need to re-visit this one tho' as it could make the UX frustrating in some cases. Feel free to close the comment for now!
Overview
This PR introduces the MSBuild Coordinator, a resource management system that enforces fair-share allocation of build nodes across multiple simultaneous MSBuild processes.
What This Does
Changes
MSBuild.Coordinatorserver executable for resource managementCoordinatorClientintegration inBuildManagerDraft: This PR is in draft status while implementation and testing continue.