Skip to content

Add MSBuild Coordinator for fair-share node allocation#13653

Open
DustinCampbell wants to merge 52 commits into
dotnet:mainfrom
DustinCampbell:build-coordinator
Open

Add MSBuild Coordinator for fair-share node allocation#13653
DustinCampbell wants to merge 52 commits into
dotnet:mainfrom
DustinCampbell:build-coordinator

Conversation

@DustinCampbell
Copy link
Copy Markdown
Member

Overview

This PR introduces the MSBuild Coordinator, a resource management system that enforces fair-share allocation of build nodes across multiple simultaneous MSBuild processes.

What This Does

  • Global node budget enforcement (defaults to processor count)
  • Fair-share distribution of available nodes among competing builds
  • Health monitoring via periodic heartbeats
  • Named pipe-based IPC for cross-platform support

Changes

  • New MSBuild.Coordinator server executable for resource management
  • CoordinatorClient integration in BuildManager
  • Protocol layer and supporting infrastructure in Framework
  • Comprehensive unit and integration tests
  • Architecture documentation

Draft: This PR is in draft status while implementation and testing continue.

- Add MSBuild.Coodinator command-line application project (currently empty)
- Ensure MSBuild.Coordinator files are copied to the MSBuild core and net472 bootstrap layouts
- Update VSSetup to include MSBuild.Coordinator.exe and related files for x86, x64 and arm64.
Microsoft.Build.Framework.csproj was missing a package reference to Microsoft.IO.Redist. Without this, Microsoft.Build.Framework acquires its Microsoft.IO.Redist reference through the "UseFrozenMaintenancePackageVersions" machinery.
- Add MSBuild.Coordinator.UnitTests project configured as a test project.
- Add project references to MSBuild.Coordinator and Microsoft.Build.UnitTests.Shared.
- Ensure MSBuild.Coordinator.UnitTests has InternalsVisibleTo access to Microsoft.Build.Framework and MSBuild.Coordinator.
Introduce the shared protocol for communication between MSBuild and the
build coordinator via named pipes. The coordinator manages a system-wide
node budget to prevent parallel builds from oversubscribing resources.

Protocol types (in Microsoft.Build.Framework.Coordinator):
- ClientMessage/ServerMessage base records with Read/Write support
- RequestNodes, ReleaseNodes, Heartbeat (client → coordinator)
- NodeGrant, Wait, Error (coordinator → client)
- Protocol: constants, env var names, pipe name helper

Binary format: [version:byte][messageType:byte][payload...]
Version validation on read with InternalErrorException on mismatch.
Implement the MSBuild coordinator server that manages a system-wide node
budget to prevent parallel builds from oversubscribing resources.

Server components:
- BuildGrant: tracks per-build grant state (PID, requested/granted nodes,
  heartbeat timestamp)
- NodeBudgetManager: fair-share allocation policy with wait queue, minimum
  grant of 1 node, and FIFO drain on release
- ClientConnection: owns the named pipe stream and reader/writer for a
  single client alongside its BuildGrant
- CoordinatorServer: named pipe listener, per-client async handler,
  heartbeat monitor with PID-based dead process detection, and
  auto-shutdown after configurable idle timeout
- Program: entry point with named mutex for single-instance enforcement,
  environment variable configuration

Tests:
- NodeBudgetManager unit tests covering grant, queue, fair-share, release,
  drain, starvation prevention, and accounting consistency
- CoordinatorServer integration tests over real named pipes covering
  single/multiple clients, fair-share with multiple waiters, client
  disconnect, waiting client disconnect, concurrent connections,
  heartbeat, invalid protocol, auto-shutdown, and PID reconnection
Introduce CoordinatorClient, which connects to the coordinator server
over a named pipe, requests a node grant, maintains heartbeats, and
releases the grant on dispose. If no coordinator is running, it attempts
to launch one from the adjacent MSBuild.Coordinator binary.

Wire the client into BuildManager.BeginBuild/EndBuild: when the
MSBUILDUSECOORDINATOR environment variable is set, the client requests
a grant before node infrastructure is initialized and caps MaxNodeCount
to the granted value. The grant is released in EndBuild's finally block.

Add a TryConnectToServer internal overload that accepts a pipe name and
process ID for testing without launching a real coordinator. Add
InternalsVisibleTo for MSBuild.Coordinator.UnitTests and a project
reference to Microsoft.Build from the test project.

Add CoordinatorClient_Tests covering grant negotiation, budget capping,
no-server fallback, dispose/release flow, deferred grant after wait,
double-dispose safety, and sequential fair-share allocation.
Introduce ICoordinatorLogger with an interpolated string handler to
avoid allocations when tracing is disabled. Both sides are gated on
MSBUILDDEBUGCOMM (Traits.Instance.DebugNodeCommunication).

Client-side: CoordinatorClient.DefaultLogger delegates to
CommunicationsUtilities.Trace so coordinator diagnostics are
interleaved with existing node communication traces in
MSBuild_CommTrace_PID_{pid}.txt.

Server-side: CoordinatorServer.DefaultLogger writes to a dedicated
MSBuild_CoordinatorTrace_PID_{pid}.txt since the coordinator process
only references Microsoft.Build.Framework.

Heartbeat messages are intentionally not traced to avoid flooding
the log files.
- Move src/Shared/NamedPipeUtil.cs to src/Framework/NamedPipeUtil.cs
- Move NamedPipeUtil to the Microsoft.Build.BackEnd namespace
- Update Protocol.GetPipeName() to use NamedPipUtil.
Add CoordinatorIntegration_Tests with two end-to-end tests that launch
real bootstrap MSBuild processes with the coordinator enabled:

- ParallelBuilds_BothSucceedWithCoordinator: runs two parallel builds
  each requesting 8 nodes against a budget of 4, verifying both succeed
  and the coordinator doesn't break anything.
- SingleBuild_CoordinatorCapsMaxNodeCount: runs a single build requesting
  16 nodes against a budget of 2, verifying $(MSBuildNodeCount) reports
  the capped value.

Tests use MSBUILDCOORDINATORPIPENAME to isolate each run with a unique
pipe name, preventing interference with other coordinator instances.

Add PipeNameEnvironmentVariable to Protocol and update GetPipeName() to
honor the override, routing through NamedPipeUtil.GetPlatformSpecificPipeName
for Unix-safe pipe paths.
Route pipe names in CoordinatorServer_Tests and CoordinatorClient_Tests
through NamedPipeUtil.GetPlatformSpecificPipeName so that tests use
/tmp-based paths on Unix, avoiding Unix Domain Socket path length limits.
Move wire format logic into ClientMessage and ServerMessage base records
using the Template Method pattern: non-virtual WriteTo handles the
version + message type prefix and flush, while virtual WritePayload
provides the extension point for subclass data.

Add ReadFrom static factory methods on each base for deserialization,
and introduce BinaryReader/BinaryWriter extension methods in
Extensions.cs (ReadClientMessage, ReadServerMessage, Write) so call
sites read naturally as reader.ReadClientMessage() and
writer.Write(message).

Update CoordinatorServer, CoordinatorClient, and all tests to use
the new extension methods.
Replace direct InternalErrorException throws in ClientMessage.ReadFrom
and ServerMessage.ReadFrom with FrameworkErrorUtilities.ThrowInternalError,
consistent with how internal errors are raised elsewhere in Framework.
Create a unified CoordinatorSettings type in src/Framework/Coordinator that centralizes all coordinator client and server configuration, replacing scattered environment variable reads and hardcoded defaults throughout the codebase.

Changes:

1. Add CoordinatorSettings sealed record class (src/Framework/Coordinator/CoordinatorSettings.cs)
   - Provides single source of truth for heartbeat interval, node budget, pipe name, shutdown timeout, connection timeout, and process id
   - Exposes CoordinatorSettings.Default singleton with standard defaults
   - Implements FromEnvironment() factory for environment-variable-based configuration
   - Uses init-only properties with lazy evaluation and normalization
   - Enables ergonomic test customization via "Default with { ... }" syntax

2. Consolidate defaults from Protocol to CoordinatorSettings
   - Move PipeNameBase, DefaultHeartbeatIntervalMs, DefaultMissedHeartbeatsThreshold
   - Keep Protocol focused on wire format (Version) and env var names only
   - Cleaner separation: Protocol = wire contract, CoordinatorSettings = runtime config

3. Refactor coordinator client and server to consume CoordinatorSettings
   - CoordinatorClient.TryConnect() accepts optional CoordinatorSettings for defaults
   - CoordinatorClient.TryConnectToServer() simplified to settings-driven signature
   - CoordinatorServer primary constructor now takes CoordinatorSettings
   - Remove duplicated GetEnvironmentInt() parsing logic
   - All connection parameters (pipe name, heartbeat, process id, timeout) flow through settings instead of separate arguments

4. Update coordinator program entrypoint (Program.cs)
   - Calls CoordinatorSettings.FromEnvironment() once at startup
   - Passes settings object to CoordinatorServer constructor
   - Eliminates repeated env var parsing

5. Improve test infrastructure
   - Migrate all coordinator unit test setup to "Default with { ... }" pattern
   - Add CreateServerSettings() and CreateClientSettings() helpers to reduce duplication
   - Ensure tests can construct custom CoordinatorSettings to avoid environment reads
   - Split Protocol_Tests into CoordinatorSettings_Tests and Protocol_Tests classes
   - Add explicit coverage for environment-override behavior and custom settings
   - New test: TryConnect_CustomSettings_UsesSettingsPipeName()

Benefits:
- Single configuration entry point makes coordinator setup self-documenting
- Tests can easily override individual settings without env var pollution
- Easier to add new coordinator options (just add a property to CoordinatorSettings)
- Cleaner separation of concerns between protocol and runtime configuration
- All 48 coordinator unit tests pass with no regressions
Fixes unobserved exception warnings when HandleClientAsync tasks log after test
teardown. Track spawned client tasks and wait for completion before RunAsync exits. Use continuation to remove completed tasks, preventing unbounded growth.
Prevent SendHeartbeat timer callback from running after Dispose() has freed resources.

The race condition occurred because:
- Timer.Dispose() only stops scheduling new callbacks; it doesn't wait for in-flight ones
- SendHeartbeat checks _disposed flag, sees false, and starts to write to _writer
- Meanwhile Dispose() sets _disposed and disposes _writer, _reader, _pipeStream
- SendHeartbeat then crashes trying to use disposed resources

The fix:
- Make _disposed volatile to ensure thread visibility
- Use Timer.Dispose(WaitHandle) overload, which blocks until any running callback completes
- This guarantees: once _disposed is true and timer disposal returns, no callback can be executing
- Safe to dispose resources without broad locking

This is a targeted synchronization fix that relies on Timer's built-in coordination rather than adding lock contention to the hot path (SendHeartbeat runs every N milliseconds).
- Use ArgumentOutOfRangeException.ThrowIfNegativeOrZero(...)
- Use RefArrayBuilder<BuildGrant> to avoid allocating an ImmutableArray<BuildGrant>.Builder.
Fix a race condition where a client reconnecting with the same PID could cause ReleaseConnection() to remove the wrong connection from the _connectionsByProcessId dictionary.

Changes:
- Add ReaderWriterLockSlim _connectionLock for coordinating connection access
- Replace ConcurrentDictionary with Dictionary<int, ClientConnection> since all access is now protected by explicit locking
- Add identity verification in ReleaseConnection() before removal to prevent removal of new connections with the same PID
- Use write lock when storing new connections in HandleClientAsync()
- Use read lock for connection lookups in ReleaseConnection() and CheckHeartbeats()
- Refactor CheckHeartbeats() to capture a snapshot of connections upfront, eliminating enumerator invalidation risk and simplifying lock management
- Use DisposableReadLock and DisposableWriteLock helpers for cleaner code

The read-heavy/write-infrequent access pattern (frequent heartbeat checks, infrequent connect/disconnect) makes ReaderWriterLockSlim more efficient than a simple lock. Snapshot-based heartbeat checking avoids iterator complications and ensures clean separation of concerns.
Keep Protocol focused on wire-format concerns by removing coordinator
environment variable name constants from Protocol and centralizing them
in Traits, where environment toggles are already defined.
Introduce string polyfill helpers in src/Framework/Polyfills/StringExtensions.cs, including FastAllocateString and non-NET Create support for span-based initialization.

Update src/Framework/EscapingUtilities.cs to use string.FastAllocateString in the non-NET encoding path.

This keeps the hot path allocation strategy consistent across targets without depending on newer runtime-only APIs.
Coordinator mutex naming previously combined a Windows Global\\ prefix with a Unix pipe path, yielding invalid names like Global\\/tmp/... on Unix.

Keep Global\\ naming on Windows and generate a stable SHA-256-based identifier for Unix to avoid path characters in mutex names.

This prevents coordinator process startup failures and allows node-budget coordination behavior to apply on Unix builds.
@DustinCampbell DustinCampbell marked this pull request as ready for review May 5, 2026 23:38
Copilot AI review requested due to automatic review settings May 5, 2026 23:38
@DustinCampbell DustinCampbell requested a review from a team as a code owner May 5, 2026 23:38
Copy link
Copy Markdown
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc looks good now, thanks.

@DustinCampbell
Copy link
Copy Markdown
Member Author

@JanProvaznik -- Feel free to reach out over Teams if you'd like to discuss anything in detail.

@jankratochvilcz
Copy link
Copy Markdown
Contributor

Thanks a bunch for this @DustinCampbell! I think we're onto something here. 🚀

Some things that would be great to align around:

  1. It would be great to collect some data after this is merged to understand how it performs in the wild. I'd be especially interested what is the incidence of concurrent builds and wait times this introduces. I think we expect concurrent builds on local to become more common as worktree-based workflows gain in popularity, but seems like a great opportunity to get some hard data, especially as we have this feature under opt-in for now.
  2. I didn't find any UX bits that help the user figure out what's going on - if i start a build while the number of nodes is maxed out, even getting a message into output that the build is waiting for allocation would be helpful (especially for trying it out initially)
  3. (not blocking for this PR, more longer-term) - what would be missing to make this opt-out in the longer-term assuming we prove the value? I think if we want adoption, that's the bar to go for :-)
  4. I'm not totally clear how this lines up with server / multithreaded / static build graph in the long-term.

Btw. for experimentation it's great how self-encapsulated the implementation is here :-)

@DustinCampbell
Copy link
Copy Markdown
Member Author

DustinCampbell commented May 20, 2026

I didn't find any UX bits that help the user figure out what's going on - if i start a build while the number of nodes is maxed out, even getting a message into output that the build is waiting for allocation would be helpful (especially for trying it out initially)

Yeah, this is currently undone, but I agree that it's really important. I'll go ahead and log relevant messages to the terminal output/logger for the initial release.

Edit: Logging messages added.

Move coordinator connection in BuildManager.BeginBuild to after logging service initialization so that important messages are visible in the terminal logger. Log "waiting for resources" at High importance when the build is blocked, and "granted N nodes" at Normal importance on success.

Rename ICoordinatorLogger to ICoordinatorOutput (and related types) to distinguish the debug trace output from MSBuild's ILoggingService.
- Extract TryConnectToServer into CoordinatorClient.TestAccessor partial class and update test call site
- Add XML doc comments (<param>, <returns>) to CoordinatorClient, CoordinatorServer, NodeBudgetManager, BuildGrant, and ClientConnection
- Reorder members in CoordinatorClient and CoordinatorServer to follow convention: fields → constructor → Dispose → public API → private methods in call-order
@DustinCampbell
Copy link
Copy Markdown
Member Author

(not blocking for this PR, more longer-term) - what would be missing to make this opt-out in the longer-term assuming we prove the value? I think if we want adoption, that's the bar to go for :-)

For sure! I think the important thing to get right is the tuning.

  • The coordinator uses a big hammer: resources are limited to the number of processor cores on the machine. However, that might ultimately be too conservative.
  • I also suspect that coordinator requests for resources should be more nuanced. Today, all nodes are treated the same, regardless of whether they're in-proc or out-of-proc. However, in-proc nodes (threads) are likely cheaper than out-of-proc nodes (processes) and could be counted separately.
  • Should builds with node reuse enabled by treated differently in some way?
  • In this initial pull request, the number of resources are granted to a build never changes. So, if build A and build B are running concurrently and both are using half of the available resources, build B will continue to use half of the available resources when build A completes. Instead, should the coordinator update build B with an increased grant, so it finishes faster? I haven't looked into implementing this yet (I kept things simpler for now), but I expect it will require changes to the scheduler.

I'm definitely interested in design discussion or feedback around some of those issues.

I'm not totally clear how this lines up with server / multithreaded / static build graph in the long-term.

Theoretically, I expect the coordinator to "just work" with server, multi-threaded, and static build graph with small adjustments. The coordinator sits outside of all builds -- server, multi-threaded, or otherwise -- and adjusts a build's "MaxNodeCount" parameter based on the resources available (i.e. processor cores). If no resources are available (because other builds have claimed them), that build waits until the coordinator is able to grant it resources. You can imagine coordinator as a sort of crude "global throttle" on build node counts.

When multiple builds race to start the coordinator, each would launch a separate process. The server-side instance mutex prevents dual service, but the losing client could fail its retry if timing is unlucky.

Add a client-side launch mutex (distinct from the server instance mutex) so only one client launches the coordinator while others wait, then connect to the now-running instance.

- Extract TryLaunchAndConnect with mutex acquire → retry → launch → retry
- Move mutex name generation into CoordinatorSettings (ServerMutexName, LaunchMutexName) using SHA256 hashing for Unix compatibility
- Simplify Program.cs to use settings.ServerMutexName
- Add CoordinatorLaunchTimedOut and CoordinatorFailedToConnect messages so the user knows when coordinator connection fails completely
Copy link
Copy Markdown
Contributor

@jankratochvilcz jankratochvilcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in progress review) I didn't have enough time today to review the whole thing yet, will hopefully finish tomorrow. Checking in comments so far.

Comment thread src/Build/BackEnd/BuildManager/CoordinatorClient.cs Outdated
Comment thread src/Build/BackEnd/BuildManager/CoordinatorClient.cs
// Send heartbeats while waiting so the server doesn't consider us stale.
using (Timer heartbeatPump = CreateHeartbeatPump(writer, settings.HeartbeatIntervalMs))
{
ServerMessage grantAfterWait = reader.ReadServerMessage();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the core blocking call that waits for the server to give the go-ahead, right? Can you please share what is the mechanism to make sure that the client is not stuck indefinitely when the server goes dark (but doesn't necessarily crash)?

Copy link
Copy Markdown
Member Author

@DustinCampbell DustinCampbell May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right that this is the core blocking call, but there isn't a mechanism to fail gracefully. If the server gets into some sort of infinite loop and never closes the pipe or crashes, the client will never move on. Although, that's by design. If the server's stuck, there isn't really much the client can do. A timeout isn't appropriate because the client could realistically be waiting for minutes if there a long-running builds ahead of it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough one - let's skip for now since you added the logging messages which should give some indication that the build is blocked on the coordinator. I think for a wider release we need to re-visit this one tho' as it could make the UX frustrating in some cases. Feel free to close the comment for now!

Comment thread src/Build/BackEnd/BuildManager/CoordinatorClient.cs
Comment thread src/Build/BackEnd/BuildManager/CoordinatorClient.cs
Comment thread src/Framework/Coordinator/Messages/ClientMessage.cs
Comment thread src/Framework/Coordinator/ICoordinatorOutput.cs
Comment thread src/Framework/Coordinator/Messages/RequestNodesMessage.cs
Comment thread src/Framework/Coordinator/CoordinatorSettings.cs
Comment thread src/Framework/Coordinator/CoordinatorSettings.cs
- Add success message when coordinator is launched and connection succeeds
- Make failure message more specific ("Failed to connect to newly launched coordinator")
- Remove redundant "Retrying connection after launch" message
- Move launch mutex comment to TryLaunchAndConnect where the mutex lives
Start a Stopwatch when a WaitMessage is received and include the elapsed time in diagnostic output when the deferred grant (or an unexpected response) arrives.
Add CoordinatorWaitDurationMs to BuildTelemetry to track how long builds spend waiting for a deferred node grant from the coordinator. The value is set from CoordinatorClient.WaitDuration after connecting and emitted with the rest of the per-build telemetry.
Copy link
Copy Markdown
Contributor

@jankratochvilcz jankratochvilcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @DustinCampbell, excited to merge this! I left some more feedback. The main part for me is to have enough telemetry to assess how this performs in the wild. Once we have that sorted happy to approve. @AR-May will be a second required reviewer to help with any de-risking and also onboard her into this workstream, we discussed it today with her. Will open a DM to discuss the collaboration model for this and future PRs.

For the fair distribution algorithm I'm quite curious how this also performs when working with it longer-term, but happy to suspend disbelief on that bit; it's not blocking :-)

Comment thread src/Framework/Traits.cs
/// <summary>
/// Name of environment variable that overrides coordinator pipe name.
/// </summary>
public const string CoordinatorPipeNameEnvVarName = "MSBUILDCOORDINATORPIPENAME";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a scenario you have in mind where you'd leverage this configurability?

init => _missedHeartbeatsThreshold = value > 0 ? value : DefaultMissedHeartbeatsThreshold;
}

public int TotalNodeBudget
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagining this rolling out - it seems like the default number here is something that we may want to play with based on the perf characteristics we see. I'm wondering if it makes sense to consider expressing this value as a multiple of current machine processor count (does not need to be in this first cut)? If we e.g., see that utilization numbers are insufficient, it seems like we could converge to e.g., 1.5x processor count defaults and if we have a relative number here we could fully encapsulate that default change without needing additional work from our callers like the SDK.

init => _missedHeartbeatsThreshold = value > 0 ? value : DefaultMissedHeartbeatsThreshold;
}

public int TotalNodeBudget
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to understand - is it fair to say that in the vast majority of SDK usage cases, MSBuild today requests node count == processor count and since that's also the default value here, in practical terms we are serializing builds, right? I'm trying to figure out what are the scenarios we will see where we have parallel builds on unequal node counts - this could give us some super interesting insights.

// Fair share: divide available budget among this build and all waiting builds.
// This ensures a new arrival doesn't consume everything while others wait.
int contenders = _waitQueue.Count + 1; // +1 for the new arrival
int fairShare = Math.Max(1, available / contenders);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is interesting. This means that we're not actually aiming to serialize here and instead share cores across parallel builds.

I'm a bit surprised about this one - my intuition would be that it's preferable to move builds serially at a quick pace compared to moving them in parallel at a slow pace. If I imagine e.g., the messy workflow of multiple agents without work-trees that are iterating, what could matter is that the first agent gets to the failure quickly to fix it instead everyone waiting for a reported failed build, especially if it's in a shared dir (I know for worktrees it's different). From a human point of view, I could also personally see preferring the serialized builds as they can feel more deterministic (i.e., I can see the position in queue and how it progresses) and I can adopt my workflow to it better compared to e.g., getting my builds meaningfully faster or slower because I'm competing with e.g., an agent in a worktree for cores.

What is your take on this?

Btw. I think this is a bit theoretical so we need to see the real-world usage pattern, but good to understand the intention here.

/// Creates a new budget manager with the specified total node capacity.
/// </summary>
/// <param name="totalBudget">The maximum number of nodes available across all builds.</param>
public NodeBudgetManager(int totalBudget)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we produce telemetry here? I think here emitting the queue depth + processor allocations requested vs. granted could also give us quite good insights into what is the actual usage pattern like in the wild. Based on the separate executable design, I don't see the wiring that would enable us to trace the server-only signals; is it worth adding?

// Send heartbeats while waiting so the server doesn't consider us stale.
using (Timer heartbeatPump = CreateHeartbeatPump(writer, settings.HeartbeatIntervalMs))
{
ServerMessage grantAfterWait = reader.ReadServerMessage();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough one - let's skip for now since you added the logging messages which should give some indication that the build is blocked on the coordinator. I think for a wider release we need to re-visit this one tho' as it could make the UX frustrating in some cases. Feel free to close the comment for now!

Comment thread src/Framework/Coordinator/CoordinatorSettings.cs
@jankratochvilcz jankratochvilcz requested review from AR-May and removed request for JanProvaznik May 22, 2026 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants