Merged
Commits
35 commits
0d3ef03
Stage 1: Packet infrastructure + IsRunningMultipleNodes callback
JanProvaznik Jan 29, 2026
7e61770
Fix MSB5025 error code collision - use MSB5027 instead
JanProvaznik Jan 30, 2026
a90b7fe
Add integration test for IsRunningMultipleNodes in MT mode
JanProvaznik Jan 30, 2026
bcf91d3
Consolidate TaskHost callback tests into TaskHostCallback_Tests.cs
JanProvaznik Jan 30, 2026
8bf2412
Make new code files nullable clean
JanProvaznik Feb 9, 2026
2a67efb
Remove redundant TaskHostCallback sample
JanProvaznik Feb 9, 2026
aabe5b1
Remove unnecessary #if !CLR2COMPATIBILITY guards
JanProvaznik Feb 9, 2026
7d847c5
Remove localized TaskHostCallbackConnectionLost error
JanProvaznik Feb 9, 2026
54c55bc
Fail on invalid packet type in HandleCallbackResponse
JanProvaznik Feb 9, 2026
0afb492
Throw NotImplementedException for unknown query types
JanProvaznik Feb 9, 2026
ae179c6
Align callback cancellation with in-process mode: remove _taskCancell…
JanProvaznik Feb 9, 2026
e441126
Add TaskHost threading model documentation
JanProvaznik Feb 9, 2026
f24da9d
Gate TaskHost callbacks behind version check + Traits escape hatch
JanProvaznik Feb 16, 2026
9af9103
Revert accidental xlf whitespace changes
JanProvaznik Feb 16, 2026
c9caf0e
Add .NET Core TaskHost E2E test for callback support
JanProvaznik Feb 17, 2026
4c99f40
Apply suggestion from @JanProvaznik
JanProvaznik Feb 23, 2026
9109e0c
Update documentation/specs/multithreading/taskhost-threading.md
JanProvaznik Feb 23, 2026
3c9243e
Simplify callback wait: replace polling with fail-on-disconnect
JanProvaznik Feb 23, 2026
5f92f5a
Use plain string for connection-lost exception (not a user-facing mes…
JanProvaznik Feb 23, 2026
b2b3f5e
Replace 'parent' terminology with 'owning worker node' per review
JanProvaznik Feb 23, 2026
a1aed77
Remove accidentally committed session state
JanProvaznik Feb 23, 2026
49168b3
Document TaskHost lifecycle: task reuse, state, and shutdown
JanProvaznik Feb 23, 2026
5e849a7
Fix docs: task object cache is disposed per build, not per process
JanProvaznik Feb 23, 2026
a8cf5f3
Call out cancellation-aware callbacks as future opportunity
JanProvaznik Feb 23, 2026
7475320
Split packet serialization tests into separate class
JanProvaznik Feb 23, 2026
992f0fe
Clarify IsRunningMultipleNodes is config-based, not runtime
JanProvaznik Feb 23, 2026
5f410a0
Fix fallback test: assert MSB5022 error via logger, not OverallResult
JanProvaznik Feb 23, 2026
20cd863
Cross-reference duplicate test tasks with explanatory comments
JanProvaznik Feb 23, 2026
2d182bb
Deduplicate test task: link IsRunningMultipleNodesTask into ExampleTask
JanProvaznik Feb 23, 2026
c4988a5
E2E test: target TestTask directly, skip restore
JanProvaznik Feb 23, 2026
3124da5
Use InternalErrorException for unknown query type
JanProvaznik Feb 23, 2026
e81359f
Fix TaskHostLifecycle test: restore ExampleTask copy to output
JanProvaznik Feb 23, 2026
efa132d
simplify: the only query is IsRunningMultipleNodes
JanProvaznik Feb 24, 2026
62b4ae3
Merge branch 'main' into ibuildengine-callbacks-stage1
JanProvaznik Feb 24, 2026
cf07e2b
Use == '1' pattern for MSBUILDENABLETASKHOSTCALLBACKS
JanProvaznik Feb 24, 2026
138 changes: 138 additions & 0 deletions documentation/specs/multithreading/taskhost-threading.md
@@ -0,0 +1,138 @@
# Threading in TaskHost Processes

MSBuild can run tasks in a separate process called a **TaskHost** (`OutOfProcTaskHostNode`). This happens when a task requires a different runtime or architecture, or when multithreaded mode (`-mt`) ejects a non-thread-safe task from the worker node. The TaskHost process communicates with the owning worker node over a named pipe.

## Thread Model

The TaskHost has two threads:

### Main Thread (Communication Thread)

The main thread runs `OutOfProcTaskHostNode.Run()`, a `WaitHandle.WaitAny` loop that services four events:

| Index | Event | Handler |
|-------|-------|---------|
| 0 | `_shutdownEvent` | `HandleShutdown()` — joins the task thread, cleans up, exits |
| 1 | `_packetReceivedEvent` | `HandlePacket()` — dispatches incoming IPC packets |
| 2 | `_taskCompleteEvent` | `CompleteTask()` — sends `TaskHostTaskComplete` to owning worker node |
| 3 | `_taskCancelledEvent` | `CancelTask()` — calls `ICancelableTask.Cancel()` on the task |

This thread is responsible for all IPC: receiving packets from the owning worker node (task configuration, cancellation, callback responses) and sending packets back (log messages, task completion, callback requests).
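
The dispatch loop described above can be sketched roughly as follows. This is a minimal illustration of a `WaitHandle.WaitAny` loop, not the actual `OutOfProcTaskHostNode.Run()` implementation; `EventLoopSketch`, its parameters, and the string labels are hypothetical names introduced for the example.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the TaskHost main loop; not MSBuild's actual code.
public static class EventLoopSketch
{
    // Runs until the shutdown handle is signaled and returns the events handled, in order.
    public static List<string> Run(
        WaitHandle shutdown,
        WaitHandle packetReceived,
        WaitHandle taskComplete,
        WaitHandle taskCancelled)
    {
        var handled = new List<string>();

        // The array order defines the indices from the table above.
        WaitHandle[] handles = { shutdown, packetReceived, taskComplete, taskCancelled };

        while (true)
        {
            // Blocks with no timeout. If several handles are signaled at once,
            // WaitAny returns the smallest signaled index.
            int index = WaitHandle.WaitAny(handles);

            switch (index)
            {
                case 0:
                    handled.Add("shutdown");
                    return handled;
                case 1:
                    handled.Add("packet");
                    break;
                case 2:
                    handled.Add("complete");
                    break;
                case 3:
                    handled.Add("cancel");
                    break;
            }
        }
    }
}
```

Because `WaitAny` reports the smallest signaled index, placing the shutdown handle at index 0 means a pending shutdown wins over packets that arrive at the same time in this sketch.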

### Task Runner Thread

When the main thread receives a `TaskHostConfiguration` packet, it spawns the task runner thread (`RunTask`). This thread:

1. Sets up the environment (working directory, env vars, culture)
2. Loads the task assembly and instantiates the task
3. Sets task parameters via reflection
4. Calls `task.Execute()`
5. Collects output parameters
6. Packages the result into `TaskHostTaskComplete` and signals `_taskCompleteEvent`

The task runner thread is where user task code runs. Any `IBuildEngine` calls from the task (logging, property queries, building other projects) are serviced on this thread.
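
The hand-off between the two threads can be illustrated with a small sketch. It assumes a simplified model in which the "task" is just a `Func<bool>` and completion is signaled through a single event; `TaskRunnerSketch` and `RunOnTaskThread` are hypothetical names, and the real `RunTask` does considerably more (environment setup, assembly loading, parameter handling).

```csharp
using System;
using System.Threading;

// Hypothetical model of "spawn a task thread, wait for its completion signal".
public static class TaskRunnerSketch
{
    public static bool RunOnTaskThread(Func<bool> execute)
    {
        using var taskCompleteEvent = new AutoResetEvent(false);
        bool result = false;

        var runner = new Thread(() =>
        {
            try
            {
                // User task code runs here, on the task runner thread.
                result = execute();
            }
            catch (Exception)
            {
                // In this sketch, any exception is reported as a failed task.
                result = false;
            }
            finally
            {
                // Signal completion on success and on failure alike.
                taskCompleteEvent.Set();
            }
        })
        {
            Name = "TaskRunnerSketch",
            IsBackground = true,
        };

        runner.Start();

        // The spawning thread is free to do other work; here it simply waits.
        taskCompleteEvent.WaitOne();
        runner.Join();
        return result;
    }
}
```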

## IBuildEngine Callback Flow (added in Stage 1)

Before callback support, the two threads had a simple lifecycle: the main thread spawned the task thread, waited for completion, and sent the result. Packets flowed in both directions (worker node → TaskHost for configuration and cancellation, TaskHost → worker node for logs and completion), but every packet was fire-and-forget: nothing the TaskHost sent expected a reply.

With callback support, the task can query the owning worker node for information it doesn't have locally (e.g., `IsRunningMultipleNodes`, and in future stages: `RequestCores`, `BuildProjectFile`). This introduces **bidirectional request/response IPC** with the owning worker node, coordinated between the two threads:

```mermaid
sequenceDiagram
participant TR as Task Runner Thread
participant MT as Main Thread
participant PP as Owning Worker Node

TR->>MT: IBuildEngine.Foo()<br/>(sends request packet, blocks)
activate TR

MT->>PP: request packet
Note over PP: (processes request)

PP-->>MT: response packet
MT->>MT: HandleCallbackResponse()<br/>(sets TCS result)

MT-->>TR: TCS unblocks
deactivate TR
```

### How It Works

1. **Task thread** calls an `IBuildEngine` method (e.g., `IsRunningMultipleNodes`).
2. This calls `SendCallbackRequestAndWaitForResponse<T>()`, which:
   - Assigns a unique request ID
   - Registers a `TaskCompletionSource` in `_pendingCallbackRequests`
   - Sends the request packet via `_nodeEndpoint.SendData()`
   - Blocks on `tcs.Task.GetAwaiter().GetResult()` until the TCS is completed
3. **Main thread** receives the response packet from the owning worker node, looks up the TCS by request ID, and calls `TrySetResult()`.
4. **Task thread** wakes up, retrieves the typed response, and returns it to the caller.
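
The four steps above can be sketched as a small request/response correlator. This is an illustration under stated assumptions, not the PR's implementation: `CallbackBrokerSketch`, `SendAndWait`, and the `Action<int>` transport are hypothetical, and the real code sends typed packets through `_nodeEndpoint` rather than invoking a delegate.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical correlator: pairs responses with blocked callers by request ID.
public sealed class CallbackBrokerSketch
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<object>> _pending = new();
    private readonly Action<int> _sendRequest; // assumed transport; receives the request ID
    private int _nextRequestId;

    public CallbackBrokerSketch(Action<int> sendRequest) => _sendRequest = sendRequest;

    // Called on the task thread: registers a TCS, sends the request, then blocks.
    public T SendAndWait<T>()
    {
        int requestId = Interlocked.Increment(ref _nextRequestId);
        var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Register before sending so a fast response cannot miss its TCS.
        _pending[requestId] = tcs;
        try
        {
            _sendRequest(requestId);
            return (T)tcs.Task.GetAwaiter().GetResult();
        }
        finally
        {
            _pending.TryRemove(requestId, out _);
        }
    }

    // Called on the main thread when a response packet arrives.
    public void HandleResponse(int requestId, object value)
    {
        if (_pending.TryGetValue(requestId, out var tcs))
        {
            tcs.TrySetResult(value);
        }
    }
}
```

Registering the `TaskCompletionSource` before sending matters: the response may arrive on the main thread before the task thread reaches its blocking wait.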

### Cancellation Semantics

The callback wait intentionally does **not** check `_taskCancelledEvent`. This aligns with how in-process `TaskHost` (regular worker node mode) handles callbacks:

- In regular mode, `IBuildEngine` callbacks are direct method calls that always complete. Cancellation never interrupts a callback mid-flight. Instead, cancellation causes the *work behind* the callback to fail fast (e.g., the scheduler cancels a child build started by `BuildProjectFile`), and the callback returns normally with a failure result.
- In TaskHost mode, the owning worker node continues processing callback requests even after sending `TaskHostTaskCancelled`. The response is **guaranteed** to arrive because the worker node's packet loop only exits upon receiving `TaskHostTaskComplete`, which cannot be sent until the task finishes, which cannot happen until the callback returns.

Cancellation is handled cooperatively: after the callback returns, the task checks its cancellation state (set by `ICancelableTask.Cancel()`) and exits.

> **Future opportunity:** Unlike in-process mode where callbacks are direct method calls that cannot be interrupted, the IPC-based callback mechanism *could* support cancellation-aware callbacks — for example, by failing the pending `TaskCompletionSource` when `_taskCancelledEvent` is signaled. This would let long-running callbacks like `BuildProjectFile` abort immediately on cancellation rather than waiting for the worker node to process and respond. This is not implemented today for consistency with in-process behavior, but the mechanism is in place if needed.

The only exception path is connection loss (for example, the owning worker node is killed). `OnLinkStatusChanged` detects it and fails all pending `TaskCompletionSource` entries with `InvalidOperationException`, which unblocks the waiting task threads immediately.
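
A minimal sketch of that failure path, assuming pending requests are tracked in a dictionary keyed by request ID. `PendingCallbacksSketch` and its members are hypothetical, and the exception message is a placeholder rather than the string used in the PR.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical tracker showing how a lost connection can unblock all waiters.
public sealed class PendingCallbacksSketch
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<bool>> _pending = new();

    // Registers a pending request and returns the task a caller would block on.
    public Task<bool> Register(int requestId)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending[requestId] = tcs;
        return tcs.Task;
    }

    // Assumed analogue of the link-status handler: fail every pending request.
    public void FailAllOnConnectionLost()
    {
        foreach (var entry in _pending)
        {
            if (_pending.TryRemove(entry.Key, out var tcs))
            {
                // Placeholder message; blocked callers see this exception from GetResult().
                tcs.TrySetException(new InvalidOperationException("Connection to the owning worker node was lost."));
            }
        }
    }
}
```

A caller blocked in `GetAwaiter().GetResult()` receives the `InvalidOperationException` directly, not wrapped in an `AggregateException`.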

### Response Guarantee (Why the Callback Cannot Deadlock)

There is a causal dependency chain that prevents deadlock:

```
Worker node sends callback response
→ TaskHost callback returns
→ task finishes Execute()
→ TaskHost sends TaskHostTaskComplete
→ worker node exits packet loop
```

The worker node cannot exit its packet loop without first receiving `TaskHostTaskComplete`. But `TaskHostTaskComplete` cannot be sent until the task finishes. And the task cannot finish while it is blocked waiting for a callback response. Therefore, the worker node **must** process the callback request and send the response before it can ever stop.

## TaskHost Lifecycle

The TaskHost process can execute multiple tasks sequentially. After finishing one task, it returns to an idle state and waits for either a new task or a shutdown signal.

### Event Loop Cycle

```mermaid
stateDiagram-v2
[*] --> Idle: Process starts, endpoint connects
Idle --> Running: TaskHostConfiguration packet arrives
Running --> Idle: CompleteTask() sends result, clears config
Idle --> Shutdown: NodeBuildComplete or connection loss
Running --> Shutdown: _taskCancelledEvent during idle transition
Shutdown --> [*]: HandleShutdown() exits
```

1. **Idle**: `WaitAny()` blocks on the four wait handles. No task thread exists. `_currentConfiguration` is null.
2. **TaskHostConfiguration arrives**: `HandleTaskHostConfiguration()` stores the config and spawns `_taskRunnerThread` to call `RunTask()`. The main thread immediately returns to `WaitAny()`.
3. **Task executes**: `RunTask()` sets up the environment, loads the task assembly, calls `task.Execute()`, collects output parameters, and packages the result into `_taskCompletePacket`. On completion (success or failure), it signals `_taskCompleteEvent`.
4. **CompleteTask()**: The main thread wakes on index 2, sends `_taskCompletePacket` to the owning worker node, and sets `_currentConfiguration = null`. The node is now idle again.
5. **Back to step 1**: The main thread loops back to `WaitAny()`, ready for another `TaskHostConfiguration` or a `NodeBuildComplete`.

### State Between Tasks

Each new `TaskHostConfiguration` carries a full environment snapshot, task parameters, and warning settings. The task runner thread resets per-task state at the start of `RunTask()`:

**Reset per task:** `_isTaskExecuting`, `_currentConfiguration`, `_debugCommunications`, `_updateEnvironment`, `WarningsAsErrors`/`WarningsNotAsErrors`/`WarningsAsMessages`, `_fileAccessData`

**Persists across tasks (within a single build):**
- `s_mismatchedEnvironmentValues` (static) — environment variable fixups for bitness differences, computed once per process
- `_registeredTaskObjectCache` — task object cache with `Build` lifetime scope, disposed at end of each build (in `HandleShutdown()`), recreated fresh on the next `Run()` call
- `_pendingCallbackRequests` / `_nextCallbackRequestId` — callback tracking (should be empty between tasks)

### Shutdown vs. Reuse

When the owning worker node sends `NodeBuildComplete`, `HandleNodeBuildComplete()` decides whether to exit or stay alive:

- **Sidecar TaskHost** (`_nodeReuse = true`): Always sets `BuildCompleteReuse`. The sidecar process persists across builds, re-entering the `Run()` outer loop to accept new connections.
- **Regular TaskHost** (`_nodeReuse = false`): Sets `BuildCompleteReuse` only if `buildComplete.PrepareForReuse` is true **and** `Traits.Instance.EscapeHatches.ReuseTaskHostNodes` is enabled. Otherwise sets `BuildComplete` and the process exits. This avoids holding assembly locks on custom task DLLs between builds.
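
The decision described in the two bullets above reduces to a small predicate. The sketch below restates it with plain booleans; `ReuseDecisionSketch` and `ShutdownReasonSketch` are hypothetical names, and the real `HandleNodeBuildComplete()` reads these values from the packet and from `Traits` instead of taking parameters.

```csharp
// Hypothetical mirror of the two outcomes named in the text.
public enum ShutdownReasonSketch
{
    BuildComplete,      // process exits
    BuildCompleteReuse, // process stays alive for another build
}

public static class ReuseDecisionSketch
{
    public static ShutdownReasonSketch Decide(
        bool nodeReuse,          // true for a sidecar TaskHost
        bool prepareForReuse,    // from the NodeBuildComplete packet
        bool reuseTaskHostNodes) // the ReuseTaskHostNodes escape hatch
    {
        if (nodeReuse)
        {
            // Sidecar: always reused across builds.
            return ShutdownReasonSketch.BuildCompleteReuse;
        }

        // Regular TaskHost: reuse only when both conditions hold.
        return prepareForReuse && reuseTaskHostNodes
            ? ShutdownReasonSketch.BuildCompleteReuse
            : ShutdownReasonSketch.BuildComplete;
    }
}
```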

There is **no idle timeout**. The `WaitAny()` call has no timeout parameter — the TaskHost waits indefinitely until it receives a shutdown signal or the connection drops.
32 changes: 32 additions & 0 deletions src/Build.UnitTests/BackEnd/IsRunningMultipleNodesTask.cs
@@ -0,0 +1,32 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

namespace Microsoft.Build.UnitTests.BackEnd
{
    /// <summary>
    /// A simple task that queries IsRunningMultipleNodes from the build engine.
    /// Used by TaskHostCallback_Tests (in-process) and NetTaskHost_E2E_Tests (cross-runtime).
    /// The E2E project includes this file via linked compile to avoid duplication.
    /// </summary>
    public class IsRunningMultipleNodesTask : Task
    {
        [Output]
        public bool IsRunningMultipleNodes { get; set; }

        public override bool Execute()
        {
            if (BuildEngine is IBuildEngine2 engine2)
            {
                IsRunningMultipleNodes = engine2.IsRunningMultipleNodes;
                Log.LogMessage(MessageImportance.High, $"IsRunningMultipleNodes = {IsRunningMultipleNodes}");
                return true;
            }

            Log.LogError("BuildEngine does not implement IBuildEngine2");
            return false;
        }
    }
}
50 changes: 50 additions & 0 deletions src/Build.UnitTests/BackEnd/TaskHostCallbackPacket_Tests.cs
@@ -0,0 +1,50 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using Microsoft.Build.BackEnd;
using Shouldly;
using Xunit;

namespace Microsoft.Build.UnitTests.BackEnd
{
    /// <summary>
    /// Pure unit tests for TaskHost callback packet serialization.
    /// No I/O or BuildManager, just round-trip translation.
    /// </summary>
    public class TaskHostCallbackPacket_Tests
    {
        [Fact]
        public void TaskHostIsRunningMultipleNodesRequest_RoundTrip_Serialization()
        {
            var request = new TaskHostIsRunningMultipleNodesRequest();
            request.RequestId = 42;

            ITranslator writeTranslator = TranslationHelpers.GetWriteTranslator();
            request.Translate(writeTranslator);

            ITranslator readTranslator = TranslationHelpers.GetReadTranslator();
            var deserialized = (TaskHostIsRunningMultipleNodesRequest)TaskHostIsRunningMultipleNodesRequest.FactoryForDeserialization(readTranslator);

            deserialized.RequestId.ShouldBe(42);
            deserialized.Type.ShouldBe(NodePacketType.TaskHostIsRunningMultipleNodesRequest);
        }

        [Theory]
        [InlineData(true)]
        [InlineData(false)]
        public void TaskHostIsRunningMultipleNodesResponse_RoundTrip_Serialization(bool isRunningMultipleNodes)
        {
            var response = new TaskHostIsRunningMultipleNodesResponse(123, isRunningMultipleNodes);

            ITranslator writeTranslator = TranslationHelpers.GetWriteTranslator();
            response.Translate(writeTranslator);

            ITranslator readTranslator = TranslationHelpers.GetReadTranslator();
            var deserialized = (TaskHostIsRunningMultipleNodesResponse)TaskHostIsRunningMultipleNodesResponse.FactoryForDeserialization(readTranslator);

            deserialized.RequestId.ShouldBe(123);
            deserialized.IsRunningMultipleNodes.ShouldBe(isRunningMultipleNodes);
            deserialized.Type.ShouldBe(NodePacketType.TaskHostIsRunningMultipleNodesResponse);
        }
    }
}
144 changes: 144 additions & 0 deletions src/Build.UnitTests/BackEnd/TaskHostCallback_Tests.cs
@@ -0,0 +1,144 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using System.Collections.Generic;
using System.IO;
using Microsoft.Build.BackEnd;
using Microsoft.Build.Execution;
using Microsoft.Build.UnitTests.Shared;
using Shouldly;
using Xunit;
using Xunit.Abstractions;

namespace Microsoft.Build.UnitTests.BackEnd
{
    /// <summary>
    /// Integration tests for IBuildEngine callback support in TaskHost.
    /// These tests use BuildManager to run real builds with TaskHostFactory.
    /// For packet serialization tests, see <see cref="TaskHostCallbackPacket_Tests"/>.
    /// </summary>
    public class TaskHostCallback_Tests
    {
        private readonly ITestOutputHelper _output;

        public TaskHostCallback_Tests(ITestOutputHelper output)
        {
            _output = output;
        }

        /// <summary>
        /// Verifies IsRunningMultipleNodes callback works when task is explicitly run in TaskHost via TaskHostFactory.
        /// IsRunningMultipleNodes is configuration-based (MaxNodeCount > 1), not based on actual running nodes.
        /// See TaskHost.IsRunningMultipleNodes: returns _host.BuildParameters.MaxNodeCount > 1 || _disableInprocNode.
        /// </summary>
        [Theory]
        [InlineData(1, false)] // MaxNodeCount=1 → IsRunningMultipleNodes=false
        [InlineData(4, true)] // MaxNodeCount=4 → IsRunningMultipleNodes=true (even with one project)
        public void IsRunningMultipleNodes_WorksWithExplicitTaskHostFactory(int maxNodeCount, bool expectedResult)
        {
            using TestEnvironment env = TestEnvironment.Create(_output);
            env.SetEnvironmentVariable("MSBUILDENABLETASKHOSTCALLBACKS", "1");

            string projectContents = $@"
                <Project>
                    <UsingTask TaskName=""{nameof(IsRunningMultipleNodesTask)}"" AssemblyFile=""{typeof(IsRunningMultipleNodesTask).Assembly.Location}"" TaskFactory=""TaskHostFactory"" />
                    <Target Name=""Test"">
                        <{nameof(IsRunningMultipleNodesTask)}>
                            <Output PropertyName=""Result"" TaskParameter=""IsRunningMultipleNodes"" />
                        </{nameof(IsRunningMultipleNodesTask)}>
                    </Target>
                </Project>";

            TransientTestProjectWithFiles project = env.CreateTestProjectWithFiles(projectContents);
            ProjectInstance projectInstance = new(project.ProjectFile);

            BuildResult buildResult = BuildManager.DefaultBuildManager.Build(
                new BuildParameters { MaxNodeCount = maxNodeCount, EnableNodeReuse = false },
                new BuildRequestData(projectInstance, targetsToBuild: ["Test"]));

            buildResult.OverallResult.ShouldBe(BuildResultCode.Success);
            bool.Parse(projectInstance.GetPropertyValue("Result")).ShouldBe(expectedResult);
        }

        /// <summary>
        /// Verifies IsRunningMultipleNodes callback works when unmarked task is auto-ejected to TaskHost in MT mode.
        /// </summary>
        [Theory]
        [InlineData(1, false)]
        [InlineData(4, true)]
        public void IsRunningMultipleNodes_WorksWhenAutoEjectedInMultiThreadedMode(int maxNodeCount, bool expectedResult)
        {
            using TestEnvironment env = TestEnvironment.Create(_output);
            env.SetEnvironmentVariable("MSBUILDENABLETASKHOSTCALLBACKS", "1");
            string testDir = env.CreateFolder().Path;

            // IsRunningMultipleNodesTask lacks MSBuildMultiThreadableTask attribute, so it's auto-ejected to TaskHost in MT mode
            string projectContents = $@"
                <Project>
                    <UsingTask TaskName=""{nameof(IsRunningMultipleNodesTask)}"" AssemblyFile=""{typeof(IsRunningMultipleNodesTask).Assembly.Location}"" />
                    <Target Name=""Test"">
                        <{nameof(IsRunningMultipleNodesTask)}>
                            <Output PropertyName=""Result"" TaskParameter=""IsRunningMultipleNodes"" />
                        </{nameof(IsRunningMultipleNodesTask)}>
                    </Target>
                </Project>";

            string projectFile = Path.Combine(testDir, "Test.proj");
            File.WriteAllText(projectFile, projectContents);

            var logger = new MockLogger(_output);
            BuildResult buildResult = BuildManager.DefaultBuildManager.Build(
                new BuildParameters
                {
                    MultiThreaded = true,
                    MaxNodeCount = maxNodeCount,
                    Loggers = [logger],
                    EnableNodeReuse = false
                },
                new BuildRequestData(projectFile, new Dictionary<string, string?>(), null, ["Test"], null));

            buildResult.OverallResult.ShouldBe(BuildResultCode.Success);

            // Verify task was ejected to TaskHost
            logger.FullLog.ShouldContain("external task host");

            // Verify callback returned correct value
            logger.FullLog.ShouldContain($"IsRunningMultipleNodes = {expectedResult}");
        }

        /// <summary>
        /// Verifies that accessing IsRunningMultipleNodes when callbacks are disabled
        /// logs error MSB5022 (BuildEngineCallbacksInTaskHostUnsupported).
        /// This preserves the pre-callback behavior where unsupported IBuildEngine
        /// methods in TaskHost log an error.
        /// </summary>
        [Fact]
        public void IsRunningMultipleNodes_LogsErrorWhenCallbacksNotSupported()
        {
            using TestEnvironment env = TestEnvironment.Create(_output);

            // Explicitly do NOT set MSBUILDENABLETASKHOSTCALLBACKS, so callbacks should be disabled
            string projectContents = $@"
                <Project>
                    <UsingTask TaskName=""{nameof(IsRunningMultipleNodesTask)}"" AssemblyFile=""{typeof(IsRunningMultipleNodesTask).Assembly.Location}"" TaskFactory=""TaskHostFactory"" />
                    <Target Name=""Test"">
                        <{nameof(IsRunningMultipleNodesTask)}>
                            <Output PropertyName=""Result"" TaskParameter=""IsRunningMultipleNodes"" />
                        </{nameof(IsRunningMultipleNodesTask)}>
                    </Target>
                </Project>";

            TransientTestProjectWithFiles project = env.CreateTestProjectWithFiles(projectContents);
            ProjectInstance projectInstance = new(project.ProjectFile);

            var logger = new MockLogger(_output);
            BuildResult buildResult = BuildManager.DefaultBuildManager.Build(
                new BuildParameters { MaxNodeCount = 4, EnableNodeReuse = false, Loggers = [logger] },
                new BuildRequestData(projectInstance, targetsToBuild: ["Test"]));

            // MSB5022 error should be logged; the callback was not forwarded
            logger.ErrorCount.ShouldBeGreaterThan(0);
            logger.FullLog.ShouldContain("MSB5022");
        }
    }
}
@@ -14,7 +14,7 @@
<CopyLocalLockFileAssemblies>true</CopyLocalLockFileAssemblies>

<!-- Suppression in needed to build ExampleTaskX86 that targets x86 architecture. -->
-<NoWarn>$(NoWarn);MSB3270</NoWarn>
+<NoWarn>$(NoWarn);MSB3270;CS0436</NoWarn>
</PropertyGroup>

<ItemGroup>