[IBuildEngine callbacks] Stage 1: Packet infrastructure + IsRunningMultipleNodes#13149
Conversation
3c57f6f to
684a479
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Stage 1 of enabling IBuildEngine callback support for out-of-proc TaskHost execution by introducing request/response packet infrastructure and implementing IBuildEngine2.IsRunningMultipleNodes via a TaskHost→parent query/response roundtrip.
Changes:
- Added TaskHost callback packet types and correlation interface (
ITaskHostCallbackPacket) plus query request/response packets. - Implemented
IsRunningMultipleNodesforwarding from TaskHost to parent (OutOfProcTaskHostNode↔TaskHostTask). - Added unit + integration tests validating packet serialization and TaskHost callback behavior.
Reviewed changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Shared/TaskHostQueryResponse.cs | Adds response packet for TaskHost query callbacks. |
| src/Shared/TaskHostQueryRequest.cs | Adds request packet and query enum for simple callback queries. |
| src/Shared/ITaskHostCallbackPacket.cs | Defines request/response correlation contract via RequestId. |
| src/Shared/INodePacket.cs | Reserves and defines new NodePacketType values for TaskHost callbacks. |
| src/Samples/TaskHostCallback/TestIsRunningMultipleNodesTask.cs | Sample task exercising IBuildEngine2.IsRunningMultipleNodes in TaskHost. |
| src/Samples/TaskHostCallback/TestIsRunningMultipleNodes.proj | Sample project wiring task via TaskHostFactory. |
| src/Samples/TaskHostCallback/TaskHostCallback.csproj | Builds the sample TaskHost callback task assembly. |
| src/MSBuild/Resources/xlf/Strings.zh-Hant.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.zh-Hans.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.tr.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.ru.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.pt-BR.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.pl.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.ko.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.ja.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.it.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.fr.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.es.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.de.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/xlf/Strings.cs.xlf | Adds localized entry for new TaskHost connection-loss string. |
| src/MSBuild/Resources/Strings.resx | Adds new TaskHost connection-loss resource string. |
| src/MSBuild/OutOfProcTaskHostNode.cs | Implements callback request plumbing + forwards IsRunningMultipleNodes to parent. |
| src/MSBuild/MSBuild.csproj | Includes new Shared callback packet source files in MSBuild build. |
| src/Build/Microsoft.Build.csproj | Includes new Shared callback packet source files in Microsoft.Build build. |
| src/Build/Instance/TaskFactories/TaskHostTask.cs | Handles TaskHost query requests and replies with forwarded results. |
| src/Build.UnitTests/BackEnd/TaskHostQueryPacket_Tests.cs | Adds roundtrip serialization tests for query request/response packets. |
| src/Build.UnitTests/BackEnd/TaskHostFactory_Tests.cs | Adds integration test verifying IsRunningMultipleNodes works from TaskHost. |
| src/Build.UnitTests/BackEnd/IsRunningMultipleNodesTask.cs | Adds test task used by integration test to call IsRunningMultipleNodes. |
Implements callback infrastructure for TaskHost IBuildEngine callbacks: - Add packet type enum values for TaskHost callbacks (0x20-0x27 range) - Create ITaskHostCallbackPacket interface for request/response correlation - Add TaskHostQueryRequest/Response packets for property queries - Implement IsRunningMultipleNodes forwarding in OutOfProcTaskHostNode - Handle query requests in TaskHostTask (parent side) - Add integration tests for IsRunningMultipleNodes callback This enables tasks running in TaskHost to correctly query whether the build is running with multiple nodes, which is needed for multithreaded mode (-mt) support. Partial fix for dotnet#12863
MSB5025 was already used in Strings.shared.resx for solution filter JSON errors. Changed TaskHostCallbackConnectionLost to use unused MSB5027.
Tests that IBuildEngine2.IsRunningMultipleNodes callback works correctly when an unmarked task is automatically ejected to TaskHost in multithreaded mode. This is the key end-to-end test for Stage 1 callback infrastructure.
- Created TaskHostCallback_Tests.cs with: - Packet serialization tests (minimized from 8 to 3) - IsRunningMultipleNodes integration tests (both explicit TaskHostFactory and MT auto-eject) - Removed TaskHostQueryPacket_Tests.cs (redundant property tests) - Removed duplicate tests from TaskHostFactory_Tests.cs and TaskRouter_IntegrationTests.cs
Remove #nullable disable from: - TaskHostQueryRequest.cs - TaskHostQueryResponse.cs - IsRunningMultipleNodesTask.cs - TaskHostCallback_Tests.cs
38160e8 to
8bf2412
Compare
The sample duplicates IsRunningMultipleNodesTask which is already in the test project, and proper integration tests exist in TaskHostCallback_Tests.cs.
These files are not included in MSBuildTaskHost.csproj, so the CLR2 guards are not needed.
This error can only occur in TaskHost when the connection to parent is lost, meaning there's no way to log it anyway. Use a simple hardcoded string instead of a localized resource.
Throw InternalError if HandleCallbackResponse receives a packet that doesn't implement ITaskHostCallbackPacket (programming error). Unknown request ID is still silently ignored since it's expected when a request was cancelled before the response arrived.
Unknown query type is a programming bug, not a runtime condition. Fail fast to make bugs immediately visible rather than returning a silent false that would cause subtle, hard-to-debug behavior.
…edEvent from wait In regular (non-TaskHost) mode, IBuildEngine callbacks are direct method calls that always complete - cancel never interrupts a callback mid-flight. Instead, cancellation causes the work behind the callback (e.g. child build in BuildProjectFile) to fail fast, and the callback returns normally. Align TaskHost to match: the parent continues processing callback requests even after sending TaskHostTaskCancelled, so the response always arrives. Cooperative cancellation via ICancelableTask.Cancel() handles the rest. Only InvalidOperationException on connection loss remains (parent process killed), detected via 1000ms LinkStatus polling.
There was a problem hiding this comment.
This looks good to me overall. My only concern is that this PR extends the same IBuildEngine callback capability not only to the sidecar task host, but also to the .NET task host. Unlike the sidecar task host, however, we cannot assume that the same MSBuild version is used - it is able to communicate cross-version.
This leads to four possible scenarios:
- Old MSBuild node, old .NET task host
- New MSBuild node, old .NET task host
- Old MSBuild node, new .NET task host
- New MSBuild node, new .NET task host
Cases 1 and 4 are fine.
In case 2, the old .NET task host would never attempt to initiate a callback request, so no callbacks would occur.
In case 3, however, the new .NET task host may attempt to initiate the callback request, which would not be understood by the old MSBuild node. We should prevent this scenario. The task host should check the negotiated communication version
msbuild/src/Shared/CommunicationsUtilities.cs
Line 179 in 5a78f0f
Additionally, we should ensure that all packet changes introduced by this PR and by future PRs are shipped together under a single new communication version. We do not want to support intermediate versions of the new task host communication packets.
At the same time, we should still be able to develop these changes incrementally. I suggest make the IBuildEngine callbacks disabled by default in the task host until the communication packets are finalized.
Address AR-May's cross-version compatibility concern: a new .NET TaskHost connecting to an old MSBuild node would send callback packets the old node doesn't understand. - Add EnableTaskHostCallbacks to Traits.cs (MSBUILDENABLETASKHOSTCALLBACKS env var) - Store parentPacketVersion in OutOfProcTaskHostNode, add CallbacksSupported check - Callbacks enabled when: negotiated version >= 3 OR env var is set - PacketVersion stays at 2 until all callback stages are complete - IsRunningMultipleNodes returns false (safe default) when callbacks not supported - Add test verifying graceful fallback when callbacks disabled - Existing tests set MSBUILDENABLETASKHOSTCALLBACKS=1 to enable callbacks - Restore accidentally-removed StringArrayWithNullsDoesNotCrashTaskHost test
Tests the cross-runtime scenario where .NET Framework MSBuild.exe spawns a .NET Core TaskHost with Runtime='NET' and TaskHostFactory. Uses bootstrap dotnet to ensure TaskHost loads locally-built MSBuild.dll.
Co-authored-by: Rainer Sigwald <raines@microsoft.com>
Remove ManualResetEvent + ContinueWith bridge and 1s polling loop. Block on tcs.Task.GetAwaiter().GetResult() directly. Connection loss is handled by OnLinkStatusChanged failing all pending TCS immediately. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…sage) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update all comments and documentation in OutOfProcTaskHostNode.cs, TaskHostConfiguration.cs, TaskHostTaskComplete.cs, ITaskHostCallbackPacket.cs, TaskHostQueryRequest.cs, TaskHostQueryResponse.cs, TaskHostCallback_Tests.cs, and taskhost-threading.md to use 'owning worker node' instead of 'parent' to avoid confusion with the many tree/graph structures in MSBuild. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add 'TaskHost Lifecycle' section to taskhost-threading.md covering: - Event loop cycle (idle → task → idle) - State reset vs persistence between tasks - Shutdown vs reuse decision (sidecar vs regular) - No idle timeout behavior Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
HandleShutdown() runs DisposeCacheObjects(Build) on every Run() exit, including BuildCompleteReuse. A fresh cache is created on the next Run(). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per review feedback: the IPC mechanism could support interrupting callbacks on cancellation (unlike in-process direct method calls). Document this as a conscious design choice, not an oversight. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move pure unit tests (no I/O, no BuildManager) for TaskHostQueryRequest and TaskHostQueryResponse round-trip serialization into TaskHostCallbackPacket_Tests. Keep integration tests in TaskHostCallback_Tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add doc comments explaining that IsRunningMultipleNodes returns MaxNodeCount > 1, regardless of how many nodes are actually running. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The test now uses MockLogger to verify the error is logged when callbacks are disabled, without relying on OverallResult (which doesn't reflect logged errors when the task returns true). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both exist because E2E tests need a separate .NET Core assembly while integration tests use the task compiled into the test DLL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove CallbackTestTask.cs and link IsRunningMultipleNodesTask.cs into ExampleTask.csproj instead. Set ReferenceOutputAssembly=false on the ProjectReference to avoid CS0436. Remove redundant EmbeddedResource entry for global.json (already covered by TestAssets\**\* glob). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per review: minimize 'real project build' stuff we aren't testing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Surfaces as 'report this as an MSBuild bug' via IsCriticalException path, which is appropriate since unknown query types indicate an internal version mismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReferenceOutputAssembly=false prevented ExampleTask.dll from being copied to the test output directory, breaking TaskHostLifecycle E2E tests that locate ExampleTask.dll relative to the test assembly. Revert to normal ProjectReference and suppress CS0436 instead (the warning is harmless since both types come from the same linked source). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoids footgun where setting the env var to '0' would still enable callbacks due to !string.IsNullOrEmpty check. Aligns with the == '1' pattern used by ForceAllTasksOutOfProcToTaskHost, InProcNodeDisabled, ReuseTaskHostNodes and other similar Traits flags. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ltipleNodes (dotnet#13149) ### Context TaskHost (OutOfProcTaskHostNode) doesn't support IBuildEngine callbacks - tasks that call IsRunningMultipleNodes, BuildProjectFile, RequestCores, etc. fail with MSB5022 or NotImplementedException. This blocks multithreaded mode (-mt) where non-thread-safe tasks are forced to run in TaskHost. This PR is Stage 1 of implementing callback support. It establishes the infrastructure and implements IsRunningMultipleNodes as a proof-of-concept for the pattern. Partial fix for dotnet#12863 ### Changes Made #### Packet Infrastructure: - Add packet type enum values (0x20-0x27 range) for TaskHost callbacks - Create ITaskHostCallbackPacket interface with RequestId for request/response correlation - Add TaskHostIsRunningMultipleNodesRequest/Response packets #### OutOfProcTaskHostNode (TaskHost side): - Add _pendingCallbackRequests dictionary mapping RequestId → TaskCompletionSource - Add SendCallbackRequestAndWaitForResponse() - sends request, blocks task thread until response - Add HandleCallbackResponse() - routes incoming responses to waiting TCS by RequestId - Implement IsRunningMultipleNodes to forward query to owning worker node instead of logging error #### TaskHostTask (Worker node side): - Register handler for TaskHostIsRunningMultipleNodesRequest packets - Forward IsRunningMultipleNodes query to real IBuildEngine and return result ### Cross-Version Compatibility New TaskHost (with callback support) may be launched by an old worker node MSBuild (without callback support). In that scenario, sending callback packets to the old worker node would cause a crash because the old worker node doesn't know how to deserialize them. **Solution: Version-gated callbacks with `Traits.cs` escape hatch** - `PacketVersion` stays at **2** throughout the staged PRs. It will be bumped to **3** only when all callback stages are complete and the feature is ready to ship. - The TaskHost stores the worker node's `PacketVersion` (received during handshake in `Run()`). - A `CallbacksSupported` property gates all callback usage: ```csharp CallbacksSupported = _parentPacketVersion >= CallbacksMinPacketVersion // 3 || Traits.Instance.EnableTaskHostCallbacks; ``` - When `CallbacksSupported` is **false**, callbacks return safe defaults (e.g. `IsRunningMultipleNodes → false`) — matching the pre-callback behavior. No packets are sent. - **For development and testing**: Set `MSBUILDENABLETASKHOSTCALLBACKS=1` to bypass the version check. This uses the existing `Traits.cs` pattern (like `ForceAllTasksOutOfProcToTaskHost`, `EnableRarNode`) which auto-refreshes per test via `BuildEnvironmentState.s_runningTests`. - Integration tests set the env var to enable callbacks. A separate test verifies that callbacks are correctly disabled (returns safe defaults) when the env var is not set. This ensures: 1. **Old worker node + new TaskHost** → callbacks disabled, safe defaults, no crash 2. **New worker node + new TaskHost (dev/test)** → env var enables callbacks before version bump 3. **Final ship** → `PacketVersion` bumped to 3, env var no longer needed, callbacks always enabled ### Testing - Added TaskHostCallbackPacket_Tests.cs - packet serialization round-trip - Added IsRunningMultipleNodesTask.cs + integration test in TaskHostCallback_Tests.cs - Added E2E cross-runtime test in NetTaskHost_E2E_Tests.cs - Added test verifying callbacks disabled when env var not set (logs MSB5022 error) - All callback and lifecycle tests pass ### Notes #### Thread Model: [taskhost-threading.md](documentation/specs/multithreading/taskhost-threading.md) for detailed threading documentation #### Error Handling: - Connection loss throws InvalidOperationException on pending callbacks - New error string MSB5027: TaskHostCallbackConnectionLost #### Next Stages: - Stage 2: RequestCores/ReleaseCores - Stage 3: BuildProjectFile/BuildProjectFilesInParallel - Stage 4: Yield/Reacquire --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Context
TaskHost (OutOfProcTaskHostNode) doesn't support IBuildEngine callbacks - tasks that call
IsRunningMultipleNodes, BuildProjectFile, RequestCores, etc. fail with MSB5022 or
NotImplementedException. This blocks multithreaded mode (-mt) where non-thread-safe tasks
are forced to run in TaskHost.
This PR is Stage 1 of implementing callback support. It establishes the infrastructure and
implements IsRunningMultipleNodes as a proof-of-concept for the pattern.
Partial fix for #12863
Changes Made
Packet Infrastructure:
OutOfProcTaskHostNode (TaskHost side):
TaskHostTask (Worker node side):
Cross-Version Compatibility
New TaskHost (with callback support) may be launched by an old worker node MSBuild (without callback support). In that scenario, sending callback packets to the old worker node would cause a crash because the old worker node doesn't know how to deserialize them.
Solution: Version-gated callbacks with
Traits.csescape hatchPacketVersionstays at 2 throughout the staged PRs. It will be bumped to 3 only when all callback stages are complete and the feature is ready to ship.PacketVersion(received during handshake inRun()).CallbacksSupportedproperty gates all callback usage:CallbacksSupportedis false, callbacks return safe defaults (e.g.IsRunningMultipleNodes → false) — matching the pre-callback behavior. No packets are sent.MSBUILDENABLETASKHOSTCALLBACKS=1to bypass the version check. This uses the existingTraits.cspattern (likeForceAllTasksOutOfProcToTaskHost,EnableRarNode) which auto-refreshes per test viaBuildEnvironmentState.s_runningTests.This ensures:
PacketVersionbumped to 3, env var no longer needed, callbacks always enabledTesting
Notes
Callback Flow:
sequenceDiagram participant Task as Task Thread<br/>(TaskHost) participant Main as Main Thread<br/>(TaskHost) participant Worker as TaskHostTask<br/>(Worker Node) Task->>Task: BuildEngine.IsRunningMultipleNodes Task->>Task: Create TaskHostIsRunningMultipleNodesRequest(requestId) Task->>Task: Store TCS in _pendingCallbackRequests[requestId] Task->>Worker: SendData(request) Task-->>Task: Block on TCS.Task Worker->>Worker: HandleIsRunningMultipleNodesRequest(request) Worker->>Worker: result = _buildEngine.IsRunningMultipleNodes Worker->>Main: SendData(TaskHostIsRunningMultipleNodesResponse) Main->>Main: HandleCallbackResponse(response) Main->>Main: _pendingCallbackRequests[requestId].SetResult(response) Task-->>Task: TCS unblocks Task->>Task: return response.IsRunningMultipleNodesThread Model:
Cancellation Semantics:
ICancelableTask.Cancel(), which runs on the task thread after the current callback returnsError Handling:
Next Stages: