
Include effective temp directory in node-reuse handshake salt (fixes #13594) #13651

Closed
JanProvaznik wants to merge 1 commit into dotnet:main from JanProvaznik:fix/13594-tempdir-handshake-salt

JanProvaznik (Member) commented Apr 29, 2026

Fixes #13594

Two MSBuild invocations launched with different TMP/TEMP/TMPDIR environments could bind to each other's reusable worker nodes, because the handshake salt was derived only from MSBUILDNODEHANDSHAKESALT and the tools-directory path. When build A finished and cleaned up its per-build temp folder, the still-running build B failed. Build B was now driving a node that still treated A's temp directory as its own, so it hit warnings such as MSB5018 or fatal "system cannot find the path specified" errors pointing at A's deleted folder.

Fix

Mix Path.GetTempPath() into the salt for non-TaskHost handshakes in src/Framework/BackEnd/Handshake.cs. Parent and child both read the same environment when the handshake is constructed. The launcher copies the parent environment wholesale and only ever overrides DOTNET_ROOT* variables, never temp variables. As a result, matching environments still produce matching salts, so legitimate worker-node reuse keeps working. Distinct temp environments now produce distinct salts and refuse to bind. Because ServerNodeHandshake derives from Handshake, MSBuild Server pipe names and the per-server mutex are isolated per temp environment as well.

The exclusion is implemented as a single IsHandshakeOptionEnabled(nodeType, HandshakeOptions.TaskHost) check. Worker nodes (HandshakeOptions.NodeReuse without TaskHost) and ServerNodeHandshake (HandshakeOptions.None) keep the new salt input. TaskHost paths skip it.
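The salt rule above can be sketched in Python as a minimal model. It is not the code in Handshake.cs. The flag names echo the HandshakeOptions values mentioned in the PR, but their bit values are hypothetical, and so is the compute_salt helper. The FNV-1a hash is a stand-in chosen only because it is stable; it is not the hash MSBuild actually uses.

```python
from enum import IntFlag


class HandshakeOptions(IntFlag):
    # Names echo the PR text; the bit values are hypothetical.
    NONE = 0
    TASK_HOST = 1
    NODE_REUSE = 2


def fnv1a_32(text: str) -> int:
    """Stable 32-bit FNV-1a hash, reinterpreted as a signed int (like a C# int)."""
    h = 0x811C9DC5
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def compute_salt(node_type: HandshakeOptions, salt_env: str,
                 tools_dir: str, temp_path: str) -> int:
    """Hypothetical salt: the temp dir is mixed in unless TASK_HOST is set."""
    parts = [salt_env, tools_dir]
    if not (node_type & HandshakeOptions.TASK_HOST):
        # Worker nodes (NODE_REUSE) and the server handshake (NONE) take this path.
        parts.append(temp_path)
    return fnv1a_32("".join(parts))


worker = HandshakeOptions.NODE_REUSE
task_host = HandshakeOptions.TASK_HOST | HandshakeOptions.NODE_REUSE
tools = "C:\\msbuild\\bin\\"
print(compute_salt(worker, "", tools, "C:\\tmpA\\") == compute_salt(worker, "", tools, "C:\\tmpB\\"))
print(compute_salt(task_host, "", tools, "C:\\tmpA\\") == compute_salt(task_host, "", tools, "C:\\tmpB\\"))
```

The first comparison prints False, because different temp directories isolate worker nodes. The second prints True, because TaskHost keys ignore the temp directory.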

Why TaskHost paths are exempted (and why that's a non-issue in practice)

Cross-build TaskHost reuse is not on the default path:

  • MSBUILDREUSETASKHOSTNODES=1 is required. MSBuildTaskHost/OutOfProcTaskHostNode.cs:509-512 and the parallel MSBuild/OutOfProcTaskHostNode.cs:1275:
    // TaskHostNodes lock assemblies with custom tasks produced by build scripts if NodeReuse is on.
    // This causes failures if the user builds twice.
    _shutdownReason = buildComplete.PrepareForReuse && Traits.Instance.EscapeHatches.ReuseTaskHostNodes
        ? NodeEngineShutdownReason.BuildCompleteReuse
        : NodeEngineShutdownReason.BuildComplete;
    And Framework/Traits.cs:334: ReuseTaskHostNodes = Environment.GetEnvironmentVariable("MSBUILDREUSETASKHOSTNODES") == "1". Default is off. Even though the parent advertises PrepareForReuse=true, every TaskHost shuts down at end-of-build unless the user has flipped this rarely-used escape hatch.
  • Crashed parents kill sidecars. MSBuildTaskHost/OutOfProcTaskHostNode.cs:570-578:
    case LinkStatus.ConnectionFailed:
    case LinkStatus.Failed:
        _shutdownReason = NodeEngineShutdownReason.ConnectionFailed;
        _shutdownEvent.Set();
    If the parent dies abruptly, the named-pipe link breaks, and the sidecar TaskHost detects this and shuts itself down. So sidecars don't outlive their parent under any normal failure mode.
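The two gates quoted above can be condensed into a small decision model. It is a Python paraphrase, not the C# source. The enum members echo the names in the snippets. LinkStatus.ACTIVE and all function names are assumptions added for illustration.

```python
from enum import Enum, auto
from typing import Optional


class ShutdownReason(Enum):
    BUILD_COMPLETE = auto()
    BUILD_COMPLETE_REUSE = auto()
    CONNECTION_FAILED = auto()


class LinkStatus(Enum):
    ACTIVE = auto()  # assumed "healthy" value; not quoted in the PR
    FAILED = auto()
    CONNECTION_FAILED = auto()


def reuse_task_host_nodes(env: dict) -> bool:
    # Mirrors the Traits.cs line: only the exact string "1" enables the escape hatch.
    return env.get("MSBUILDREUSETASKHOSTNODES") == "1"


def reason_on_build_complete(prepare_for_reuse: bool, env: dict) -> ShutdownReason:
    # The parent may advertise PrepareForReuse, but the env var must also opt in.
    if prepare_for_reuse and reuse_task_host_nodes(env):
        return ShutdownReason.BUILD_COMPLETE_REUSE
    return ShutdownReason.BUILD_COMPLETE


def reason_on_link_status(status: LinkStatus) -> Optional[ShutdownReason]:
    # A broken link to the parent makes the TaskHost shut itself down.
    if status in (LinkStatus.CONNECTION_FAILED, LinkStatus.FAILED):
        return ShutdownReason.CONNECTION_FAILED
    return None  # link still healthy: keep running
```

With an empty environment, even a PrepareForReuse=true build-complete message yields BUILD_COMPLETE, so the TaskHost exits at end of build.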

For #13594 to bite on the TaskHost path, all of the following must hold simultaneously: MSBUILDREUSETASKHOSTNODES=1 set, Runtime="NET" (or MSBUILDFORCEALLTASKSOUTOFPROC=1) tasks actually used, the first parent exits cleanly, the second parent has a different TMP, and tasks in the second build consume Path.GetTempPath()-derived files. That combination is essentially nonexistent in real workloads, and users who do hit it have always had MSBUILDDISABLENODEREUSE=1 and per-build MSBUILDNODEHANDSHAKESALT as workarounds (both of which the PR leaves in place and continues to honor).
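Expressed as a predicate, the conjunction above looks like this. Every parameter name is hypothetical. The point is only that a single false input rules the scenario out.

```python
def taskhost_bug_can_manifest(
    env: dict,
    uses_net_runtime_tasks: bool,
    first_parent_exited_cleanly: bool,
    first_tmp: str,
    second_tmp: str,
    tasks_use_temp_files: bool,
) -> bool:
    """Hypothetical check: all conditions listed in the paragraph must hold at once."""
    reuse_enabled = env.get("MSBUILDREUSETASKHOSTNODES") == "1"
    runs_out_of_proc = (uses_net_runtime_tasks
                        or env.get("MSBUILDFORCEALLTASKSOUTOFPROC") == "1")
    return all([
        reuse_enabled,
        runs_out_of_proc,
        first_parent_exited_cleanly,
        first_tmp != second_tmp,
        tasks_use_temp_files,
    ])
```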

The exemption is therefore not a deferred-problem caveat — it is the right design:

  • The NET TaskHost handshake is the only handshake that intentionally tolerates parent (.NET Framework MSBuild bundled in VS) ↔ child (MSBuild bundled in a separately-released .NET SDK) version skew, via the magic NetTaskHostHandshakeVersion = 99 in the file-version slots. Adding a salt input that isn't synchronized across both release trains would introduce a hard handshake mismatch on any VS+SDK pairing where one side has the change and the other doesn't. Since the bug doesn't manifest there in practice, the salt input is correctly omitted.
  • CLR2 TaskHost has NodeReuse disabled by design (NodeProviderOutOfProcTaskHost.IsNodeReuseEnabled, line 762: && !Handshake.IsHandshakeOptionEnabled(hostContext, HandshakeOptions.CLR2)), so each CLR2 host is single-shot and exits at end of build. No bug surface.
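The skew argument can be modeled in a few lines. The sketch makes several assumptions. A handshake is accepted only when every field matches. Both sides agree on the tools directory. The version numbers are hypothetical stand-ins for a VS-bundled parent and an SDK-bundled child. Only the value 99 comes from the PR.

```python
NET_TASKHOST_HANDSHAKE_VERSION = 99  # magic value quoted in the PR


def salt_inputs(salt_env: str, tools_dir: str, temp_path: str,
                mixes_in_temp: bool) -> tuple:
    # Hashing is omitted; comparing the raw inputs stands in for comparing their hash.
    base = (salt_env, tools_dir)
    return base + (temp_path,) if mixes_in_temp else base


def handshake_key(file_version: tuple, salt: tuple, net_task_host: bool) -> tuple:
    # Hypothetical layout: four file-version slots followed by the salt.
    version = (NET_TASKHOST_HANDSHAKE_VERSION,) * 4 if net_task_host else file_version
    return version + salt


def net_task_host_agrees(parent_mixes_temp: bool, child_mixes_temp: bool) -> bool:
    tools, temp = "C:\\msbuild\\bin\\", "C:\\tmpA\\"
    # (17, 14, 0, 0) and (18, 0, 0, 0) are hypothetical VS and SDK versions.
    vs_parent = handshake_key((17, 14, 0, 0), salt_inputs("", tools, temp, parent_mixes_temp), True)
    sdk_child = handshake_key((18, 0, 0, 0), salt_inputs("", tools, temp, child_mixes_temp), True)
    return vs_parent == sdk_child
```

The pinned version slots absorb the release skew, but a salt formula that changed on only one side still breaks the match. That is the case net_task_host_agrees(True, False) covers.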

Before — handshake salt does not depend on temp dir

sequenceDiagram
    autonumber
    participant ShellA as Shell A<br/>TMP=C:\tmpA
    participant BuildA as Build A (parent)
    participant Node as Reusable worker node
    participant BuildB as Build B (parent)
    participant ShellB as Shell B<br/>TMP=C:\tmpB

    ShellA->>BuildA: msbuild ...
    BuildA->>BuildA: salt = hash(saltVar + toolsDir)
    BuildA->>Node: spawn (inherits TMP=C:\tmpA)
    Node->>Node: opens C:\tmpA\MSBuildTemp{guid}
    Note over BuildA,Node: salt matches → bound

    ShellB->>BuildB: msbuild ...
    BuildB->>BuildB: salt = hash(saltVar + toolsDir)
    Note over BuildA,BuildB: ⚠ same salt despite different TMP
    BuildB->>Node: discover & reuse (handshake passes)
    Note over BuildB,Node: Node is still rooted under C:\tmpA

    BuildA-->>ShellA: done
    ShellA->>ShellA: rm -rf C:\tmpA  (per-build cleanup)
    Node->>Node: write to C:\tmpA\... 💥
    Node-->>BuildB: MSB5018 / "path not found"
    BuildB-->>ShellB: ❌ build fails

After — temp dir is part of the salt for worker nodes

sequenceDiagram
    autonumber
    participant ShellA as Shell A<br/>TMP=C:\tmpA
    participant BuildA as Build A (parent)
    participant NodeA as Worker node A
    participant NodeB as Worker node B
    participant BuildB as Build B (parent)
    participant ShellB as Shell B<br/>TMP=C:\tmpB

    ShellA->>BuildA: msbuild ...
    BuildA->>BuildA: salt = hash(saltVar + toolsDir + "C:\tmpA\")
    BuildA->>NodeA: spawn (inherits TMP=C:\tmpA)
    NodeA->>NodeA: salt = hash(... + "C:\tmpA\")  ✅ matches BuildA

    ShellB->>BuildB: msbuild ...
    BuildB->>BuildB: salt = hash(saltVar + toolsDir + "C:\tmpB\")
    BuildB->>NodeA: probe (handshake)
    NodeA-->>BuildB: salt mismatch → reject
    BuildB->>NodeB: spawn fresh (inherits TMP=C:\tmpB)
    NodeB->>NodeB: salt = hash(... + "C:\tmpB\")  ✅ matches BuildB

    BuildA-->>ShellA: done
    ShellA->>ShellA: rm -rf C:\tmpA
    NodeA-->>NodeA: idles (no other build can bind)
    NodeB->>NodeB: keeps writing under C:\tmpB ✅
    BuildB-->>ShellB: ✔ build succeeds
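Both diagrams can be replayed with a toy node pool. The sketch is a hypothetical model, not MSBuild's node provider. It treats a node as reusable whenever its salt matches and ignores whether the node is busy. It also represents the salt by its raw inputs instead of a hash.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerNode:
    salt: tuple
    temp_root: str  # temp dir inherited from the parent that spawned it


@dataclass
class NodePool:
    """Hypothetical pool of reusable nodes on one machine."""
    nodes: list = field(default_factory=list)

    def acquire(self, salt: tuple, parent_temp: str):
        for node in self.nodes:
            if node.salt == salt:  # handshake passes: reuse the existing node
                return node, False
        node = WorkerNode(salt, parent_temp)  # no match: spawn a fresh node
        self.nodes.append(node)
        return node, True


def salt_inputs(temp: str, include_temp: bool) -> tuple:
    base = ("", "C:\\msbuild\\bin\\")  # MSBUILDNODEHANDSHAKESALT, toolsDir
    return base + (temp,) if include_temp else base


def build_b_node(include_temp: bool):
    pool = NodePool()
    pool.acquire(salt_inputs("C:\\tmpA\\", include_temp), "C:\\tmpA\\")  # build A
    node, spawned = pool.acquire(salt_inputs("C:\\tmpB\\", include_temp), "C:\\tmpB\\")
    return node.temp_root, spawned
```

With include_temp=False, build B lands on a node rooted under C:\tmpA\, which is the failure in the first diagram. With include_temp=True, it spawns a fresh node under C:\tmpB\.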

Salt composition

flowchart LR
    subgraph BEFORE [Before]
        A1[MSBUILDNODEHANDSHAKESALT env]
        A2[toolsDirectory]
        A1 --> AC{{concat}}
        A2 --> AC
        AC --> AH[GetHashCode → int salt]
    end

    subgraph AFTER [After - non-TaskHost only]
        B1[MSBUILDNODEHANDSHAKESALT env]
        B2[toolsDirectory]
        B3[Path.GetTempPath]
        B1 --> BC{{concat}}
        B2 --> BC
        B3 --> BC
        BC --> BH[GetHashCode → int salt]
    end

    subgraph AFTER_TH [After - TaskHost paths unchanged]
        C1[MSBUILDNODEHANDSHAKESALT env]
        C2[toolsDirectory]
        C1 --> CC{{concat}}
        C2 --> CC
        CC --> CH[GetHashCode → int salt]
    end

Why this approach

  • Symmetric by construction. Parent and child both compute Path.GetTempPath() from the same inherited environment at handshake time.
  • Reuse is preserved. Identical envs ⇒ identical salts ⇒ reuse still works. Only genuinely-distinct temp envs are isolated.
  • Smallest possible change. ~10 lines of production code in a single file, gated by one IsHandshakeOptionEnabled(..., TaskHost) check.
  • No cross-release breakage. Worker nodes and the MSBuild Server are shipped in the same drop as the parent that talks to them, so they always agree on the formula. TaskHost paths — the only ones with parent/child release skew — are explicitly opted out.
  • No new env var, flag, or ChangeWave. Worker reuse becomes strictly more conservative; no new warnings or errors are emitted.
  • Self-invalidating on upgrade. Pre-upgrade worker nodes have a different salt than upgraded parents and are naturally ignored; they idle out within the existing reuse timeout.

Alternatives considered and rejected:

  • Hashing the full environment block: kills all reuse, because PWD, _, OLDPWD, CMDCMDLINE, etc. churn per shell.
  • Moving the per-node temp folder to a fixed system path: ignores users who deliberately set TMP, and doesn't reach ToolTask or user task code that re-resolves Path.GetTempPath().
  • Pinning temp at startup: same gap.
  • Runtime detection + respawn: reactive and racy, because the corruption has already happened by the time it is detected.
  • Full environment diff over the wire: a much larger handshake with no extra benefit over hashing into the salt.
  • Documenting MSBUILDNODEHANDSHAKESALT as the workaround: it already is the existing workaround, and it pushes the burden to every caller.
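To see why hashing the full environment block was rejected, compare two keys over environments from two shells that share TMP but differ in PWD. effective_temp is a simplified, hypothetical approximation of Path.GetTempPath() on Windows: TMP, then TEMP, then a fixed fallback. The real API has more fallbacks.

```python
def effective_temp(env: dict) -> str:
    """Simplified, hypothetical stand-in for Path.GetTempPath() on Windows."""
    return env.get("TMP") or env.get("TEMP") or "C:\\Windows\\Temp\\"


def full_env_key(env: dict) -> tuple:
    # Rejected alternative: every variable participates in the salt.
    return tuple(sorted(env.items()))


def temp_only_key(env: dict) -> tuple:
    # Chosen approach: only the effective temp directory participates.
    return (effective_temp(env),)


shell_1 = {"TMP": "C:\\tmpA\\", "PWD": "C:\\src\\proj1"}
shell_2 = {"TMP": "C:\\tmpA\\", "PWD": "C:\\src\\proj2"}
print(full_env_key(shell_1) == full_env_key(shell_2), temp_only_key(shell_1) == temp_only_key(shell_2))
```

The full-environment key splits these two shells even though they share a temp directory. The temp-only key keeps them together, so reuse survives.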

Blast radius

| Scope | Change |
|---|---|
| Microsoft.Build worker nodes | Salt now varies by temp env. Same-env reuse unchanged; distinct-env reuse correctly refused. |
| MSBuild Server (MSBuildServer-{hash} pipe + Global\msbuild-server-running-{hash} mutex) | Now scoped per temp env. Different TMP values no longer share a server. --shutdown from a different temp env will not see the foreign server, but stale servers idle out via existing timeouts. |
| NET TaskHost (parent .NET Framework MSBuild ↔ child SDK MSBuild) | Unchanged by design. Cross-build NET TaskHost reuse is gated on MSBUILDREUSETASKHOSTNODES=1 (off by default) and on the parent surviving long enough to send a clean shutdown. The bug from #13594 cannot manifest on this path under normal conditions, and excluding the salt input avoids introducing a handshake-protocol skew across the VS+SDK release boundary. |
| CLR2 TaskHost | Unchanged. NodeReuse is disabled there by design, so no bug surface. |
| MSBuildTaskHost.exe (legacy CLR2 binary) | Unchanged: single-shot process, no reuse to protect. |
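The MSBuild Server row can be illustrated by deriving the pipe and mutex names from a hash that includes the temp directory. The name shapes come from the Blast radius table. server_hash, its SHA-256 encoding, and shutdown_can_find are hypothetical and do not reproduce ServerNodeHandshake.ComputeHash.

```python
import hashlib


def server_hash(salt_env: str, tools_dir: str, temp_path: str) -> str:
    # Hypothetical encoding; the real ComputeHash format is not reproduced here.
    data = "|".join([salt_env, tools_dir, temp_path]).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:16]


def server_names(handshake_hash: str) -> tuple:
    # Pipe and mutex name shapes as listed in the Blast radius table.
    return (f"MSBuildServer-{handshake_hash}",
            f"Global\\msbuild-server-running-{handshake_hash}")


def shutdown_can_find(running_pipes: set, salt_env: str, tools_dir: str, temp_path: str) -> bool:
    """A --shutdown client only looks for the pipe its own environment implies."""
    pipe, _ = server_names(server_hash(salt_env, tools_dir, temp_path))
    return pipe in running_pipes
```

A server started under C:\tmpA\ is invisible to a shutdown request issued from a shell whose temp directory is C:\tmpB\. That server idles out on its own timeout instead.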

Tests

src/Build.UnitTests/BackEnd/HandshakeTempDir_Tests.cs (new) covers:

| Test | Purpose |
|---|---|
| Handshake_DifferentTempDirectory_ProducesDifferentKey | Regression for #13594: failed before the fix, passes after. |
| Handshake_SameTempDirectory_ProducesSameKey | Sanity: legitimate worker reuse not regressed. |
| ServerNodeHandshake_DifferentTempDirectory_ProducesDifferentHash | MSBuild Server pipe name is isolated per temp env. |
| Handshake_NetTaskHost_DifferentTempDirectory_ProducesSameKey | Pins the TaskHost exemption so a future change cannot accidentally re-introduce a NET-TaskHost handshake-protocol skew. |

All tests use the standard ITestOutputHelper injection pattern so test diagnostics are captured. Existing related tests (AppHostSupport_Tests, UnixNodeReuseFixes_Tests, TaskHostNodeKey_Tests) continue to pass.
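The tests depend on the handshake reading the environment when it is constructed. A Python analogue of that pattern sets TMPDIR around each construction and restores it afterwards. FakeHandshake and temp_env are hypothetical helpers. They are not the xUnit test code, which this sketch does not reproduce.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(name: str, value: str):
    """Set an environment variable for the duration of a test, then restore it."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


class FakeHandshake:
    """Hypothetical stand-in: snapshots the temp dir at construction time."""

    def __init__(self, var: str = "TMPDIR"):
        self.temp = os.environ.get(var, "/tmp/")

    def key(self) -> tuple:
        return ("salt", "toolsDir", self.temp)
```

Constructing one handshake inside each temp_env block mirrors the "different temp directory" and "same temp directory" cases without leaking the override into other tests.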

Copilot AI review requested due to automatic review settings April 29, 2026 13:22

Copilot AI left a comment

Pull request overview

This PR fixes incorrect MSBuild node reuse across concurrent invocations that differ only in their effective temp directory, by incorporating Path.GetTempPath() into the node-reuse handshake salt (and corresponding MSBuild server handshake hash).

Changes:

  • Include Path.GetTempPath() in the handshake salt for framework node reuse (src/Framework/BackEnd/Handshake.cs).
  • Mirror the same temp-dir salt contribution in the legacy CLR2 TaskHost handshake (src/MSBuildTaskHost/CommunicationsUtilities.cs).
  • Add unit tests validating same-temp produces same key and different-temp produces different keys/hashes (src/Build.UnitTests/BackEnd/HandshakeTempDir_Tests.cs).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| src/MSBuildTaskHost/CommunicationsUtilities.cs | Adds effective temp directory to TaskHost handshake salt to prevent cross-temp node reuse. |
| src/Framework/BackEnd/Handshake.cs | Adds effective temp directory to core node handshake salt (affects node reuse and server scoping). |
| src/Build.UnitTests/BackEnd/HandshakeTempDir_Tests.cs | New regression tests for handshake key/hash differences across temp directories. |

JanProvaznik force-pushed the fix/13594-tempdir-handshake-salt branch from c9c33ea to 7f27bee (April 29, 2026 15:43)
Two MSBuild invocations launched with different TMP/TEMP/TMPDIR
environment values were able to bind to each other's reusable worker
nodes because the handshake salt was computed only from
MSBUILDNODEHANDSHAKESALT and the tools-directory path. When build A
finished and cleared its per-build temp folder, a still-running build B
that had inherited the same node would break with warnings such as
MSB5018 or fatal "system cannot find the path specified" errors
pointing at A's deleted folder.

Fixes dotnet#13594

Mix Path.GetTempPath() into the salt for non-TaskHost handshakes in
src/Framework/BackEnd/Handshake.cs. Both parent and child read the same
env at handshake construction time, so matching environments still
produce matching salts (legitimate worker-node reuse continues to
work) while distinct temp environments now produce distinct salts and
refuse to bind. ServerNodeHandshake derives from Handshake, so the
MSBuildServer pipe name and per-server mutex are likewise temp-dir
scoped.

TaskHost paths are intentionally excluded from the temp-dir salt input:
- NET TaskHost (parent .NET Framework MSBuild ↔ child shipped from a
  separately-released .NET SDK) tolerates version skew via the magic
  NetTaskHostHandshakeVersion. Adding a salt input that isn't
  synchronized across both release trains would break VS+SDK
  combinations until both sides picked up the change. The temp-dir fix
  for NET TaskHost is left for a coordinated VS+SDK rollout follow-up.
- CLR2 TaskHost has NodeReuse disabled by design (see
  NodeProviderOutOfProcTaskHost.IsNodeReuseEnabled), so there is no
  cross-build reuse surface to protect.

Add HandshakeTempDir_Tests covering:
- Worker handshake key differs across distinct temp dirs (regression).
- Worker handshake key is stable for identical temp dirs (sanity).
- ServerNodeHandshake.ComputeHash differs across distinct temp dirs.
- NET TaskHost handshake key is unchanged across distinct temp dirs
  (locks in the deferred behavior that preserves VS+SDK compat).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
JanProvaznik force-pushed the fix/13594-tempdir-handshake-salt branch from 7f27bee to 9972459 (April 30, 2026 08:22)
JanProvaznik (Member, Author) commented:

@rainersigwald I see 2 options:

  1. say that this scenario ("node reuse when temp has changed in the meantime") is not supported
    a. just close the issue
    b. figure out how to log a message saying this is not supported
  2. take this PR to change the handshake of worker nodes to include the temp directory

JanProvaznik (Member, Author) commented:

too heavy

Successfully merging this pull request may close these issues.

Nodes are reused between builds with different temp directories
