.NET: Foundry.Hosting IT - eliminate MSBuild parallel-output races#5725
.NET: Foundry.Hosting IT - eliminate MSBuild parallel-output races#5725rogerbarreto wants to merge 2 commits intomainfrom
Conversation
Two surgical changes inside the dotnet-foundry-hosted-it job: 1. Replace dotnet build <slnx> -f net10.0 with dotnet build <test.csproj>. The test csproj pins TargetFrameworks=net10.0 and its ProjectReference closure gives MSBuild a single-rooted graph, eliminating the duplicate inner-builds that race on bin/obj. Drops the two New-FilteredSolution.ps1 steps. 2. In it-build-image.ps1, drop the -UsePrebuiltProjectReferences switch and always pass --no-dependencies to dotnet publish. Publish now resolves TestContainer's framework refs by reading prebuilt DLLs and never re-touches them. Replaces the partial-mitigation in PR #5689 with a structural fix. Local validation confirmed published Foundry.dll has identical mtime and bytes as the prebuild output.
There was a problem hiding this comment.
Pull request overview
This PR updates the .NET Foundry.Hosting hosted-agent integration-test (IT) CI job to eliminate recurring Linux MSBuild output races by switching to a single-rooted prebuild graph and ensuring the image publish step never rebuilds ProjectReferences.
Changes:
- Build the
Foundry.Hosting.IntegrationTests.csprojdirectly in CI (instead of generating filtered.slnxfiles with a global-foverride) to prevent parallel inner-build collisions. - Update
it-build-image.ps1to always rundotnet publish --no-dependenciesand add a preflight check for required prebuilt DLL outputs; remove the prior-UsePrebuiltProjectReferencesswitch. - Update the workflow to run the hosted-agent tests via the test project directly, matching the new prebuild approach.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
dotnet/tests/Foundry.Hosting.IntegrationTests/scripts/it-build-image.ps1 |
Always publishes without rebuilding ProjectReferences and adds prebuild-output validation. |
.github/workflows/dotnet-build-and-test.yml |
Simplifies the hosted IT job to build/test the pinned net10.0 csproj directly and removes filtered-solution generation. |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 94%
✓ Correctness
The PR correctly eliminates two MSBuild parallel-output races by (1) building the test csproj directly instead of a filtered slnx with a global TFM override, and (2) always passing --no-dependencies to dotnet publish instead of the opt-in -UsePrebuiltProjectReferences switch. All references to the removed targetFramework env var are updated consistently. The test project confirms it pins TargetFrameworks=net10.0, the TestContainer project is correctly excluded from the prebuild graph (justifying --no-dependencies over --no-build), and the new prebuild DLL probes provide a clear error for local dev when the prerequisite build hasn't run. No correctness issues found.
✓ Security Reliability
This PR cleanly addresses two MSBuild race conditions by building the test csproj directly (eliminating parallel inner-build races) and using
--no-dependenciesfor publish (eliminating file-handle races with VBCSCompiler). The preflight probe correctly checks for the two leaf dependency DLLs, and transitive deps would cause publish to fail with a clear error if missing. No injection risks, no secrets exposed, no resource leaks. The change from opt-inUsePrebuiltProjectReferencesto always-on--no-dependenciesis a valid trade-off: CI always prebuilds so correctness is guaranteed, and local dev gets a clear error message if prebuilt DLLs are absent. The workflow changes (removing filtered solution steps, building csproj directly, using--projectfordotnet test) are consistent and correct.
✓ Test Coverage
This PR is a CI/build infrastructure change that eliminates MSBuild parallel-output races by building a csproj directly instead of a filtered slnx and using --no-dependencies for publish. No application code is changed. The existing integration tests (C# test files) are untouched and continue to run via the same
dotnet test --no-buildstep. The PowerShell script's new preflight probe for prebuilt DLLs is a fail-fast guard validated by the CI pipeline itself. There are no existing Pester tests forit-build-image.ps1, so this PR neither introduces nor widens a testing gap. The test invocation correctly drops the-f net10.0flag since the csproj pins<TargetFrameworks>net10.0</TargetFrameworks>(line 9 of the csproj), making it redundant. No test coverage concerns.
✗ Design Approach
I found one design-level regression in the new
it-build-image.ps1flow: it now always consumes prebuilt project outputs, but the new preflight only proves that two DLLs exist, not that the full hashed ProjectReference closure was rebuilt. That can silently publish stale framework bits under a fresh image tag in local/dev flows.
Flagged Issues
- The new always-
--no-dependenciespath can silently publish staleMicrosoft.Agents.AI.Abstractions/Microsoft.Agents.AI.Workflowsbits. The image tag hashes those source trees (it-build-image.ps1:82-88), andFoundry.Hostingdirectly references them (Microsoft.Agents.AI.Foundry.Hosting.csproj:41-43), but the preflight guard only checks forFoundry.dllandFoundry.Hosting.dll. Either validate the full ProjectReference closure/freshness or keep the old explicit opt-in outside CI.
Automated review by rogerbarreto's agents
Problem
Across the last 10
merge_groupruns ofdotnet-build-and-test, 9 failures came fromdotnet-foundry-hosted-it. PR #5689 patched one symptom (MSB3026at publish time) but the underlying races kept surfacing in many forms:MSB3026,MSB3491file already exists,MSB4018GenerateDepsFile,MSB3883onref/*.dll. All on Linux, all on multi-targeted dependencies (Abstractions,AI,Workflows,Workflows.Declarative,Foundry.Hosting).Two distinct races, both inside the IT job runner:
dotnet buildinvocation.dotnet build dotnet/filtered-foundry-hosted.slnx -f net10.0forces a global TFM override on a multi-rooted slnx. Multi-targeted projects get scheduled twice from different parent nodes and write the samebin/Release/net10.0/files concurrently.dotnet publishvs the precedingdotnet build. Publish propagates-r linux-musl-x64to ProjectReferences and re-touches the samesrc/<lib>/bin/Release/net10.0/*.dllpaths while VBCSCompiler from the prebuild still holds file handles. PR .NET: Foundry.Hosting IT - avoid MSB3026 in publish step #5689's-UsePrebuiltProjectReferenceswas a partial mitigation gated behind an opt-in switch.Fix
Two surgical changes inside the existing
dotnet-foundry-hosted-itjob. No new workflow file. The job stays indotnet-build-and-test-check.needsand continues to gate the merge queue.1) Prebuild: build the test csproj directly (closes Race A).
The test project
Foundry.Hosting.IntegrationTests.csprojpins<TargetFrameworks>net10.0</TargetFrameworks>. Building the csproj instead of a filtered slnx gives MSBuild a single-rooted graph; each multi-targeted dependency is invoked exactly once fornet10.0. The twoNew-FilteredSolution.ps1steps are deleted.dotnet build dotnet/tests/Foundry.Hosting.IntegrationTests/Foundry.Hosting.IntegrationTests.csproj -c Release --warnaserror2) Publish: always pass
--no-dependencies, drop the-UsePrebuiltProjectReferencesswitch (closes Race B).Publish builds only the TestContainer and resolves its framework references by reading the prebuilt DLLs. It never re-touches
src/<lib>/bin/Release/net10.0/*.dll, so the file-handle race is structurally impossible. Replaces the partial-mitigation switch from #5689 with a standard CLI flag and removes ~80 lines of preflight machinery.A small new preflight in
it-build-image.ps1errors with a clearrun dotnet build firstmessage when the prebuilt outputs are missing (for local dev, since CI always prebuilds).Verification
Test branch
bugfix/fix-hosted-agents-it-concurrency-errordispatched twice:Local validation also confirmed the published
Microsoft.Agents.AI.Foundry.dllis byte-identical and same mtime as the prebuild DLL, proving publish reuses the prebuilt output.Why
--no-dependencies, not--no-buildTestContainer is not in the prebuild graph (the prebuild builds
IntegrationTests.csproj, which referencesFoundry,Foundry.Hosting,AgentConformance.IntegrationTests). TestContainer is a separate project that publish must build itself, with-r linux-musl-x64.--no-buildwould refuse because TestContainer's outputs do not exist;--no-dependenciesbuilds TestContainer but skips its ProjectReferences, which is exactly the right behavior.