fix(postmerge #5379): qemu-full-install-test 4 Copilot P1 fixes — iter-5.1 marker (was pre-install) + artifact upload + trigger paths + early-exit race#5382
Conversation
…SUCCESS_MARKER iter-5.3 → iter-5.1 (was BEFORE nixos-install) + workspace-relative serial log + workflow trigger paths + QEMU early-exit race 4 substantive Copilot findings on freshly-merged PR #5379 (B-0831 Slice 1 starter). All P1; all legit substrate-engineering corrections: 1. SUCCESS_MARKER changed from '[iter-5.3]' to '[iter-5.1]' Copilot was right: '[iter-5.3]' (Step 6.55 at zeta-install.sh:372) fires BEFORE the actual 'sudo nixos-install' invocation (line 980). So the test could pass without proving the install actually completed. '[iter-5.1]' (Step 6.7 wifi persistence at line 527) correctly fires AFTER nixos-install + iter-4.2 + iter-5.3 (which gracefully skips in CI when stdin returns empty) + iter-5.2 hostname injection, AND BEFORE iter-5.4.0 gh auth prompt (which would hang in CI without B-0833 mock device-code). 2. Workspace-relative serial log path via SERIAL_LOG_OUT_PATH env var. Test reads process.env.SERIAL_LOG_OUT_PATH (defaults to tmpdir path for local testing). CI workflow sets it to ${{ github.workspace }}/qemu-full-install-serial.log so the log survives test step + can be uploaded as artifact. 3. Added actions/upload-artifact step (v4.6.2 SHA-pinned) with if: always() so log survives even when test fails (the most interesting case for debug). 7-day retention. 4. Added missing workflow trigger paths: - tools/ci/qemu-boot-test.ts (cascade #5) - tools/ci/qemu-full-install-test.ts (cascade #6 Slice 1) - tools/ci/test-iter-54-install-flow.test.ts (Layer 2a structural) Without these, a PR changing ONLY these helpers wouldn't trigger the workflow (the test wouldn't run on its own changes). 5. QEMU early-exit race via Promise.race between marker-wait and child-exit. Per Copilot finding: 'If QEMU exits early (bad args, KVM/device failure, disk attach error), the script still waits the full 30-minute marker timeout because the polling loop never observes qemuExited.' Now races; whichever resolves first wins; both check the serial log + report consistently. Security: github.workspace in env: block (GitHub-controlled value; expands to the workspace path); no attacker-controllable interpolation in run: commands. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
Addresses 5 P1 Copilot findings from freshly-merged PR #5379 for the QEMU full-install CI test: corrects the success marker to one that actually fires after nixos-install, ensures serial logs survive for artifact upload, expands workflow trigger paths, and adds a QEMU early-exit race to avoid the 30-min timeout when QEMU dies before the marker can appear.
Changes:
- Switch
SUCCESS_MARKERfrom[iter-5.3](pre-install) to[iter-5.1](post-install, before gh auth prompt that would hang in CI). - Make serial log path env-overridable (
SERIAL_LOG_OUT_PATH) so the workflow can point it inside the workspace, and add anactions/upload-artifact@v4.6.2step (SHA-pinned,if: always()) plus extra trigger paths for the qemu helpers. - Race the marker-wait against QEMU's
exitevent so early QEMU termination produces an immediate, descriptive failure instead of timing out.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/ci/qemu-full-install-test.ts | Updates success marker comment+value, env-overrides serial log path, and adds Promise.race against QEMU early exit. |
| .github/workflows/build-ai-cluster-iso.yml | Adds qemu helper trigger paths to both pull_request and push filters, sets SERIAL_LOG_OUT_PATH env, and adds artifact-upload step. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
4 substantive Copilot findings on freshly-merged PR #5379. All P1; all legit substrate-engineering corrections.
Fixes
1. SUCCESS_MARKER iter-5.3 → iter-5.1
Copilot was right: `[iter-5.3]` (zeta-install.sh:372) fires BEFORE the actual `sudo nixos-install` invocation (line 980). Test could pass without proving install completed.
`[iter-5.1]` (Step 6.7 wifi persistence at line 527) correctly comes AFTER:
AND BEFORE:
2. Workspace-relative serial log path
`SERIAL_LOG_OUT_PATH` env var lets workflow point log to `${{ github.workspace }}/qemu-full-install-serial.log` so it survives test step.
3. Artifact upload step
`actions/upload-artifact@v4.6.2` (SHA-pinned) with `if: always()` so log survives even when test fails. 7-day retention.
4. Workflow trigger paths
Added missing trigger paths so PRs changing ONLY these helpers actually run the workflow:
5. QEMU early-exit race (5th fix; same scope)
`Promise.race` between marker-wait and QEMU child-exit. If QEMU exits early (bad args / KVM failure / disk error), test fails immediately instead of waiting full 30min timeout.
Security
`github.workspace` in `env:` block — GitHub-controlled value (expands to workspace path); no attacker-controllable interpolation in `run:` commands.
🤖 Generated with Claude Code