Skip to content

fix(postmerge #5379): qemu-full-install-test 4 Copilot P1 fixes — iter-5.1 marker (was pre-install) + artifact upload + trigger paths + early-exit race#5382

Merged
AceHack merged 1 commit into
mainfrom
fix-postmerge-5379-qemu-full-install-test-iter-51-marker-artifact-upload-trigger-path-early-exit-race-otto-cli-2026-05-26
May 27, 2026
Merged

fix(postmerge #5379): qemu-full-install-test 4 Copilot P1 fixes — iter-5.1 marker (was pre-install) + artifact upload + trigger paths + early-exit race#5382
AceHack merged 1 commit into
mainfrom
fix-postmerge-5379-qemu-full-install-test-iter-51-marker-artifact-upload-trigger-path-early-exit-race-otto-cli-2026-05-26

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented May 27, 2026

What

4 substantive Copilot findings on freshly-merged PR #5379. All P1; all legit substrate-engineering corrections.

Fixes

1. SUCCESS_MARKER iter-5.3 → iter-5.1

Copilot was right: `[iter-5.3]` (zeta-install.sh:372) fires BEFORE the actual `sudo nixos-install` invocation (line 980). Test could pass without proving install completed.

`[iter-5.1]` (Step 6.7 wifi persistence at line 527) correctly comes AFTER:

  • nixos-install
  • iter-4.2 SSH pubkey
  • iter-5.3 password (skipped gracefully in CI on empty stdin)
  • iter-5.2 hostname injection

AND BEFORE:

  • iter-5.4.0 gh auth prompt (which would hang in CI without B-0833 mock device-code endpoint)

2. Workspace-relative serial log path

`SERIAL_LOG_OUT_PATH` env var lets workflow point log to `${{ github.workspace }}/qemu-full-install-serial.log` so it survives test step.

3. Artifact upload step

`actions/upload-artifact@v4.6.2` (SHA-pinned) with `if: always()` so log survives even when test fails. 7-day retention.

4. Workflow trigger paths

Added missing trigger paths so PRs changing ONLY these helpers actually run the workflow:

  • `tools/ci/qemu-boot-test.ts`
  • `tools/ci/qemu-full-install-test.ts`
  • `tools/ci/test-iter-54-install-flow.test.ts`

5. QEMU early-exit race (5th fix; same scope)

`Promise.race` between marker-wait and QEMU child-exit. If QEMU exits early (bad args / KVM failure / disk error), test fails immediately instead of waiting full 30min timeout.

Security

`github.workspace` in `env:` block — GitHub-controlled value (expands to workspace path); no attacker-controllable interpolation in `run:` commands.

🤖 Generated with Claude Code

…SUCCESS_MARKER iter-5.3 → iter-5.1 (was BEFORE nixos-install) + workspace-relative serial log + workflow trigger paths + QEMU early-exit race

4 substantive Copilot findings on freshly-merged PR #5379 (B-0831
Slice 1 starter). All P1; all legit substrate-engineering corrections:

1. SUCCESS_MARKER changed from '[iter-5.3]' to '[iter-5.1]'
   Copilot was right: '[iter-5.3]' (Step 6.55 at zeta-install.sh:372)
   fires BEFORE the actual 'sudo nixos-install' invocation (line 980).
   So the test could pass without proving the install actually
   completed. '[iter-5.1]' (Step 6.7 wifi persistence at line 527)
   correctly fires AFTER nixos-install + iter-4.2 + iter-5.3 (which
   gracefully skips in CI when stdin returns empty) + iter-5.2
   hostname injection, AND BEFORE iter-5.4.0 gh auth prompt (which
   would hang in CI without B-0833 mock device-code).

2. Workspace-relative serial log path via SERIAL_LOG_OUT_PATH env
   var. Test reads process.env.SERIAL_LOG_OUT_PATH (defaults to
   tmpdir path for local testing). CI workflow sets it to
   ${{ github.workspace }}/qemu-full-install-serial.log so the log
   survives test step + can be uploaded as artifact.

3. Added actions/upload-artifact step (v4.6.2 SHA-pinned) with
   if: always() so log survives even when test fails (the most
   interesting case for debug). 7-day retention.

4. Added missing workflow trigger paths:
   - tools/ci/qemu-boot-test.ts (cascade #5)
   - tools/ci/qemu-full-install-test.ts (cascade #6 Slice 1)
   - tools/ci/test-iter-54-install-flow.test.ts (Layer 2a structural)
   Without these, a PR changing ONLY these helpers wouldn't trigger
   the workflow (the test wouldn't run on its own changes).

5. QEMU early-exit race via Promise.race between marker-wait and
   child-exit. Per Copilot finding: 'If QEMU exits early (bad args,
   KVM/device failure, disk attach error), the script still waits
   the full 30-minute marker timeout because the polling loop never
   observes qemuExited.' Now races; whichever resolves first wins;
   both check the serial log + report consistently.

Security: github.workspace in env: block (GitHub-controlled value;
expands to the workspace path); no attacker-controllable
interpolation in run: commands.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Copilot AI review requested due to automatic review settings May 27, 2026 02:13
@AceHack AceHack enabled auto-merge (squash) May 27, 2026 02:13
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses 5 P1 Copilot findings from freshly-merged PR #5379 for the QEMU full-install CI test: corrects the success marker to one that actually fires after nixos-install, ensures serial logs survive for artifact upload, expands workflow trigger paths, and adds a QEMU early-exit race to avoid the 30-min timeout when QEMU dies before the marker can appear.

Changes:

  • Switch SUCCESS_MARKER from [iter-5.3] (pre-install) to [iter-5.1] (post-install, before gh auth prompt that would hang in CI).
  • Make serial log path env-overridable (SERIAL_LOG_OUT_PATH) so the workflow can point it inside the workspace, and add an actions/upload-artifact@v4.6.2 step (SHA-pinned, if: always()) plus extra trigger paths for the qemu helpers.
  • Race the marker-wait against QEMU's exit event so early QEMU termination produces an immediate, descriptive failure instead of timing out.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
tools/ci/qemu-full-install-test.ts Updates success marker comment+value, env-overrides serial log path, and adds Promise.race against QEMU early exit.
.github/workflows/build-ai-cluster-iso.yml Adds qemu helper trigger paths to both pull_request and push filters, sets SERIAL_LOG_OUT_PATH env, and adds artifact-upload step.

@AceHack AceHack merged commit 5514ea9 into main May 27, 2026
35 checks passed
@AceHack AceHack deleted the fix-postmerge-5379-qemu-full-install-test-iter-51-marker-artifact-upload-trigger-path-early-exit-race-otto-cli-2026-05-26 branch May 27, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants