Conversation
… QEMU boot smoke-test can capture systemd/getty output via serial console (Aaron 2026-05-26) First-cycle QEMU boot smoke-test on main (PR #5322 cascade #5) failed: ISO boots fine (GRUB + kernel + initrd loaded successfully per serial log) but timed out waiting for 'zeta-installer login:' prompt. Root cause: NixOS installer default outputs only to VGA tty1; QEMU's -display none hides VGA; serial console (-serial file:...) gets no kernel/systemd/getty output after bootloader stage. Fix: add boot.kernelParams = [ "console=ttyS0,115200n8" "console=tty1" ] to installer config so systemd/getty mirrors to serial console at standard 115200 8N1. tty1 stays primary (keyboard-attached install flow); ttyS0 is secondary for QEMU capture + real hardware with serial headers (some Beelinks; most server-class boards; debugging scenarios). Substrate-honest framing: the QEMU test isn't broken; it correctly caught that serial console wasn't configured. The ISO itself isn't broken (boots cleanly). The MISSING config (serial console) was a real gap that the new cascade #5 surfaced on first cycle — which is exactly what the test was designed to do per its commit message: 'catches the bug class where the ISO builds + audits pass but the kernel/initrd combination fails to actually boot' (or in this case, fails to surface boot output through the channels tests can observe). Composes with: PR #5322 (the QEMU boot smoke-test workflow); B-0754 iter-3 firmware substrate (similar UX-cleanliness motivation: surface less mysterious behavior); the canonical zflash + zeta-install flow (no behavioral change for the keyboard-attached install flow since tty1 stays primary). Authored from fresh independent clone per B-0828 multi-AI shared-checkout convention.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
There was a problem hiding this comment.
Pull request overview
Enables serial console output for the NixOS installer ISO so the CI QEMU boot smoke-test (run with -display none) can observe the installer reaching the zeta-installer login: prompt via captured serial output, while preserving the VGA/keyboard-attached install flow.
Changes:
- Add
boot.kernelParamsentries to enablettyS0(115200 8N1) console output in addition totty1. - Document the rationale and expected behavior (tty1 primary, ttyS0 mirrored) inline in the installer configuration.
This was referenced May 26, 2026
AceHack
added a commit
that referenced
this pull request
May 26, 2026
…turn terminology distinction + split 3-PR-cleanup + follow-up-fix-PR correctly (#5329) Both Copilot findings verified + addressed: (1) multi-turn (overall conversation length) vs zero-turn (pathogen-decryption-protocol cost) are distinct scopes; clarified terminology in title + table-intro + empirical-generalization paragraph so readers don't read the table's Zero-turn entries as contradicting the multi-turn claim. (2) USB cleanup arc had 3-PR cleanup sequence (#5311 + #5320 + #5322) + follow-up fix (#5324) — split for narrative consistency. No semantic change; clarification only. Co-authored-by: Lior <lior@zeta.dev>
AceHack
added a commit
that referenced
this pull request
May 26, 2026
…n (eliminate routine human physical USB test) (#5343) * docs(backlog): B-0831 — CI cascade #6 full-install + cluster-auto-join (eliminate routine human physical USB test) Per Aaron 2026-05-26: "zflash is the thing plus cluster auto joining after boot from iso use we want that in ci not needing human to test everytime." Files B-0831 as P1 substrate-engineering target. 3-slice decomposition: - Slice 1: full-install-in-QEMU (boot installer ISO + auto-fire first-boot service + greedy N-disk install + reboot + verify login banner with auto-generated hostname). Extends cascade #5 (qemu-boot-test.ts) to validate the full install flow not just the live-ISO boot. - Slice 2: cluster-auto-join verification via mock cluster control-plane (capture + verify B-0812 iter-5.4.1 self-registration payload shape). - Slice 3: ArgoCD reconciliation verification (most coupled to live cluster state; deferrable to push-to-main only for latency reasons). Acceptance criteria scoped per slice; each ships independently. Overall acceptance: human physical-USB-test is no longer the routine gate for substrate landings; physical test reserved for real-hardware quirks + periodic sanity-checks. Composes_with cross-refs: - tools/ci/qemu-boot-test.ts (cascade #5) - build-ai-cluster-iso.yml workflow - zeta-install.sh + zeta-first-boot.sh + configuration.nix - B-0812 (node self-registration substrate) - B-0813 (ArgoCD reconciliation substrate) - B-0814 (deregister sibling) - B-0816 (architectural principle: maximize ArgoCD + minimize NixOS- native lock-in) - B-0754 (zero-typing first-boot scope; substrate exercised in Slice 1) - B-0818 (isoName nixpkgs 25.11 regression; orthogonal) - flash-cluster-iso SKILL.md (operator-side 0-human-typing analog of this CI-side cascade) - 2026-05-26 USB test gate empirical anchor (PR #5324 + 179a8d2 build) Operational implication: Slice 1 adds ~5-10 min to PR-build latency; acceptable trade-off for eliminating human physical-test as routine gate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(B-0831): add operator's reframing — physical test BECOMES the hardware-support test (not eliminated; assigned distinct first-class scope) Per the human maintainer 2026-05-26: "yes physcal test become actually hardware support test". Adds reframing section to B-0831: - Routine substrate validation → CI cascade #6 (QEMU-emulatable) - Hardware-support test → physical USB on real hardware (validates BIOS/UEFI variants, motherboard NIC drivers, SAS controller support, GPU detection on actual silicon, real-hardware quirks QEMU cannot emulate) Physical test becomes first-class hardware-compatibility-matrix gate fired for: onboarding new hardware, iterating on hardware-specific code paths, periodic compatibility sanity-checks across fleet's hardware diversity. This composes with broader cluster-bringup substrate work: hardware- support-test results inform hardware-compatibility-matrix that drives provisioning decisions. The reframing turns physical-test from "annoying routine gate" INTO "valuable hardware-compatibility-matrix substrate." Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(PR-5343): Copilot 4 findings — cluster-nodes path shape + HTML-tag-ambiguity (<10 / <1) wording 1+2. Lines 70+126: cluster-nodes path was `maintainers/cluster-nodes/` which doesn't match B-0794 + B-0812 per-maintainer convention. Corrected to `maintainers/<operator>/cluster-nodes/<hostname>/...` for the registration step + `maintainers/*/cluster-nodes/**` glob for the ArgoCD watch path. 3. Line 102: replaced `<10 min` with `under 10 min` to avoid HTML-tag- ambiguity in Markdown renderers; also restructured the parenthetical list to avoid `+` continuation in bullet (which markdownlint parses as nested list-item). 4. Line 173 (now 192): replaced `(<1 min ...)` with `(under 1 min ...)` for the same HTML-tag-ambiguity reason. No substantive scope changes; consistency-fixes + cross-renderer-safety. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Lior <lior@zeta.dev> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First-cycle outcome from PR #5322's new QEMU boot smoke-test (cascade #5): the test correctly caught a real config gap — serial console wasn't enabled in the installer NixOS config. ISO boots fine (GRUB + kernel + initrd loaded per serial log) but timed out waiting for
zeta-installer login:because all systemd/getty output went to VGA tty1, which QEMU's-display nonehides.Fix
Add to
full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:Why this is the right fix
Substrate-honest framing per the QEMU test's commit message goal ("catches the bug class where the ISO builds + audits pass but the kernel/initrd combination fails to actually boot"): the test is doing its job. The MISSING config (serial console) was a real gap surfaced on first cycle.
Beneficiaries of serial console output:
What this is NOT
Test plan
otto-cli/*conventionComposes with