
fix(USB PR 3 cascade): add console=ttyS0 to installer kernelParams — QEMU boot smoke-test couldn't capture serial output#5324

Merged: AceHack merged 1 commit into `main` from `otto-cli/usb-pr3-fix-add-serial-console-to-installer-kernel-params-for-qemu-cascade-2026-05-26` on May 26, 2026.

@AceHack commented on May 26, 2026

Summary

First-cycle outcome from the new QEMU boot smoke-test added in PR #5322 (cascade #5): the test caught a real configuration gap. Serial console output was not enabled in the installer's NixOS config. The ISO boots fine (the serial log shows GRUB loading the kernel and initrd), but the test timed out waiting for the `zeta-installer login:` prompt because all systemd/getty output went to VGA tty1, which QEMU hides when run with `-display none`.
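From the test's side, the failure is a deadline expiring while polling a captured serial log for the login prompt. The sketch below shows such a wait, assuming QEMU writes serial output to a file (for example via `-serial file:<path>`). The function names `waitForSerialPrompt` and `sleepSync` are hypothetical, and the actual `tools/ci/qemu-boot-test.ts` may work differently.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Block the current thread for `ms` milliseconds without spinning the CPU.
// Atomics.wait is permitted on Node's main thread (unlike in browsers).
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical helper: poll a QEMU serial log file until `prompt` appears
// or `timeoutMs` elapses. Returns true if the prompt was seen in time.
function waitForSerialPrompt(
  logPath: string,
  prompt: string,
  timeoutMs: number,
  pollMs = 250,
): boolean {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    // The log may not exist yet if QEMU has not started writing to it.
    if (existsSync(logPath) && readFileSync(logPath, "utf8").includes(prompt)) {
      return true;
    }
    const remaining = deadline - Date.now();
    if (remaining <= 0) return false;
    sleepSync(Math.min(pollMs, remaining));
  }
}
```

With output going only to VGA tty1, the serial log stops after the bootloader stage, so a wait like this returns false at the deadline no matter how healthy the ISO is.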

Fix

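Add the following to full-ai-cluster/usb-nixos-installer/nixos/installer/configuration.nix:

```nix
boot.kernelParams = [
  # Mirror console output to the first serial port at 115200 baud, 8N1.
  "console=ttyS0,115200n8"
  # Listed last, so tty1 is what /dev/console resolves to (the primary console).
  "console=tty1"
];
```
  • tty1 stays primary (keyboard-attached install flow unchanged)
  • ttyS0 mirrors output at the standard 115200 8N1 setting, for QEMU capture and for real hardware with serial headers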
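Why the order of the two entries matters: per the Linux kernel's serial-console documentation, kernel output goes to every listed `console=` device, and the last one listed is used when `/dev/console` is opened. The sketch below interprets the kernel parameters that way; the helper names (`parseConsole`, `consolesFrom`, `primaryConsole`) are hypothetical and exist only for illustration.

```typescript
// Minimal sketch: interpret Linux `console=` kernel parameters.
// Output goes to every listed console; the LAST `console=` entry is the one
// /dev/console resolves to, which is why listing tty1 last keeps it primary.

type Parity = "n" | "o" | "e";

interface ConsoleSpec {
  device: string;  // e.g. "ttyS0" or "tty1"
  baud?: number;   // e.g. 115200
  parity?: Parity; // n = none, o = odd, e = even
  bits?: number;   // data bits, 5 to 8
}

function parseConsole(value: string): ConsoleSpec {
  const comma = value.indexOf(",");
  if (comma === -1) return { device: value };
  const device = value.slice(0, comma);
  // Serial options have the form <baud><parity><bits>, e.g. "115200n8".
  const m = /^(\d+)([noe])?([5-8])?/.exec(value.slice(comma + 1));
  if (!m) return { device };
  return {
    device,
    baud: Number(m[1]),
    parity: m[2] as Parity | undefined,
    bits: m[3] === undefined ? undefined : Number(m[3]),
  };
}

// Keep only `console=` entries, in command-line order.
function consolesFrom(kernelParams: string[]): ConsoleSpec[] {
  return kernelParams
    .filter((p) => p.startsWith("console="))
    .map((p) => parseConsole(p.slice("console=".length)));
}

// The console that /dev/console resolves to: the last one listed.
function primaryConsole(specs: ConsoleSpec[]): ConsoleSpec | undefined {
  return specs.length > 0 ? specs[specs.length - 1] : undefined;
}
```

For the params in this PR, `primaryConsole(consolesFrom(["console=ttyS0,115200n8", "console=tty1"]))?.device` is `"tty1"`; swapping the two entries would make ttyS0 primary instead.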

Why this is the right fix

Substrate-honest framing, per the goal stated in the QEMU test's commit message ("catches the bug class where the ISO builds + audits pass but the kernel/initrd combination fails to actually boot"): the test is doing its job. The MISSING config (serial console) was a real gap, surfaced on the first cycle.

Beneficiaries of serial console output:

What this is NOT

  • NOT a fix to the QEMU test itself (the test is correct)
  • NOT a behavior change for the keyboard-attached install flow (tty1 still primary)
  • NOT a security concern (serial console access requires local physical presence; no remote exposure)

Test plan

  • Pre-commit: minimal change (4-line addition + 13-line comment block)
  • Branch follows otto-cli/* convention
  • Authored from fresh independent clone per B-0828
  • CI green, including the new QEMU boot smoke-test, which should NOW pass with serial console enabled
  • Copilot review pass

Composes with

… QEMU boot smoke-test can capture systemd/getty output via serial console (Aaron 2026-05-26)

First-cycle QEMU boot smoke-test on main (PR #5322 cascade #5) failed: the ISO boots fine (GRUB + kernel + initrd loaded successfully per serial log), but the test timed out waiting for the 'zeta-installer login:' prompt. Root cause: the NixOS installer by default outputs only to VGA tty1, and QEMU's -display none hides VGA, so the serial console (-serial file:...) receives no kernel/systemd/getty output after the bootloader stage.

Fix: add boot.kernelParams = [ "console=ttyS0,115200n8" "console=tty1" ] to the installer config so that systemd/getty output is mirrored to the serial console at the standard 115200 8N1. tty1 stays primary (keyboard-attached install flow); ttyS0 is secondary, for QEMU capture and for real hardware with serial headers (some Beelinks, most server-class boards, debugging scenarios).

Substrate-honest framing: the QEMU test isn't broken; it correctly caught that serial console wasn't configured. The ISO itself isn't broken (it boots cleanly). The MISSING config (serial console) was a real gap that the new cascade #5 surfaced on its first cycle, which is exactly what the test was designed to do per its commit message: 'catches the bug class where the ISO builds + audits pass but the kernel/initrd combination fails to actually boot' (or, in this case, fails to surface boot output through the channels the test can observe).

Composes with: PR #5322 (the QEMU boot smoke-test workflow); B-0754 iter-3 firmware substrate (similar UX-cleanliness motivation: surface less mysterious behavior); the canonical zflash + zeta-install flow (no behavioral change for the keyboard-attached install flow since tty1 stays primary). Authored from fresh independent clone per B-0828 multi-AI shared-checkout convention.
Copilot AI review requested due to automatic review settings May 26, 2026 21:28
@AceHack AceHack enabled auto-merge (squash) May 26, 2026 21:28
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot AI left a comment

Pull request overview

Enables serial console output for the NixOS installer ISO so the CI QEMU boot smoke-test (run with -display none) can observe the installer reaching the zeta-installer login: prompt via captured serial output, while preserving the VGA/keyboard-attached install flow.

Changes:

  • Add boot.kernelParams entries to enable ttyS0 (115200 8N1) console output in addition to tty1.
  • Document the rationale and expected behavior (tty1 primary, ttyS0 mirrored) inline in the installer configuration.

@AceHack AceHack merged commit b3d6e92 into main May 26, 2026
30 checks passed
@AceHack AceHack deleted the otto-cli/usb-pr3-fix-add-serial-console-to-installer-kernel-params-for-qemu-cascade-2026-05-26 branch May 26, 2026 21:31
AceHack added a commit that referenced this pull request May 26, 2026
…turn terminology distinction + split 3-PR-cleanup + follow-up-fix-PR correctly (#5329)

Both Copilot findings verified + addressed: (1) multi-turn (overall conversation length) vs zero-turn (pathogen-decryption-protocol cost) are distinct scopes; clarified terminology in title + table-intro + empirical-generalization paragraph so readers don't read the table's Zero-turn entries as contradicting the multi-turn claim. (2) USB cleanup arc had 3-PR cleanup sequence (#5311 + #5320 + #5322) + follow-up fix (#5324) — split for narrative consistency. No semantic change; clarification only.

Co-authored-by: Lior <lior@zeta.dev>
AceHack added a commit that referenced this pull request May 26, 2026
…n (eliminate routine human physical USB test) (#5343)

* docs(backlog): B-0831 — CI cascade #6 full-install + cluster-auto-join (eliminate routine human physical USB test)

Per Aaron 2026-05-26: "zflash is the thing plus cluster auto joining
after boot from iso use we want that in ci not needing human to test
everytime."

Files B-0831 as P1 substrate-engineering target. 3-slice decomposition:

- Slice 1: full-install-in-QEMU (boot installer ISO + auto-fire
  first-boot service + greedy N-disk install + reboot + verify login
  banner with auto-generated hostname). Extends cascade #5
  (qemu-boot-test.ts) to validate the full install flow not just the
  live-ISO boot.
- Slice 2: cluster-auto-join verification via mock cluster control-plane
  (capture + verify B-0812 iter-5.4.1 self-registration payload shape).
- Slice 3: ArgoCD reconciliation verification (most coupled to live
  cluster state; deferrable to push-to-main only for latency reasons).
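Slice 1's "verify" step reduces to checking that the captured serial log passes through a sequence of milestones in the right order. A minimal sketch, assuming the installed system's serial output is captured the same way as in cascade #5; `firstMissingMilestone` is a hypothetical name, and apart from the `zeta-installer login:` prompt the milestone strings are placeholders, not markers `zeta-install.sh` or `zeta-first-boot.sh` are known to print.

```typescript
// Hypothetical: return the first milestone that does not appear in `log`
// after the previous milestone, or null if all appear in order.
function firstMissingMilestone(log: string, milestones: string[]): string | null {
  let cursor = 0;
  for (const marker of milestones) {
    const at = log.indexOf(marker, cursor);
    if (at === -1) return marker;
    cursor = at + marker.length; // later milestones must follow this one
  }
  return null;
}

// Placeholder milestones for one full install cycle (hypothetical strings).
const FULL_INSTALL_MILESTONES = [
  "zeta-installer login:",        // live ISO reached its login prompt (cascade #5)
  "first-boot: install complete", // hypothetical marker from the install service
  "login:",                       // installed system's banner after reboot
];
```

Reporting the first missing milestone, rather than a bare pass/fail, tells the reader of a failed CI run which stage stalled.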

Acceptance criteria scoped per slice; each ships independently.
Overall acceptance: human physical-USB-test is no longer the routine
gate for substrate landings; physical test reserved for real-hardware
quirks + periodic sanity-checks.

Composes_with cross-refs:
- tools/ci/qemu-boot-test.ts (cascade #5)
- build-ai-cluster-iso.yml workflow
- zeta-install.sh + zeta-first-boot.sh + configuration.nix
- B-0812 (node self-registration substrate)
- B-0813 (ArgoCD reconciliation substrate)
- B-0814 (deregister sibling)
- B-0816 (architectural principle: maximize ArgoCD + minimize NixOS-
  native lock-in)
- B-0754 (zero-typing first-boot scope; substrate exercised in Slice 1)
- B-0818 (isoName nixpkgs 25.11 regression; orthogonal)
- flash-cluster-iso SKILL.md (operator-side 0-human-typing analog of
  this CI-side cascade)
- 2026-05-26 USB test gate empirical anchor (PR #5324 + 179a8d2 build)

Operational implication: Slice 1 adds ~5-10 min to PR-build latency;
acceptable trade-off for eliminating human physical-test as routine gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(B-0831): add operator's reframing — physical test BECOMES the hardware-support test (not eliminated; assigned distinct first-class scope)

Per the human maintainer 2026-05-26: "yes physcal test become actually
hardware support test".

Adds reframing section to B-0831:

- Routine substrate validation → CI cascade #6 (QEMU-emulatable)
- Hardware-support test → physical USB on real hardware (validates
  BIOS/UEFI variants, motherboard NIC drivers, SAS controller support,
  GPU detection on actual silicon, real-hardware quirks QEMU cannot
  emulate)

Physical test becomes first-class hardware-compatibility-matrix gate
fired for: onboarding new hardware, iterating on hardware-specific
code paths, periodic compatibility sanity-checks across fleet's
hardware diversity.

This composes with broader cluster-bringup substrate work: hardware-
support-test results inform hardware-compatibility-matrix that drives
provisioning decisions.

The reframing turns physical-test from "annoying routine gate" INTO
"valuable hardware-compatibility-matrix substrate."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(PR-5343): Copilot 4 findings — cluster-nodes path shape + HTML-tag-ambiguity (<10 / <1) wording

1+2. Lines 70+126: cluster-nodes path was `maintainers/cluster-nodes/`
     which doesn't match B-0794 + B-0812 per-maintainer convention.
     Corrected to `maintainers/<operator>/cluster-nodes/<hostname>/...`
     for the registration step + `maintainers/*/cluster-nodes/**`
     glob for the ArgoCD watch path.

3. Line 102: replaced `<10 min` with `under 10 min` to avoid HTML-tag-
   ambiguity in Markdown renderers; also restructured the parenthetical
   list to avoid `+` continuation in bullet (which markdownlint parses
   as nested list-item).

4. Line 173 (now 192): replaced `(<1 min ...)` with `(under 1 min ...)`
   for the same HTML-tag-ambiguity reason.
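Findings 1 and 2 hinge on the ArgoCD watch glob matching the per-maintainer layout but not the old flat one. A minimal matcher to check that, assuming the common convention that `*` matches within one path segment and `**` matches across segments; ArgoCD's own glob semantics may differ in edge cases, and `globToRegExp`, `clusterNodePath`, and the operator/hostname values in the checks are hypothetical.

```typescript
// Convert a path glob to an anchored RegExp:
//   "**" -> any characters, including "/" (spans segments)
//   "*"  -> any characters except "/" (stays within one segment)
// Everything else is matched literally.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

const WATCH_GLOB = "maintainers/*/cluster-nodes/**";

// Hypothetical builder for the corrected registration location.
function clusterNodePath(operator: string, hostname: string, file: string): string {
  return `maintainers/${operator}/cluster-nodes/${hostname}/${file}`;
}
```

A path built with the per-maintainer shape matches `WATCH_GLOB`, while the old `maintainers/cluster-nodes/<hostname>/...` shape does not, because `*` must consume a whole operator segment before `cluster-nodes/`.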

No substantive scope changes; consistency-fixes + cross-renderer-safety.
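The two renderer hazards fixed here (`<` before a number, and a line that begins with `+ `) can be caught mechanically before review. A conservative sketch; the name `findRendererHazards` is hypothetical, it also flags legitimate `+` bullet lists, and it does not skip code spans or fenced blocks.

```typescript
// Hypothetical lint: report 1-based line numbers containing patterns that
// some Markdown renderers or markdownlint may misread.
interface Hazard {
  line: number;
  reason: string;
}

function findRendererHazards(markdown: string): Hazard[] {
  const hazards: Hazard[] = [];
  markdown.split("\n").forEach((text, i) => {
    if (/<\d/.test(text)) {
      hazards.push({ line: i + 1, reason: "'<' before a digit (write 'under N')" });
    }
    if (/^\s*\+\s/.test(text)) {
      hazards.push({ line: i + 1, reason: "line starts with '+ ' (parsed as a list item)" });
    }
  });
  return hazards;
}
```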

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>