docs(backlog): B-0831 — CI cascade #6 full-install + cluster-auto-join (eliminate routine human physical USB test) #5343
Merged
AceHack merged 3 commits into … on May 26, 2026
…n (eliminate routine human physical USB test)

Per Aaron 2026-05-26: "zflash is the thing plus cluster auto joining after boot from iso use we want that in ci not needing human to test everytime."

Files B-0831 as P1 substrate-engineering target.

3-slice decomposition:

- Slice 1: full-install-in-QEMU (boot installer ISO + auto-fire first-boot service + greedy N-disk install + reboot + verify login banner with auto-generated hostname). Extends cascade #5 (qemu-boot-test.ts) to validate the full install flow, not just the live-ISO boot.
- Slice 2: cluster-auto-join verification via mock cluster control-plane (capture + verify B-0812 iter-5.4.1 self-registration payload shape).
- Slice 3: ArgoCD reconciliation verification (most coupled to live cluster state; deferrable to push-to-main only for latency reasons).

Acceptance criteria scoped per slice; each ships independently. Overall acceptance: human physical-USB-test is no longer the routine gate for substrate landings; physical test reserved for real-hardware quirks + periodic sanity-checks.

Composes_with cross-refs:

- tools/ci/qemu-boot-test.ts (cascade #5)
- build-ai-cluster-iso.yml workflow
- zeta-install.sh + zeta-first-boot.sh + configuration.nix
- B-0812 (node self-registration substrate)
- B-0813 (ArgoCD reconciliation substrate)
- B-0814 (deregister sibling)
- B-0816 (architectural principle: maximize ArgoCD + minimize NixOS-native lock-in)
- B-0754 (zero-typing first-boot scope; substrate exercised in Slice 1)
- B-0818 (isoName nixpkgs 25.11 regression; orthogonal)
- flash-cluster-iso SKILL.md (operator-side 0-human-typing analog of this CI-side cascade)
- 2026-05-26 USB test gate empirical anchor (PR #5324 + 179a8d2 build)

Operational implication: Slice 1 adds ~5-10 min to PR-build latency; acceptable trade-off for eliminating human physical-test as routine gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
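The Slice 1 "verify login banner with auto-generated hostname" step could be sketched roughly as follows. This is a hypothetical illustration, not code from qemu-boot-test.ts: it assumes the harness already captures QEMU's serial console into a string. It also assumes the installed system prints a getty-style `<hostname> login:` prompt. The `zeta-` plus six hex characters hostname pattern is a made-up placeholder, not the scheme zeta-first-boot.sh actually uses.

```typescript
// Hypothetical sketch: scan captured QEMU serial-console output for a
// getty-style "<hostname> login:" prompt and check the hostname against
// an ASSUMED auto-generated naming pattern (illustration only).
const ASSUMED_HOSTNAME_PATTERN = /^zeta-[0-9a-f]{6}$/;

export interface BannerCheck {
  ok: boolean;
  hostname?: string;
  reason?: string;
}

export function checkLoginBanner(
  serialLog: string,
  hostnamePattern: RegExp = ASSUMED_HOSTNAME_PATTERN,
): BannerCheck {
  // Collect every "<name> login:" prompt at the start of a line.
  const prompts = [...serialLog.matchAll(/^(\S+) login:/gm)];
  if (prompts.length === 0) {
    return { ok: false, reason: "no login prompt found" };
  }
  // Use the last prompt: the live-ISO prompt appears first, and the
  // installed system prints a fresh prompt after the reboot.
  const hostname = prompts[prompts.length - 1][1];
  if (!hostnamePattern.test(hostname)) {
    return { ok: false, hostname, reason: "hostname does not match pattern" };
  }
  return { ok: true, hostname };
}
```

Taking the last prompt rather than the first keeps the check from passing on the live ISO's own banner before the install and reboot have happened.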
…rdware-support test (not eliminated; assigned distinct first-class scope)

Per the human maintainer 2026-05-26: "yes physcal test become actually hardware support test".

Adds reframing section to B-0831:

- Routine substrate validation → CI cascade #6 (QEMU-emulatable)
- Hardware-support test → physical USB on real hardware (validates BIOS/UEFI variants, motherboard NIC drivers, SAS controller support, GPU detection on actual silicon, real-hardware quirks QEMU cannot emulate)

Physical test becomes first-class hardware-compatibility-matrix gate fired for: onboarding new hardware, iterating on hardware-specific code paths, periodic compatibility sanity-checks across the fleet's hardware diversity.

This composes with broader cluster-bringup substrate work: hardware-support-test results inform hardware-compatibility-matrix that drives provisioning decisions.

The reframing turns physical-test from "annoying routine gate" INTO "valuable hardware-compatibility-matrix substrate."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
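One way the "hardware-support-test results inform hardware-compatibility-matrix that drives provisioning decisions" idea might look is sketched below. It is hypothetical. The component names mirror the categories listed in the commit message, but the record shape and the rule are assumptions, not an existing B-0831 artifact. The assumed rule is that a model is provisionable only if its most recent physical test passed every component.

```typescript
// Hypothetical sketch: physical hardware-support-test results feeding a
// compatibility matrix that gates provisioning. Record shape is ASSUMED.
type Component = "firmware" | "nic" | "storage-controller" | "gpu";

interface HardwareTestResult {
  hardwareModel: string; // e.g. a motherboard or chassis identifier
  testedAt: Date;
  results: Record<Component, boolean>;
}

// Keep only the most recent result per hardware model.
export function buildMatrix(
  results: HardwareTestResult[],
): Map<string, HardwareTestResult> {
  const matrix = new Map<string, HardwareTestResult>();
  for (const r of results) {
    const prev = matrix.get(r.hardwareModel);
    if (!prev || r.testedAt > prev.testedAt) {
      matrix.set(r.hardwareModel, r);
    }
  }
  return matrix;
}

// Provisionable only if the latest physical test passed every component;
// untested hardware is never provisionable.
export function isProvisionable(
  matrix: Map<string, HardwareTestResult>,
  model: string,
): boolean {
  const latest = matrix.get(model);
  return latest !== undefined && Object.values(latest.results).every(Boolean);
}
```

Treating untested models as non-provisionable is one reading of "onboarding new hardware" as a gate. A real matrix might instead allow partial capability, for example provisioning a node without GPU workloads.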
Pull request overview
Adds a new P1 backlog row (B-0831) capturing the planned CI “cascade #6” work to validate a full installer run in QEMU plus post-boot cluster auto-join, with the goal of eliminating routine physical USB testing as the substrate gate.
Changes:
- Adds new backlog row B-0831 describing a 3-slice CI verification plan (full install, mock join verification, optional ArgoCD reconciliation verification).
- Updates docs/BACKLOG.md index to include B-0831.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| docs/backlog/P1/B-0831-ci-cascade-6-full-install-plus-cluster-auto-join-eliminate-routine-human-physical-usb-test-aaron-2026-05-26.md | New backlog row defining the problem statement, slices, acceptance criteria, and cross-references for CI cascade #6. |
| docs/BACKLOG.md | Adds B-0831 to the P1 backlog index list. |
…g-ambiguity (<10 / <1) wording
1+2. Lines 70+126: cluster-nodes path was `maintainers/cluster-nodes/`
which doesn't match B-0794 + B-0812 per-maintainer convention.
Corrected to `maintainers/<operator>/cluster-nodes/<hostname>/...`
for the registration step + `maintainers/*/cluster-nodes/**`
glob for the ArgoCD watch path.
3. Line 102: replaced `<10 min` with `under 10 min` to avoid HTML-tag-
ambiguity in Markdown renderers; also restructured the parenthetical
list to avoid `+` continuation in bullet (which markdownlint parses
as nested list-item).
4. Line 173 (now 192): replaced `(<1 min ...)` with `(under 1 min ...)`
for the same HTML-tag-ambiguity reason.
No substantive scope changes; consistency-fixes + cross-renderer-safety.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
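The corrected layout in items 1+2 can be sketched as below. It shows a registration path under `maintainers/<operator>/cluster-nodes/<hostname>/...` and a check that such a path falls under the `maintainers/*/cluster-nodes/**` watch glob. This is a hypothetical illustration. The `node.yaml` file name is an assumption. The matcher is a minimal approximation of common globstar semantics, with `*` matching within one segment and `**` matching any depth, and ArgoCD's own matcher may differ.

```typescript
// Hypothetical sketch of the per-maintainer layout (B-0794 / B-0812):
//   maintainers/<operator>/cluster-nodes/<hostname>/...
// The default file name "node.yaml" is an ASSUMPTION for illustration.
export function registrationPath(
  operator: string,
  hostname: string,
  file = "node.yaml",
): string {
  for (const seg of [operator, hostname]) {
    // Reject empty segments and anything that would escape the layout.
    if (seg === "" || seg.includes("/") || seg === "." || seg === "..") {
      throw new Error(`invalid path segment: ${JSON.stringify(seg)}`);
    }
  }
  return `maintainers/${operator}/cluster-nodes/${hostname}/${file}`;
}

// Minimal globstar matcher: "*" matches within one path segment and
// "**" matches across segments. Approximation only; not ArgoCD's matcher.
export function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*") {
      re += ".*";
      i++; // consume the second "*"
    } else if (c === "*") {
      re += "[^/]*";
    } else {
      // Escape regex metacharacters so literal path text matches literally.
      re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}
```

With this matcher, a path under the old `maintainers/cluster-nodes/` layout does not match the new watch glob. That is one way to see why the two references had to be corrected together.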
Summary
Files B-0831 as P1 substrate-engineering target capturing operator direction 2026-05-26: "zflash is the thing plus cluster auto joining after boot from iso use we want that in ci not needing human to test everytime."
3-slice decomposition
Each slice ships independently. Overall acceptance: human physical-USB-test is no longer the routine gate for substrate landings.
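The Slice 2 idea of capturing the self-registration payload in a mock control-plane and verifying its shape could be sketched like this. It is hypothetical: the field names (`hostname`, `operator`, `diskCount`) are placeholders, and the real B-0812 iter-5.4.1 payload schema is defined in that row, not here. The handler is kept transport-agnostic (it takes the raw request body) so a CI harness can wire it to whatever HTTP listener it uses.

```typescript
// Hypothetical Slice 2 sketch: a mock control-plane that records what the
// freshly installed node sends, then checks the payload shape.
// Field names are ASSUMED placeholders, not the B-0812 schema.
export function validateRegistration(body: unknown): string[] {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return ["payload is not a JSON object"];
  }
  const errors: string[] = [];
  const b = body as Record<string, unknown>;
  for (const key of ["hostname", "operator"]) {
    if (typeof b[key] !== "string" || b[key] === "") {
      errors.push(`${key} must be a non-empty string`);
    }
  }
  const disks = b["diskCount"];
  if (typeof disks !== "number" || !Number.isInteger(disks) || disks < 1) {
    errors.push("diskCount must be a positive integer");
  }
  return errors;
}

export class MockControlPlane {
  // Every successfully parsed body, kept for post-boot inspection.
  readonly received: unknown[] = [];

  handle(rawBody: string): { status: number; errors: string[] } {
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawBody);
    } catch {
      return { status: 400, errors: ["body is not valid JSON"] };
    }
    this.received.push(parsed);
    const errors = validateRegistration(parsed);
    // 201 for a well-shaped registration, 422 for a malformed one.
    return { status: errors.length === 0 ? 201 : 422, errors };
  }
}
```

Recording every parsed body, including malformed ones, lets the CI job report what the node actually sent when the shape check fails.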
What remains valuable for physical test
Test plan
🤖 Generated with Claude Code