backlog(B-0722): CI ephemeral cluster smoke via k3d-on-runner; evolve to vcluster #4954

Merged
AceHack merged 7 commits into main from backlog/b0722-ci-ephemeral-cluster-smoke-2026-05-25-c2
May 25, 2026

@AceHack (Member) commented May 25, 2026

Summary

Files Aaron's "tests should be able to use kind/k3d to do ephemeral clusters on prs" + "we will do k8s in k8s later k8s in docker if fine for ci now" as a P2 backlog row.

Builds on PR #4953's dev-cluster substrate. Phase 1 = k3d-on-runner workflow (immediate ask); Phase 2 = vcluster-on-shared-host once a persistent dev cluster exists.

PR contents:

  • New: docs/backlog/P2/B-0722-ci-ephemeral-cluster-smoke-via-k3d-on-runner-evolve-to-vcluster-2026-05-25.md (the backlog row — substrate only, no implementation)
  • Updated: docs/BACKLOG.md (regenerated index after main-merge to clear MD012 + drift on the generated index)
  • New: docs/hygiene-history/ticks/2026/05/25/2208Z.md (Otto-CLI cold-boot tick shard documenting the CI-fix work)

Test plan

🤖 Generated with Claude Code

… to vcluster

Files Aaron's "tests should be able to use kind/k3d to do ephemeral
clusters on prs" + "we will do k8s in k8s later k8s in docker if fine
for ci now" as a P2 backlog row.

Builds on PR #4953's dev-cluster substrate (up.sh / down.sh / sync-
wave annotations). Phase 1 = k3d-on-runner workflow (immediate ask);
Phase 2 = vcluster-on-shared-host when persistent dev cluster exists
(faster PR cycles: ~30s vs ~5min).

Captures: profile config, smoke script with sync-wave assertion, GH
Actions workflow with concurrency + path filter + secure env-var
pattern for github-context values, small up.sh refactor for --config
flag. Acceptance criteria + non-scope items documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
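
The "secure env-var pattern for github-context values" can be illustrated from the shell side. A minimal sketch, assuming the workflow step hands values such as the PR number and head ref to the script through `env:` rather than splicing `${{ }}` expressions into the `run:` text (the YAML half appears only as a comment). `describe_pr_from_env`, `PR_NUMBER`, and `HEAD_REF` are hypothetical names, not part of the planned workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper. The workflow step (YAML, shown as a comment) would pass
# github-context values through env: instead of splicing them into run:, e.g.
#   env:
#     PR_NUMBER: ${{ github.event.pull_request.number }}
#     HEAD_REF: ${{ github.head_ref }}
# The shell then sees attacker-influenced text only as variable data.

describe_pr_from_env() {
  # PR numbers are plain integers; reject anything else (status 2 is arbitrary).
  if [[ ! "${PR_NUMBER:-}" =~ ^[0-9]+$ ]]; then
    echo "refusing non-numeric PR_NUMBER" >&2
    return 2
  fi
  # Keep expansions quoted: a ref may contain shell metacharacters like $( or `.
  printf 'smoke run for PR #%s (head: %s)\n' "$PR_NUMBER" "${HEAD_REF:-unknown}"
}
```

Because the branch name arrives as an environment variable, a ref such as `feat/$(id)` is printed literally instead of being executed; the numeric check is extra hardening for values later passed to other CLIs.
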
Copilot AI review requested due to automatic review settings May 25, 2026 16:46
@AceHack AceHack enabled auto-merge (squash) May 25, 2026 16:46

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fea52af477

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot AI left a comment

Pull request overview

Adds a new P2 backlog row (B-0722) capturing a plan to run an ephemeral Kubernetes cluster smoke test in CI for AI-cluster PRs (k3d-on-runner now, with a future evolution to vcluster-on-shared-host).

Changes:

  • Introduces docs/backlog/P2/B-0722-*.md with frontmatter + detailed Phase 1/Phase 2 implementation plan.
  • Documents workflow triggering, artifact capture, teardown behavior, and acceptance criteria for the future CI smoke workflow.

Codex/markdownlint flagged two lines where bullet lists weren't
preceded by blank lines (MD032). Also regenerated docs/BACKLOG.md
via `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts`
to include B-0722.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 25, 2026
…st-1636Z; all peer ai-cluster -c2 batch (audit-only, 4th precedent application) (#4957)

49 open PRs (net +2 from 1636Z); 3 BLOCKED+resolve-threads (#4954 / #4955 / #4956), all peer ai-cluster -c2 train continuation of merged #4951. 11 threads deep-audited, 0 FPs — all substantive (full-ai-cluster/dev-cluster/ referenced ahead of substrate landing in #4953). Audit-only disposition per 1405Z/1539Z/1636Z/0441Z precedent (4th application today, 5th overall). Build gate green (0/0/00:00:25.48).

Co-authored-by: Otto <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 25, 2026
…ate-honest

Codex/Copilot flagged 5 dangling cross-references after the prior fix:
  - composes_with B-0722 path (in PR #4954, not on main) — replaced
    with a comment noting pending merge
  - body refs to B-0722, B-0723 — qualified with 'PR #4954/#4955
    pending merge' so the intent is preserved + state is honest
  - body refs to dev-cluster/ + PR #4953#4953 was closed pending
    redesign; replaced 'dev-cluster/' references with 'local k3d /
    kind cluster' + raw 'k3d cluster create' fallback for now

Substrate-honest framing: row's design intent stays intact; reader
isn't promised a path that won't resolve until upstream merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack added a commit that referenced this pull request May 25, 2026
…rn proof for Max (#4960)

* backlog(B-0724): TS hat-system operator — polyglot K8s-operator pattern proof

Aaron 2026-05-25:
  > "yes lets combine he will like kubernets operators but he does
  > not have experience maybe we write a ts operator insteadd of go
  > he likes ts"
  > "we want polyglot operator support for k8s anyways so we are not
  > rigid about go"

Reframes Max's TS preference accommodation into "first deliberate
proof of the polyglot-operator pattern the cluster commits to
anyway." Two operators against the same CRDs forces the schema
to be the canonical contract — no language-specific quirks bleed
through.

Captures:
- Pattern (CRD-as-canonical-contract + multiple language impls
  watching same CRDs; leader election for active reconciler)
- Why polyglot at cluster scope (contract enforcement, failure-
  domain isolation, talent flexibility, ecosystem coverage)
- TS operator stack (kubernetes/client-node, NestJS optional,
  fastify webhook, nats.js + pino for tick emit, coordination.k8s.io
  Lease for leader election)
- Composition with shipped substrate (PR #4930 Go scaffold as
  reference/baseline; PR #4958 agentic-organization CLUSTER_NATIVE_HAT_SYSTEM
  doc; B-0722 smoke test as polyglot validation gate; B-0723
  multi-kubelet × polyglot operators for max redundancy)
- Acceptance criteria for the TS scaffold
- Future Rust (kube-rs) + Python (kopf) as same-pattern extensions
- P2 because Go scaffold is already functional; not blocking
- Max owns the TS implementation at his preferred pace

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(B-0724): MD012 (consecutive blanks) + MD032 (blank-before-list)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(B-0724): rewrite dangling refs to closed/pending PRs to be substrate-honest

Codex/Copilot flagged 5 dangling cross-references after the prior fix:
  - composes_with B-0722 path (in PR #4954, not on main) — replaced
    with a comment noting pending merge
  - body refs to B-0722, B-0723 — qualified with 'PR #4954/#4955
    pending merge' so the intent is preserved + state is honest
  - body refs to dev-cluster/ + PR #4953#4953 was closed pending
    redesign; replaced 'dev-cluster/' references with 'local k3d /
    kind cluster' + raw 'k3d cluster create' fallback for now

Substrate-honest framing: row's design intent stays intact; reader
isn't promised a path that won't resolve until upstream merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* B-0724: add team language-affinity map + 'limit Go necessity' framing

Aaron 2026-05-25:
  > 'max love ts and cs i love fs and cs we both like rust and python
  > for where they make sense'
  > 'we understand go is necessary in some places for k8s but we would
  > like to limit its necessity'

Updates the polyglot operator language table:
  - Names Aaron + Max's individual + shared strong languages
  - Adds C# / F# via KubeOps.NET as future operator #2 — the team's
    overlap language (both love C#); kubebuilder-class framework on
    .NET removes Go from operator authoring entirely for this work
  - Sharpens the polyglot motivation: Go is starter / minimize over
    time; ecosystem-forced where genuinely required, not chosen

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AceHack pushed a commit that referenced this pull request May 25, 2026
Both review threads (Copilot P1 + Codex P2) on the same line of
docs/backlog/P3/B-0723-...md correctly flagged dangling composes_with
references:
- full-ai-cluster/dev-cluster/ — never on main (PR #4953 closed unmerged)
- docs/backlog/P2/B-0722-...md — in-flight via PR #4954, not on main yet

Both would surface as missing-target noise to backlog hygiene auditors.

Fix: keep full-ai-cluster/ (exists on main) in composes_with; move the
in-flight PR cross-refs to a new related_prs: key that names GitHub state
rather than filesystem state. When PR #4954 lands B-0722 on main, a
follow-up can promote that PR-ref back into composes_with.

Co-Authored-By: Claude <noreply@anthropic.com>
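
The `composes_with` / `related_prs` split described in this commit is easy to audit mechanically. A minimal sketch of such a check, assuming each backlog row opens with YAML frontmatter between `---` lines and writes `composes_with` as a block list; `dangling_composes_with` is a hypothetical helper, not the repo's hygiene tooling, and it uses line-oriented `awk` rather than a real YAML parser.

```bash
#!/usr/bin/env bash
# Hypothetical hygiene check, not the repo's actual tooling. Assumes each
# backlog row opens with YAML frontmatter between '---' lines and that
# composes_with is a block list ("composes_with:" followed by "  - path").
# related_prs entries are deliberately ignored: they name GitHub state,
# not files, so they cannot dangle on disk.

# Print composes_with entries of row $1 that do not exist under root $2.
dangling_composes_with() {
  local row="$1" root="${2:-.}" path rc=0
  while IFS= read -r path; do
    if [[ ! -e "$root/$path" ]]; then
      echo "dangling composes_with: $path"
      rc=1
    fi
  done < <(awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { exit }
    in_fm && /^[A-Za-z_]+:/ { key = $1; sub(/:$/, "", key); next }
    in_fm && key == "composes_with" && /^[ \t]*- / {
      sub(/^[ \t]*- /, ""); sub(/[ \t]+#.*$/, ""); gsub(/"/, ""); print
    }' "$row")
  return "$rc"
}
```

Run against a row file from the repo root, it prints one line per missing target and returns non-zero, while `related_prs` entries such as a pending PR number are never checked against the filesystem.
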
Copilot AI review requested due to automatic review settings May 25, 2026 21:23
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread docs/BACKLOG.md
Comment thread docs/BACKLOG.md

@AceHack (Member, Author) commented May 25, 2026

This PR has two failing checks: 'backlog-pr-hygiene-p2' and 'validate-doc-imports'. The 'backlog-pr-hygiene-p2' check is failing because the PR is not following the backlog PR hygiene rules. Please review the rules and update the PR accordingly. The 'validate-doc-imports' check is a false positive and should be updated to ignore files in the 'docs/backlog/P2' directory.

Post-main-merge drift: 5 row-ordering changes (B-0499/B-0506/B-0514/B-0515/B-0517/B-0519) + one extra blank line at line 695 (`## P3 — convenience / deferred`).

Single-source fix: `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts`. Removes one blank line; brings rows into canonical generator order.

Co-Authored-By: Claude <noreply@anthropic.com>

…ft fix landed

Sentinel re-armed (catch-43 fired; CronList empty).
PR #4954 named-dep cleared via single BACKLOG.md regeneration; commit 8282de9 pushed.

Co-Authored-By: Claude <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 25, 2026 22:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 706b7f4517

The "**Concrete artifacts this tick:**" bold-text line directly preceded the bullet list; markdownlint MD032 requires lists be surrounded by blank lines.

Co-Authored-By: Claude <noreply@anthropic.com>
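
MD012 and MD032 come up repeatedly in this PR's fix-up commits. A rough local approximation of just those two checks, written as a hypothetical `lint_md_blanks` shell function wrapping `awk`; it is not markdownlint, it checks only the "preceded by a blank line" half of MD032, and it does not understand fenced code, front matter, or block quotes.

```bash
#!/usr/bin/env bash
# Rough approximation of two markdownlint rules, NOT markdownlint itself:
#   MD012: multiple consecutive blank lines
#   MD032: a list that is not preceded by a blank line
# Known gaps (assumptions): fenced code blocks, front matter, block quotes,
# and the "followed by a blank line" half of MD032 are not handled.
lint_md_blanks() {
  awk '
    function is_item(s) { return s ~ /^[ \t]*([-*+]|[0-9]+[.)])[ \t]/ }
    FNR == 1 { prev_blank = 0; in_list = 0 }
    /^[ \t]*$/ {
      if (prev_blank) printf "%s:%d: MD012 consecutive blank lines\n", FILENAME, FNR
      prev_blank = 1
      next
    }
    {
      item = is_item($0)
      # A list item right after a non-blank line that is not part of a list.
      if (item && FNR > 1 && !prev_blank && !in_list)
        printf "%s:%d: MD032 list not preceded by blank line\n", FILENAME, FNR
      # Items keep the list open; indented lines count as item continuations.
      if (item) in_list = 1
      else if (!(in_list && $0 ~ /^[ \t]/)) in_list = 0
      prev_blank = 0
    }
  ' "$@"
}
```

On a tick shard where a bold-text line sits directly above a bullet list, it reports the first bullet's line as MD032; markdownlint itself remains the authoritative check.
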

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

Comment thread docs/hygiene-history/ticks/2026/05/25/2208Z.md Outdated
Comment thread docs/BACKLOG.md
…eference accuracy

Copilot+Codex review findings on PR #4954 (verified against repo state in isolated worktree):

- Path prefixes — `up.sh`, `dev-cluster/`, `tools/ci/` references now consistently use `full-ai-cluster/` prefix matching actual subtree location
- security-reminder hook — replaced with concrete link to docs/security/GITHUB-ACTIONS-SAFE-PATTERNS.md (the actual workflow-injection guidance doc; security-reminder hook does not exist as a separate artifact)
- "after the local refactor in this row's PR" — reworded; this PR is the backlog row, not the implementation; `--config` flag is part of Phase 1's planned refactor
- Tick shard 2208Z absolute path — replaced `/Users/acehack/...` with `<repo>` placeholder

NOT addressed in this commit (FP / outdated):
- MD012 line 695 (already resolved at source by regeneration; marked isOutdated:true)
- sync-wave annotations claim — VERIFIED present on all 35 Application.yaml files

Co-Authored-By: Claude <noreply@anthropic.com>
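
The sync-wave claim above was verified by hand. A minimal sketch of scripting that verification, assuming the manifests are literally named `Application.yaml`, live under `full-ai-cluster/k8s/applications/` (the directory in the planned workflow's path filter), and use the Argo CD annotation key `argocd.argoproj.io/sync-wave`. `check_sync_waves` is hypothetical, and because it is a plain text search, a commented-out annotation would still count as present.

```bash
#!/usr/bin/env bash
# Hypothetical check. Assumes manifests are named Application.yaml under
# full-ai-cluster/k8s/applications/ and carry the Argo CD annotation key
# argocd.argoproj.io/sync-wave. Text search only, not YAML parsing.
check_sync_waves() {
  local dir="${1:-full-ai-cluster/k8s/applications}" total missing
  # Count every Application.yaml under the tree (tr strips BSD wc padding).
  total=$(find "$dir" -type f -name Application.yaml | wc -l | tr -d ' ')
  # -L lists files with NO match; "|| true" hides grep exit-status differences.
  missing=$(grep -rL --include=Application.yaml 'argocd.argoproj.io/sync-wave' "$dir" || true)
  if [[ -n "$missing" ]]; then
    echo "missing sync-wave annotation:" >&2
    echo "$missing" >&2
    return 1
  fi
  echo "sync-wave present on all $total Application.yaml files"
}
```

Run from the repo root with no argument, a passing tree would print the same count the commit reports, which makes the "all 35" claim reproducible instead of anecdotal.
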
@AceHack AceHack merged commit d2c1d49 into main May 25, 2026
28 checks passed
@AceHack AceHack deleted the backlog/b0722-ci-ephemeral-cluster-smoke-2026-05-25-c2 branch May 25, 2026 22:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 508efd9c87

- Tears down on EXIT trap (skip with `SKIP_TEARDOWN=1`)
- Exit codes: 0 = pass; 1 = converge timeout; 2 = pre-flight fail

3. **`.github/workflows/ai-cluster-smoke.yml`** — triggers on `pull_request` with path filter (`full-ai-cluster/k8s/applications/**`, `full-ai-cluster/dev-cluster/**`, `full-ai-cluster/tools/ci/**`, this workflow file). Concurrency group cancels in-flight runs on new commits. Installs k3d + kubectl + helm + jq, runs `cluster-smoke.sh`, uploads artifacts, posts PR comment on failure with sync-wave plan + recent events.
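
The teardown and exit-code contract in the row excerpt above can be sketched as follows. This is a sketch under assumptions: the function names, the `SMOKE_TIMEOUT_SECONDS` / `SMOKE_POLL_SECONDS` / `SMOKE_CLUSTER` variables, the default cluster name, and passing the convergence check in as a command are all illustrative. Only `SKIP_TEARDOWN=1`, the 0/1/2 exit codes, and the k3d/kubectl/helm/jq tool list come from the row. Cluster creation and bootstrap are elided.

```bash
#!/usr/bin/env bash
# Sketch of the cluster-smoke.sh exit-code contract described in the row.
# ASSUMPTIONS (hypothetical): function names, SMOKE_TIMEOUT_SECONDS,
# SMOKE_POLL_SECONDS, SMOKE_CLUSTER and its default, and passing the
# convergence check in as a command. Cluster create + bootstrap are elided.

# Exit-2 path: every required CLI must be on PATH.
preflight() {
  local tool
  for tool in k3d kubectl helm jq; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "pre-flight: missing $tool" >&2
      return 1
    fi
  done
}

# Runs from the EXIT trap; SKIP_TEARDOWN=1 leaves the cluster up for debugging.
teardown() {
  if [[ "${SKIP_TEARDOWN:-0}" == 1 ]]; then
    echo "SKIP_TEARDOWN=1: leaving cluster running" >&2
    return 0
  fi
  k3d cluster delete "${SMOKE_CLUSTER:-ai-cluster-smoke}" >/dev/null 2>&1 || true
}

# Re-run "$@" until it succeeds; return 1 once the deadline has passed.
wait_until() {
  local deadline=$(( SECONDS + ${SMOKE_TIMEOUT_SECONDS:-600} ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "${SMOKE_POLL_SECONDS:-5}"
  done
}

# Body is a subshell, so the EXIT trap and exit codes stay scoped to one run.
smoke_main() (
  preflight || exit 2
  trap teardown EXIT
  # ... create the k3d cluster and apply the bootstrap here (elided) ...
  if ! wait_until "$@"; then
    echo "converge timeout" >&2
    exit 1
  fi
  exit 0
)
```

The EXIT trap is installed only after pre-flight passes, so a missing tool (exit 2) never triggers a `k3d cluster delete` for a cluster that was never created; running the body in a subshell keeps the trap scoped to a single smoke run.
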

[P2] Include bootstrap root app in smoke workflow path filter

Expand the proposed pull_request paths list to include the root App-of-Apps manifest (e.g. full-ai-cluster/k8s/bootstrap/root-application.yaml). As written, a PR that changes the root application graph entrypoint would not trigger this smoke workflow, which contradicts the row’s goal of validating graph-affecting changes before merge and leaves a real blind spot for bootstrap-level regressions.
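
Codex's point can be checked against a list of changed files. A minimal sketch, assuming bash `[[ == ]]` pattern matching is a close-enough stand-in for GitHub Actions `paths` globs (inside `[[ ]]`, `*` also matches `/`, so `dir/*` behaves like the workflow's `dir/**`, though the two glob dialects are not identical in every edge case). `SMOKE_PATHS` and `would_trigger` are hypothetical names; the root-application path is Codex's own "e.g.", used here only as an illustration.

```bash
#!/usr/bin/env bash
# Hypothetical approximation of the workflow's pull_request `paths` filter.
# In bash [[ == ]] matching, '*' also matches '/', so 'dir/*' here behaves
# like 'dir/**' in the workflow; the two glob dialects differ in edge cases.
SMOKE_PATHS=(
  'full-ai-cluster/k8s/applications/*'
  'full-ai-cluster/dev-cluster/*'
  'full-ai-cluster/tools/ci/*'
  '.github/workflows/ai-cluster-smoke.yml'
)

# Read changed paths (one per line) on stdin; succeed if any matches a pattern.
would_trigger() {
  local file pat
  while IFS= read -r file; do
    for pat in "${SMOKE_PATHS[@]}"; do
      # $pat is left unquoted on purpose so [[ ]] treats it as a pattern.
      if [[ $file == $pat ]]; then
        return 0
      fi
    done
  done
  return 1
}
```

Piping `git diff --name-only origin/main...HEAD` into `would_trigger` answers whether a branch would run the smoke workflow. With the four patterns above, a PR touching only `full-ai-cluster/k8s/bootstrap/root-application.yaml` returns non-zero; appending `'full-ai-cluster/k8s/bootstrap/*'` to `SMOKE_PATHS` closes that gap.
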

Labels

None yet

Projects

None yet

2 participants