chore(workflows): group smoke-test workflows under test-workflows/ + add e2e-minimax-smoke #1431
…ows/test-workflows/
Move the 7 existing e2e-*.yaml smoke tests plus the new e2e-minimax-smoke test into a dedicated subfolder. Subfolder grouping is supported by the workflow loader (1 level deep, resolution by filename) so workflow names are unchanged. Mirrors the .archon/workflows/maintainer/ split landing in #1430.
Also adds e2e-minimax-smoke.yaml, a sanity check that Pi correctly routes to Minimax M2.7 via the user's local pi auth, and that Pi's best-effort output_format parser handles a small nested schema. Asserts routing by reading the most recent Pi session jsonl rather than asking the model to self-identify (LLMs are unreliable narrators about their own identity, especially when Pi's system prompt mentions other providers as defaults).
Walkthrough
Adds a new e2e smoke-test CI workflow that runs three sequential prompt nodes against the Pi provider targeting minimax/MiniMax-M2.7, asserts non-empty and structured JSON outputs, and verifies recent Pi session logs include the provider and model identifiers.
Sequence Diagram(s)
sequenceDiagram
participant Runner as CI Runner
participant Pi as Pi Provider
participant Model as MiniMax-M2.7
participant FS as Filesystem (~/.pi/agent/sessions)
Runner->>Pi: Start session (target minimax/MiniMax-M2.7)
Pi->>Model: Execute prompt "hello" (math)
Model-->>Pi: Streamed output (non-empty)
Pi->>Model: Execute prompt "identify" (identity)
Model-->>Pi: Streamed output (non-empty)
Pi->>Model: Execute prompt "json" (structured output request)
Model-->>Pi: Structured JSON with fields `name`, `ok`
Pi->>FS: Append session JSONL (includes provider/model metadata)
Runner->>FS: Find recent session files (modified <10 min)
Runner->>FS: Grep for provider/minimax and modelId/MiniMax-M2.7
Runner->>Runner: Assert non-empty outputs and JSON fields present
alt All checks pass
Runner->>Runner: Print PASS
else Any check fails
Runner->>Runner: Print FAIL and exit 1
end
Estimated code review effort: 2 (Simple) | ~10 minutes
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.archon/workflows/test-workflows/e2e-minimax-smoke.yaml:
- Line 98: The current brittle check uses the line "if grep -q
'\"provider\":\"minimax\".*\"modelId\":\"MiniMax-M2.7\"' \"$recent_session\";"
which fails if JSON field order changes; replace this with an order-independent
JSON-aware assertion that reads each JSONL line from the recent_session and uses
a JSON parser (jq) to select entries where .provider == "minimax" and .modelId
== "MiniMax-M2.7", then test the parser's success/exit code to determine
routing; update the conditional that references the grep invocation (the if ...
"$recent_session") to use this jq-based selection so the test no longer depends
on field order.
- Around line 93-97: Increase the find window (e.g., change -mmin -3 to a larger
value like -mmin -10) and replace the nondeterministic "find ... | head -1"
selection with a deterministic mtime sort: collect matching files, sort them by
modification time (newest first) and pick the first result to set
recent_session; when implementing the sort step on Linux use GNU stat/printf
(stat -c '%Y %n' or find -printf '%T@ %p\n') and on macOS use BSD stat (stat -f
'%m %N') so the script (the recent_session assignment and the find invocation)
works portably and reliably across platforms.
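The order-independent check described in the first finding can be sketched with jq as follows. This is a minimal sketch, not the workflow's actual code: the helper name `session_routed_to` is hypothetical, and because the Pi session schema is not shown here, the filter assumes only that some JSON object in the file (at any nesting depth) carries both `provider` and `modelId`.

```shell
# Hypothetical helper: succeeds if any JSON object in the JSONL file carries
# both fields, in any key order and at any nesting depth (assumed schema).
session_routed_to() {
  file=$1 provider=$2 model=$3
  # -n with `inputs` streams every JSONL record; `.. | objects` visits every
  # nested object; -e turns the final true/false into the exit status.
  jq -e -n --arg p "$provider" --arg m "$model" \
    'any(inputs | .. | objects; .provider == $p and .modelId == $m)' \
    "$file" >/dev/null
}

# Usage (paths and values are illustrative):
#   if session_routed_to "$recent_session" minimax MiniMax-M2.7; then
#     echo "PASS: routed to minimax/MiniMax-M2.7"
#   fi
```

Because the comparison runs on parsed JSON, reordering the keys or changing whitespace in the session file does not affect the result. A malformed line makes jq exit non-zero, which this sketch treats the same as "no match".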
📒 Files selected for processing (8)
- .archon/workflows/test-workflows/e2e-claude-smoke.yaml
- .archon/workflows/test-workflows/e2e-codex-smoke.yaml
- .archon/workflows/test-workflows/e2e-deterministic.yaml
- .archon/workflows/test-workflows/e2e-minimax-smoke.yaml
- .archon/workflows/test-workflows/e2e-mixed-providers.yaml
- .archon/workflows/test-workflows/e2e-pi-all-nodes-smoke.yaml
- .archon/workflows/test-workflows/e2e-pi-smoke.yaml
- .archon/workflows/test-workflows/e2e-worktree-disabled.yaml
recent_session=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -3 -print 2>/dev/null | head -1)
if [ -z "$recent_session" ]; then
  echo "FAIL: no Pi session jsonl modified in the last 3 minutes"
  exit 1
fi
🧩 Analysis chain
🌐 Web query:
GNU find -printf vs BSD find macOS portability
💡 Result:
GNU find's -printf option is not supported by BSD find on macOS, as macOS uses a BSD-derived find that lacks this GNU extension. BSD find man pages (including macOS equivalents) list options like -print and -print0 but no -printf. GNU findutils documentation explicitly describes -printf with format directives like %i for inode, %s for size, etc.
For portability on macOS:
- Install GNU findutils via Homebrew: brew install findutils, then use gfind -printf ...
- Alternatives using native tools: find . -print0 | xargs -0 stat -f '%i ' (BSD stat format) or find . -exec stat -f '%N\n' {} +
No changes in recent macOS versions (e.g., Sonoma as of 2026); still BSD find without -printf.
Citations:
- 1: https://stackoverflow.com/questions/752818/find-lacks-the-option-printf-now-what
- 2: https://hackmd.io/CG-bn8JxRFCM6vKCQK0TDQ
- 3: https://superuser.com/questions/293199/find-printf-on-mac-printf-unknown-option
- 4: https://man.freebsd.org/cgi/man.cgi?manpath=macOS+26.3&query=find&sektion=1
- 5: https://www.unix.com/man-page/osx/1/find?os=osx&section=1&query=find
- 6: https://www.gnu.org/software/findutils/manual/html_mono/find.html
- 7: https://www.gnu.org/software/findutils/manual/html_node/find_html/Format-Directives.html
- 8: https://manpages.ubuntu.com/manpages/jammy/man1/find.1.html
- 9: https://www.gnu.org/software/findutils/manual/html_node/find_html/Print-File-Information.html
Widen the 3-minute find window and select the most recently modified session file deterministically.
The 3-minute timeout (-mmin -3) may be too tight. With three Pi nodes each carrying 60-second idle timeouts, the cumulative wall-clock time between session write and the assert node executing this find can exceed 3 minutes. Additionally, when multiple recent Pi sessions coexist, head -1 produces unpredictable results because find does not guarantee output order.
Suggested fix
- recent_session=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -3 -print 2>/dev/null | head -1)
+ recent_session=$(find "$HOME/.pi/agent/sessions" -name '*.jsonl' -mmin -10 \
+ -printf '%T@ %p\n' 2>/dev/null | sort -nr | head -1 | cut -d' ' -f2-)
For macOS (which uses BSD find without -printf), use stat -f instead.
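One way to make the newest-file selection work on both GNU and BSD userlands is to let plain `find -print` list candidates and ask `stat` for each mtime, trying the GNU form first and falling back to the BSD form. The helper name `newest_session` is hypothetical, and the sketch assumes session paths contain no newlines.

```shell
# Hypothetical helper: print the most recently modified *.jsonl under a
# directory, considering only files modified within the last N minutes.
newest_session() {
  dir=$1 minutes=$2
  find "$dir" -name '*.jsonl' -mmin "-$minutes" -print 2>/dev/null |
  while IFS= read -r f; do
    # GNU stat takes -c '%Y'; BSD/macOS stat takes -f '%m'. Try GNU first.
    mtime=$(stat -c '%Y' "$f" 2>/dev/null || stat -f '%m' "$f" 2>/dev/null) || continue
    printf '%s %s\n' "$mtime" "$f"
  done | sort -nr | head -n 1 | cut -d' ' -f2-
}

# Usage (directory and window are illustrative):
#   recent_session=$(newest_session "$HOME/.pi/agent/sessions" 10)
```

Sorting numerically on the epoch mtime makes the choice deterministic when files differ by at least one second; files sharing the same second are ordered by path, which is stable but arbitrary.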
- Widen find window from -mmin -3 to -mmin -10. The smoke's three Pi nodes plus the assert can collectively run several minutes on slow networks; 3 minutes was tight enough to false-FAIL on a healthy run. (CodeRabbit minor)
- Drop non-deterministic `head -1` over `find` output. find doesn't guarantee any order; on a tie, the wrong file would be picked. Now iterates all matching sessions and breaks on the first one carrying the routing signal; any match is sufficient evidence. (CodeRabbit minor)
- Replace single-regex `'"provider":"minimax".*"modelId":"MiniMax-M2.7"'` with two separate greps joined by `&&`. JSON field order isn't part of Pi's contract; a future Pi release reordering `provider` and `modelId` in the model_change event would silently false-FAIL the original pattern. The new check is order-independent. (CodeRabbit major)
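The iterate-and-break approach from this commit message might look roughly like the sketch below. It is an illustration rather than the committed code: the helper name `any_session_routed` is hypothetical, and it assumes the session files use compact JSON (no space after the colon), as the original grep pattern did.

```shell
# Hypothetical helper: succeeds if any recent session file contains both
# routing markers. Each marker gets its own grep, so key order is irrelevant.
any_session_routed() {
  dir=$1 minutes=$2
  find "$dir" -name '*.jsonl' -mmin "-$minutes" -print 2>/dev/null |
  while IFS= read -r f; do
    if grep -q '"provider":"minimax"' "$f" &&
       grep -q '"modelId":"MiniMax-M2.7"' "$f"; then
      printf '%s\n' "$f"   # report the first matching file
      break                # any single match is sufficient evidence
    fi
  done | grep -q .         # exit 0 only if the loop printed a match
}

# Usage (directory and window are illustrative):
#   any_session_routed "$HOME/.pi/agent/sessions" 10 || { echo "FAIL"; exit 1; }
```

Note that the two greps here are evaluated per file, not per line, so the two markers could in principle come from different events in the same session. That is a looser check than a JSON-aware per-record match.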
Resolves the conflict on packages/providers/src/community/pi/provider.ts.
- Upstream's coleam00#1284 (ModelRegistry) and our shared/structured-output extraction both touch the same region. Upstream removed the inline augmentPromptForJsonSchema call site that coleam00#1284 didn't itself need; our branch had moved that function to shared/. Resolution keeps the shared/ extraction (single source of truth for both Pi and Copilot) and re-exports it from pi/provider.ts under the original name so existing Pi callers and tests stay byte-for-byte.
- Drops the dead-code lookupPiModel/GetModelFn helper that was a stale leftover from an earlier merge attempt; it never had a caller and was superseded by ModelRegistry upstream.
- Picks up coleam00#1431, which moves e2e-copilot-smoke.yaml under test-workflows/ alongside the other e2e-*.yaml smokes.
Adds e2e-copilot-all-features smoke (mirrors e2e-minimax-smoke): basic chat (PONG) + effort: high (17×23 = 391) + denied_tools: [shell, write] (DENIED_OK) + output_format JSON (best-effort via shared/ structured-output, parsed as {model, ok}). Single bash assert verifies all four end-to-end. Doubles as adoption docs.
Validation:
- bun run check:bundled, type-check, lint, format:check: all green.
- bun --filter @archon/providers test: fully green (Pi included).
- Live smoke (Linux + gpt-5-mini, Copilot CLI 1.0.36):
  - e2e-copilot-smoke → 12s, PASS
  - e2e-copilot-all-features → 25s, PASS (all four caps)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- .archon/workflows/ was getting cluttered: 7 e2e smoke tests living next to maintainer workflows, default workflows, and ad-hoc local YAMLs. Hard to scan, easy to miss new tests.
- Moves the existing e2e-*.yaml smoke tests into .archon/workflows/test-workflows/ and adds one new test, e2e-minimax-smoke, that verifies Pi correctly routes to Minimax M2.7. Subfolder grouping is supported by the workflow loader (1 level deep, resolution by filename) so workflow names are unchanged.
- archon workflow run e2e-pi-smoke still works. No bundled defaults moved.
UX Journey
Before
After
Architecture Diagram
Before
Flat smoke-test layout at workflow root.
After
Connection inventory:
- .archon/workflows/test-workflows/*.yaml
- maintainer/
- archon workflow run e2e-* callers
Label Snapshot
- risk: low
- size: S (8 file moves + 1 new file, all in .archon/)
- workflows
- workflows:test-organization
Change Metadata
- chore
- workflows
Linked Issue
maintainer/)
Validation Evidence (required)
- e2e-minimax-smoke was test-run twice end-to-end against the user's local Pi auth; both runs confirmed Pi routed the call to Minimax M2.7 (verified via the Pi session jsonl, since LLM self-identification is unreliable).
- bun run test: this PR moves YAML files and adds one more YAML file. No source code changes, no test coverage at risk.
- bun run lint is a no-op for .archon/** (ignored in eslint config).
Security Impact (required)
- e2e-minimax-smoke calls Pi's API via the user's local auth, the same auth path as e2e-pi-smoke.yaml.
- Reads ~/.pi/agent/sessions/ to verify Pi routing, but only as a post-run assertion: read-only, scoped to the user's own Pi session dir.
Compatibility / Migration
Human Verification (required)
- archon validate workflows <name>.
- e2e-minimax-smoke end-to-end run: Pi connects with user's minimax api_key entry, routes to MiniMax-M2.7, the structured-output (output_format) call returns parseable JSON for a flat schema, the bash post-assert reads the most recent Pi session jsonl and confirms provider=minimax, modelId=MiniMax-M2.7.
- e2e-pi-smoke.yaml in test-workflows/ is still archon workflow run e2e-pi-smoke.
- find ... -mmin -3 and tr: POSIX, but Windows runs in WSL or git-bash).
Side Effects / Blast Radius (required)
- .archon/workflows/e2e-pi-smoke.yaml (rather than the workflow name) would break. I've grepped the repo and packages/: no source-code references to those paths.
- archon validate workflows passes for all 8; CI catches any path issues.
Rollback Plan (required)
- git revert <merge-sha>: single commit, all renames + 1 new file. Reversible cleanly.
- archon workflow run e2e-pi-smoke errors). CI catches.
Risks and Mitigations
- output_format is best-effort and could parse-fail on the nested schemas Archon uses elsewhere. e2e-minimax-smoke exercises the simplest possible flat schema ({name, ok}) to keep the smoke a pure connectivity / parsing sanity check, not a stress test. More elaborate schemas should have their own dedicated smokes.
- .archon/workflows/e2e-*.yaml instead of in test-workflows/. CONTRIBUTING.md could codify the convention if this matters; not adding it in this PR to keep scope tight.
Summary by CodeRabbit
Note: This release contains internal CI/testing improvements with no user-facing changes.