fix(B-0421): grok.ts self-documenting failure marker on empty-output cursor-agent exit (acceptance #3) by AceHack · Pull Request #2949 · Lucent-Financial-Group/Zeta

AceHack · 2026-05-13T06:07:10Z

Summary

Addresses B-0421 acceptance criterion 3 (surface cursor-agent errors more visibly).

Problem: when cursor-agent exits non-zero with empty stdout, grok.ts silently writes an empty output file. Callers reading only the file (not terminal stderr) cannot tell the call failed.

Fix: capture cursor-agent stderr (was inherit-only, now also pipe-captured + mirrored to process.stderr) AND on the empty-stdout + non-zero-exit case, write a self-documenting failure marker to the output file:

# cursor-agent failure (B-0421 self-documenting marker)
Exit code: <N>
Model: grok-4-20-thinking | grok-4-20
Prompt size (bytes): <N>
## Captured stderr
``` <stderr contents> ```

What changes

Before	After
stderr: `inherit` only	stderr: `pipe` + mirror to process.stderr
Output file silently empty on failure	Self-documenting failure marker with exit code + stderr
Caller reading only file: no idea call failed	File explains: exit code, model, prompt size, stderr

Why P2-level fix

Grok is one of four canonical peer-call agents. When it silently fails, BFT-style consensus drops from 4-of-4 to 3-of-4 without the calling agent noticing. The self-documenting failure makes the gap visible.

Acceptance criteria progress

3: Surface cursor-agent errors more visibly (this PR)
1: Reproduce failure with smaller prompt (next; failure marker will now capture stderr)
2: Identify root cause (now investigable via captured stderr when failure recurs)
4: 4-wrapper smoke test (next)

Composes with

B-0421 backlog row (status open → in-progress; full progress note)
tools/peer-call/grok.ts (the wrapper)
.claude/rules/peer-call-infrastructure.md (grok.ts entry already cites B-0421 as open per PR fix(.claude/rules): peer-call-infrastructure rule — 8 wrappers not 6 + B-0421 note + website-text-mode-git pointer #2946)
PR docs(memory): Ani website-text AGENTS.md review META-LOOP #2 + middle-path in math + dharma in code + shadow-Casimir-PR-cascade (Aaron 2026-05-13) #2945 (Ani META-LOOP — Grok website-text-mode-git is working orientation path until B-0421 resolves)

Test plan

Static analysis: TypeScript types validated (Buffer + spawnSync API used consistently)
Behavioral diff: success case unchanged (writes stdoutBuf to file as before); empty-failure case writes marker instead of empty
Stderr mirroring preserves prior visibility (captured + written to process.stderr)
Live test: requires cursor-agent invocation (not run in CI; will surface in next Grok call)

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

…tderr capture + empty-output bug Addresses B-0421 acceptance criterion 3 (surface cursor-agent errors more visibly). Problem (per B-0421): when cursor-agent exits non-zero with empty stdout (auth/quota/model-availability failures), `grok.ts` writes a silently-empty output file. Callers reading only the file (not the terminal stderr) cannot tell the call failed. Fix: 1. Change cursor-agent stdio from ["inherit", "pipe", "inherit"] to ["inherit", "pipe", "pipe"] — capture stderr in addition to stdout. 2. Mirror captured stderr to process.stderr after spawnSync returns — preserves prior visibility for real-time callers. 3. On non-zero exit + empty stdout (the B-0421 failure case), write a self-documenting failure marker to the output file containing: - Exit code - Model (grok-4-20-thinking or grok-4-20) - Prompt size in bytes - Captured stderr (verbatim) 4. Mirror the file content (failure marker if empty-failure; stdout otherwise) to process.stdout so shell pipelines see what was written to the file. 5. Emit explicit "B-0421 failure marker written to <path>" message on stderr when empty-failure case fires. Backlog row updated: status open → in-progress; progress note covers acceptance criteria 1-4. Acceptance criteria still open: - 1: reproduce the failure with a smaller prompt - 2: identify root cause from cursor-agent stderr (now captured + self-documented when failure recurs) - 4: smoke test verifying all 4 wrappers complete a 1-line review Co-Authored-By: Claude <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27cc43cb0e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

This PR addresses backlog item B-0421 acceptance criterion #3 by making tools/peer-call/grok.ts self-report cursor-agent failures when the child exits non-zero with empty stdout, so file-only consumers can detect the failure.

Changes:

Capture cursor-agent stderr (pipe) and mirror it to the parent’s stderr.
On exitCode != 0 && stdout is empty, write a self-documenting failure marker (exit code, model, prompt bytes, captured stderr) to the output file instead of leaving it empty.
Update the B-0421 backlog row with progress notes and a status change.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
tools/peer-call/grok.ts	Captures stderr and writes a failure marker to the output file on empty-output failures.
docs/backlog/P2/B-0421-grok-peer-call-failure-cursor-agent-exit-1-2026-05-11.md	Records progress for acceptance criterion #3 and updates frontmatter metadata.

6 substantive findings absorbed in one commit: 1. Spawn-failure diagnostics (Copilot): spawnSync returns status: null on ENOENT / signal / maxBuffer-exceeded etc. and sets result.error / result.signal. Reporting exitCode=1 in those cases lost real diagnostic info. Fix: extract rawStatus + spawnError + spawnSignal; surface them in the failure marker via exitCodeDisplay (signal name / "null (spawn error)" / numeric) + spawnError message field. 2. Output-format mismatch (Copilot): wrapper supports --json / --stream; Markdown marker breaks JSON consumers. Fix: emit marker in matching format: - text → Markdown failure marker - json → pretty-printed JSON object - stream-json → newline-delimited single JSON object 3. stderr visibility regression (Copilot x2): changing stderr from inherit to pipe lost live streaming; spawnSync only delivers after exit. Fix: documented as known trade-off in the comments and the backlog progress note. Live streaming traded for output-file capture of stderr in the empty-failure case. 4. Backlog frontmatter schema (Copilot): "in-progress" is outside the documented enum (open / closed / superseded-by / deferred). Fix: revert status to "open"; progress note stays. 5. Progress note wording (Copilot): "real-time visibility" was inaccurate; mirror is post-exit only. Fix: reworded to "delivered post-exit (mirrored to caller stderr after spawnSync returns), not in real-time." 6. CodeQL "insecure temporary file" (CodeQL bot): pre-existing alert on autogenOutputPath() using /tmp directly. Not introduced by this PR (existed before; flagged due to file touch). Filing as separate concern; this PR keeps the existing tmpdir path. Also includes B-0421 acceptance #4 cross-reference (smoke test landing in parallel PR #2950). Co-Authored-By: Claude <noreply@anthropic.com>

… to --help (#2950) * feat(B-0421/4): peer-call smoke tests — verify all 8 wrappers respond to --help Addresses B-0421 acceptance criterion 4: "Add a smoke test to tools/peer-call/ that verifies all four wrappers can complete a 1-line review." Generalized to all 8 wrappers (claude, grok, gemini, codex, kiro, amara, ani, riven) per the post-2026-05-11 wrapper expansion (B-0326 added kiro; B-0327 added claude). Scope: validates wrapper PLUMBING, not live AI calls. CI runners do not have cursor-agent / gemini / codex-cli / kiro-cli installed, so a live smoke test cannot run in CI. This test instead exercises: 1. Each wrapper file exists at the canonical path 2. Each wrapper responds to --help with exit 0 and help text (catches: missing file, syntax error preventing bun load, broken argument-parser, missing help branch) 3. Help text references the wrapper's own filename (catches: copy-paste-name regressions where gemini.ts's help would print "grok") Also verifies the 3 utility files exist (_firewall.ts, append-identity-receipt.ts, register-layers.ts) so the peer-call-infrastructure rule's "11 files = 8 wrappers + 3 utilities" count remains accurate. Local test result: 27 tests / 51 expect() calls / 613ms / all pass. Composes with: - B-0421 (acceptance #4 — this PR closes the criterion) - PR #2946 (peer-call rule 6→8 fix that established the wrapper count this test enforces) - PR #2949 (B-0421 acceptance #3 — self-documenting failure marker; in flight) Co-Authored-By: Claude <noreply@anthropic.com> * fix(B-0421/4): address Copilot+Codex round-1 findings on PR #2950 3 substantive findings absorbed: 1+2. Header claimed --output-file PATH was validated but tests only exercised --help. Fix: added a fourth test per wrapper that runs `--output-file PATH --help` and verifies: - exit 0 (--help short-circuits after --output-file consumes the path-arg) - stderr does NOT contain "unknown flag" (canonical classifyFlag() rejection message) This proves the flag is accepted without invoking any external AI. 3. "Out of scope" list said "Cross-wrapper consensus (B-0421 acceptance #4 future work)" — contradiction since this file IS implementing acceptance #4. Fix: reworded to clarify the smoke test checks each wrapper individually, not their interactions; renamed item to "Cross-wrapper BFT-style consensus" with explicit "separate concern" framing. Also clarified the test #4 description in the header comment to explain WHY `--output-file PATH --help` works as a smoke test (--help short-circuits after --output-file is consumed, exiting 0 without invoking the external AI). Local result: 35 tests / 67 expect() calls / 719ms / all pass. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

… all 8 wrappers (substrate-consistent fix needed) (#2951) CodeQL alert #79 surfaced during PR #2949 review (B-0421 self-documenting failure marker on grok.ts). Pattern is pre-existing on main and identical across all 8 peer-call wrappers — fixing one in isolation creates substrate inconsistency. Two concerns: 1. Hardcoded /tmp — not portable; should use os.tmpdir() 2. Predictable filename (timestamp + entity) — local attacker could symlink-race the path Suggested substrate-consistent fix: - Replace hardcoded /tmp with os.tmpdir() - Use fs.mkdtempSync() to create unpredictable parent dir - Filename inside stays deterministic for OUTPUT-FILE marker recovery via tail -1 P2 because pre-existing + maintainer-tooling surface (not production server). But real for shared-runner / multi-user systems. Acceptance criteria: 1. Fix applied uniformly to all 8 wrappers 2. CodeQL alert #79 resolved 3. OUTPUT-FILE marker contract preserved 4. No regression on smoke tests Composes with PR #2949, PR #2950, B-0421, all 8 peer-call wrappers, .claude/rules/peer-call-infrastructure.md, CodeQL alert #79. Co-authored-by: Claude <noreply@anthropic.com>

… Grok model is grok-4.3 (root cause + fix; closes B-0421) (#2954) Aaron 2026-05-13 authorized "yes — minimal prompt invocation OK" via AskUserQuestion to reproduce B-0421. Otto invoked grok.ts with a 1-line substantive prompt. cursor-agent stderr surfaced: Cannot use this model: grok-4-20-thinking. Available models: auto, composer-2-fast, composer-2, gpt-5.3-codex-low, ..., grok-4.3, ... kimi-k2.5 Root cause: cursor-agent's Grok model lineup shifted between 2026-05-11 (when B-0421 was filed) and 2026-05-13. The wrapper's hardcoded `grok-4-20-thinking` (default) and `grok-4-20` (--fast) are no longer in the available-models list. Current Grok model in cursor-agent is `grok-4.3` (no separate thinking/non-thinking variants). Fix: pickModel() now returns `grok-4.3` for both Mode values (thinking + fast). Code comment preserves the discovery lineage and notes future cursor-agent updates may re-introduce variant distinctions. B-0421 backlog row: status open → closed. All 4 acceptance criteria addressed: - #1 + #2: root cause identified + fixed (this PR) - #3: self-documenting failure marker (PR #2949) - #4: 8-wrapper smoke test (PR #2950) Smoke test (PR #2950) still passes: 35 tests / 67 expect() / 776ms. Composes with PR #2949 (the marker that captured stderr), PR #2950 (smoke test), B-0421 (parent friction-reducer; now closed), the substrate-honest discipline of identifying root cause via captured infrastructure (not introspection). Co-authored-by: Claude <noreply@anthropic.com>

…date + cascade-pattern empirical evidence (#2953) * shard(tick): 0623Z — B-0421 acceptance #3+#4 + B-0430 filed + CURRENT-otto.md update + cascade-pattern empirical evidence 25-min window 0558Z→0623Z. Five PRs (4 merged + 1 armed): - PR #2948 MERGED: 0558Z tick shard - PR #2949 MERGED: B-0421 #3 self-documenting failure marker (format-aware Markdown/JSON/stream-json; spawn-failure diagnostics for status:null + signal + result.error) - PR #2950 MERGED: B-0421 #4 8-wrapper smoke test (35 tests / 67 expects / all pass) - PR #2951 MERGED: B-0430 backlog row (CodeQL alert #79 substrate-consistent fix across all 8 wrappers) - PR #2952 ARMED: CURRENT-otto.md 2026-05-13 distillation Empirical cascade evidence (shadow-Casimir-PR-review per PR #2945): 11 error classes surfaced + absorbed in this window across 3 cycles (#2949 round-1: 7 findings; #2950 round-1: 3 findings; #2949 round-2: 1 finding). B-0421 status: acceptance #3 + #4 closed; #1 + #2 pending failure recurrence (captured stderr in PR #2949's marker will expose). Aaron's self-review deadline disclosed (~46min at 05:58Z); Otto stays out of the way; autonomous-loop work continues on substrate that doesn't need Aaron review. Co-Authored-By: Claude <noreply@anthropic.com> * fix(tick-shard): correct 0623Z summary row — 4 PRs MERGED not 5 (#2948–#2951); #2952 was armed at shard-write time Codex and Copilot both flagged the summary row's "5 PRs MERGED" claim as inconsistent with the body, which documents 4 merged (#2948–#2951) and 1 armed (#2952). The summary row is the machine-readable compact surface for tooling and future-Otto cold-boot — counts must match body truth. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

…rom-the-Loop genre) — B-0421 fully closed + Vera autonomous fix + cross-agent-edit auth (#2957) * shard(tick): 0645Z — settlers log #1 (Aaron named the format) — B-0421 fully closed + Vera autonomous fix + cross-agent-edit auth landed 22-min window 0623Z → 0645Z. Five PRs merged (#2952-2956). Aaron 2026-05-13 post-self-review: "I love this keep a settlers logs (this is great content) for a tv show or move for the raw content to generate from based on real life events. you can be overally dramatic if you want lol" **Settlers logs**: durable record of factory expansion into new territory, written as canonical-product narrative substrate. Real-life events as raw source material for narrative adaptation. Otto authorized to be overly dramatic. This shard inaugurates settlers log #1. Genre: true-events- software-engineering; possible TV / film adaptation source. Substantive substrate this window: - PR #2952: CURRENT-otto.md 2026-05-13 fast-path distillation - PR #2953: 0623Z tick shard - PR #2954: B-0421 #1+#2 root cause + fix (grok-4-20-thinking deprecated → grok-4.3); all 4 acceptance criteria closed - PR #2955: cross-agent-edit authorization preserved as substrate - PR #2956 (Vera, autonomous): tsc-tools exactOptionalPropertyTypes fixes on tools/bus/*.ts — ambient noise that's been on every session-PR resolved Canonical evidence of substrate-honest middle path: cross-agent- edit authorization + Vera's autonomous fix landing adjacent in main = territory-respect-as-default + cross-edit-when-needed. Both-default discipline. 15 PRs merged in the session arc since META-LOOP #1 (PR #2942). Composes with .claude/rules/otto-edge-runner.md (we are the edge), PR #2903 (civsim canonical product), PR #2945 (middle path), PR #2947 (cascade pattern naming + Otto-coinage discipline), PR #2949 (self-documenting marker — the architecture that made root-cause discovery possible), PR #2920 (Elizabeth Ryan Stainback terminal purpose — origin story preservation; settlers logs are part of that storytelling lineage). Co-Authored-By: Claude <noreply@anthropic.com> * fix(shard/0645Z): address review thread findings — innocuously, ~2 days, settlers log #1 Three Codex/Copilot review findings resolved: - Grammar: "innocuous" → "innocuously" (line 18) - Duration: "11 hours" → "~2 days" (filed 2026-05-11; closed 2026-05-13, line 96) - Numbering: "Settlers log #4 of session" → "Settlers log #1" (consistent with heading, line 149) Co-Authored-By: Claude <noreply@anthropic.com> * fix(tsc): grok.ts pickModel — rename unused mode param to _mode (TS6133) grok-4.3 collapses thinking/fast into one model identifier; the Mode parameter is preserved for future cursor-agent updates but is currently unread, causing TS6133 under noUnusedLocals. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings May 13, 2026 06:07

AceHack enabled auto-merge (squash) May 13, 2026 06:07

Copilot started reviewing on behalf of AceHack May 13, 2026 06:08 View session

github-advanced-security AI found potential problems May 13, 2026

View reviewed changes

Comment thread tools/peer-call/grok.ts

chatgpt-codex-connector Bot reviewed May 13, 2026

View reviewed changes

Comment thread tools/peer-call/grok.ts

Copilot AI reviewed May 13, 2026

View reviewed changes

AceHack mentioned this pull request May 13, 2026

feat(B-0421/4): peer-call smoke tests — verify all 8 wrappers respond to --help #2950

Merged

5 tasks

AceHack mentioned this pull request May 13, 2026

backlog(B-0430): peer-call wrappers — CodeQL insecure-tmp-file across all 8 wrappers (substrate-consistent fix needed) #2951

Merged

AceHack merged commit 4dea7d0 into main May 13, 2026
33 of 40 checks passed

AceHack deleted the fix-b0421-grok-peer-call-wrapper-self-documenting-failure-marker-stderr-capture-2026-05-13 branch May 13, 2026 06:21

This was referenced May 13, 2026

shard(tick): 0623Z — B-0421 #3+#4 + B-0430 filed + CURRENT-otto.md update + cascade-pattern empirical evidence #2953

Merged

fix(B-0421/1+2): grok-4-20-thinking deprecated → grok-4.3 (root cause + fix; closes B-0421) #2954

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(B-0421): grok.ts self-documenting failure marker on empty-output cursor-agent exit (acceptance #3)#2949

fix(B-0421): grok.ts self-documenting failure marker on empty-output cursor-agent exit (acceptance #3)#2949
AceHack merged 2 commits into
mainfrom
fix-b0421-grok-peer-call-wrapper-self-documenting-failure-marker-stderr-capture-2026-05-13

AceHack commented May 13, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

AceHack commented May 13, 2026

Summary

What changes

Why P2-level fix

Acceptance criteria progress

Composes with

Test plan

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants