feat(providers/pi): best-effort structured output via prompt engineering#1297
feat(providers/pi): best-effort structured output via prompt engineering#1297
Conversation
Pi's SDK has no native JSON-schema mode (unlike Claude's outputFormat / Codex's outputSchema). Previously Pi declared structuredOutput: false and any workflow using output_format silently degraded — the node ran, the transcript was treated as free text, and downstream $nodeId.output.field refs resolved to empty strings. 8 bundled/repo workflows across 10 nodes were affected (archon-create-issue, archon-fix-github-issue, archon-smart-pr-review, archon-workflow-builder, archon-validate-pr, etc.). This PR closes the gap via prompt engineering + post-parse: 1. When requestOptions.outputFormat is present, the provider appends a "respond with ONLY a JSON object matching this schema" instruction plus JSON.stringify(schema) to the prompt before calling session.prompt(). 2. bridgeSession accepts an optional jsonSchema param. When set, it buffers every assistant text_delta and — on the terminal result chunk — parses the buffer via tryParseStructuredOutput (trims whitespace, strips ```json / ``` fences, JSON.parse). On success, attaches structuredOutput to the result chunk (matching Claude's shape). On failure, emits a warn event and leaves structuredOutput undefined so the executor's existing dag.structured_output_missing path handles it. 3. Flipped PI_CAPABILITIES.structuredOutput to true. Unlike Claude/Codex this is best-effort, not SDK-enforced — reliable on GPT-5, Claude, Gemini 2.x, recent Qwen Coder, DeepSeek V3, less reliable on smaller or older models that ignore JSON-only instructions. Tests added (14 total): - tryParseStructuredOutput: clean JSON, fenced, bare fences, arrays, whitespace, empty, prose-wrapped (fails), malformed, inner backticks - augmentPromptForJsonSchema via provider integration: schema appended, prompt unchanged when absent - End-to-end: clean JSON → structuredOutput parsed; fenced JSON parses; prose-wrapped → no structuredOutput + no crash; no outputFormat → never sets structuredOutput even if assistant happens to emit JSON Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis pull request enables structured output support for the Pi provider by declaring the capability, implementing JSON parsing logic to extract structured data from assistant responses, augmenting prompts with schema information, and connecting these pieces through the provider and event-bridge layers. Changes span capability declarations, parsing utilities, prompt augmentation, and corresponding test coverage. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant PiProvider
participant augmentPromptForJsonSchema
participant bridgeSession
participant eventBridge as Event Bridge<br/>(session subscription)
participant tryParseStructuredOutput
participant Assistant
User->>PiProvider: sendQuery(prompt, outputFormat)
alt outputFormat provided (with schema)
PiProvider->>augmentPromptForJsonSchema: augmentPromptForJsonSchema(prompt, schema)
augmentPromptForJsonSchema-->>PiProvider: effectivePrompt (with JSON instructions)
else no outputFormat
PiProvider->>PiProvider: effectivePrompt = original prompt
end
PiProvider->>bridgeSession: bridgeSession(session, effectivePrompt, abortSignal, schema)
Note over bridgeSession: wantsStructured = schema !== undefined
bridgeSession->>eventBridge: subscribe to session
loop on message chunks
Assistant-->>eventBridge: MessageChunk (type: 'text_delta')
alt wantsStructured
eventBridge->>eventBridge: accumulate content → assistantBuffer
end
end
Assistant-->>eventBridge: terminal chunk (type: 'result')
alt wantsStructured
eventBridge->>tryParseStructuredOutput: tryParseStructuredOutput(assistantBuffer)
alt parse succeeds
tryParseStructuredOutput-->>eventBridge: parsed JSON object/array
eventBridge->>eventBridge: attach structuredOutput to result
else parse fails
tryParseStructuredOutput-->>eventBridge: undefined
eventBridge->>eventBridge: log warning, leave structuredOutput unset
end
end
eventBridge-->>PiProvider: result chunk (with structuredOutput if successful)
PiProvider-->>User: AgentMessage (with structuredOutput)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/providers/src/community/pi/event-bridge.ts`:
- Around line 273-280: When jsonSchema is provided (wantsStructured) validate
the parsed JSON against that schema before attaching it as structured output:
after parsing the chunk JSON (the code around the parsing at lines ~340-343) run
a schema validation step against jsonSchema and if validation fails treat it
like a parse failure—do not set/attach the structured output field on the
MessageChunk and follow the same error/suppression flow as parse errors; use the
existing variables (wantsStructured, jsonSchema, MessageChunk, BridgeQueueItem)
and existing error handling path so schema mismatches are not silently accepted.
In `@packages/providers/src/community/pi/provider.ts`:
- Around line 77-88: The prompt in augmentPromptForJsonSchema forces an "object"
response which conflicts with schemas that may be arrays or scalars and with
tryParseStructuredOutput which accepts non-objects; update the wording to be
schema-root-neutral (e.g., say "Respond with ONLY a JSON value that matches the
schema below" or similar), keep the constraints "No prose before or after... No
markdown code fences", and ensure the function augmentPromptForJsonSchema
returns the revised instruction string so models can return arrays/scalars as
valid JSON matching the provided schema.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e94a4442-5761-460c-ac7e-b3abcf1a2d17
📒 Files selected for processing (6)
packages/providers/src/community/pi/capabilities.tspackages/providers/src/community/pi/event-bridge.test.tspackages/providers/src/community/pi/event-bridge.tspackages/providers/src/community/pi/provider.test.tspackages/providers/src/community/pi/provider.tspackages/providers/src/registry.test.ts
| abortSignal?: AbortSignal, | ||
| jsonSchema?: Record<string, unknown> | ||
| ): AsyncGenerator<MessageChunk> { | ||
| const queue = new AsyncQueue<BridgeQueueItem>(); | ||
| // Best-effort structured-output buffer. Only accumulates when the caller | ||
| // requested a JSON schema; otherwise stays empty and the terminal chunk | ||
| // passes through untouched. | ||
| const wantsStructured = jsonSchema !== undefined; |
There was a problem hiding this comment.
Validate parsed JSON against jsonSchema before attaching it.
Line 341 only parses JSON, and Line 343 treats any valid JSON as structured output. If the schema requires { area: string } but the model returns {"ok":true}, the executor will suppress dag.structured_output_missing and downstream field refs can silently degrade. Treat schema mismatches like parse failures.
Also applies to: 340-343
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/providers/src/community/pi/event-bridge.ts` around lines 273 - 280,
When jsonSchema is provided (wantsStructured) validate the parsed JSON against
that schema before attaching it as structured output: after parsing the chunk
JSON (the code around the parsing at lines ~340-343) run a schema validation
step against jsonSchema and if validation fails treat it like a parse failure—do
not set/attach the structured output field on the MessageChunk and follow the
same error/suppression flow as parse errors; use the existing variables
(wantsStructured, jsonSchema, MessageChunk, BridgeQueueItem) and existing error
handling path so schema mismatches are not silently accepted.
| export function augmentPromptForJsonSchema( | ||
| prompt: string, | ||
| schema: Record<string, unknown> | ||
| ): string { | ||
| return `${prompt} | ||
|
|
||
| --- | ||
|
|
||
| CRITICAL: Respond with ONLY a JSON object matching the schema below. No prose before or after the JSON. No markdown code fences. Just the raw JSON object as your final message. | ||
|
|
||
| Schema: | ||
| ${JSON.stringify(schema, null, 2)}`; |
There was a problem hiding this comment.
Avoid forcing object-shaped output for every schema.
Line 85 asks for a JSON object, but JSON Schema can describe arrays/scalars and tryParseStructuredOutput already accepts arrays. This can steer Pi models away from valid non-object schemas; use schema-root-neutral wording.
Suggested prompt wording
-CRITICAL: Respond with ONLY a JSON object matching the schema below. No prose before or after the JSON. No markdown code fences. Just the raw JSON object as your final message.
+CRITICAL: Respond with ONLY valid JSON matching the schema below. No prose before or after the JSON. No markdown code fences. Just the raw JSON as your final message.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/providers/src/community/pi/provider.ts` around lines 77 - 88, The
prompt in augmentPromptForJsonSchema forces an "object" response which conflicts
with schemas that may be arrays or scalars and with tryParseStructuredOutput
which accepts non-objects; update the wording to be schema-root-neutral (e.g.,
say "Respond with ONLY a JSON value that matches the schema below" or similar),
keep the constraints "No prose before or after... No markdown code fences", and
ensure the function augmentPromptForJsonSchema returns the revised instruction
string so models can return arrays/scalars as valid JSON matching the provided
schema.
Archon PR Validation ReportVerdict: ✅ APPROVE SummaryThis PR correctly addresses a real silent-degradation gap where Pi's provider ignored Bug Confirmation
Fix Quality: 5/5
IssuesNo blocking issues found. Validated by archon-validate-pr workflow |
Summary
Pi's SDK has no native JSON-schema mode (unlike Claude's
outputFormat/ Codex'soutputSchema). Previously Pi declaredstructuredOutput: falseand any workflow usingoutput_format:silently degraded — the node ran, the transcript was treated as free text, and downstream\$nodeId.output.fieldrefs resolved to empty strings.8 bundled / repo workflows across 10 nodes were affected:
Implementation
Prompt augmentation (`pi/provider.ts`): when `requestOptions.outputFormat` is present, append a "respond with ONLY a JSON object matching this schema" instruction + `JSON.stringify(schema, null, 2)` to the user prompt before calling `session.prompt()`.
Post-parse on terminal chunk (`pi/event-bridge.ts`): `bridgeSession` accepts an optional `jsonSchema` param. When set, it buffers every assistant `text_delta` and — on the terminal result chunk — parses the buffer via `tryParseStructuredOutput` (trims whitespace, strips ````json / ```` fences, `JSON.parse`). On success, attaches `structuredOutput` to the result chunk (matching Claude's shape). On failure, logs a warn event and leaves `structuredOutput` undefined — the executor's existing `dag.structured_output_missing` path handles degradation (downstream `$node.output.field` refs substitute empty strings, user sees a warning).
Capability flag (`pi/capabilities.ts`): `structuredOutput: false` → `true`. Commented clearly that this is best-effort, not SDK-enforced.
What works, what degrades
Works reliably: GPT-5, Claude (via OpenRouter), Gemini 2.x, recent Qwen Coder, DeepSeek V3 — all follow "JSON only" instructions consistently. Fence-stripper catches the most common compliance slip (model wraps in ````json````).
Degrades cleanly: smaller or older models that emit prose like "Here's the JSON you requested: {...}" before the object. `JSON.parse` fails → `structuredOutput` stays undefined → same behavior as before this PR (but with an explicit warn log so users can diagnose).
Tests added (14 new)
Test plan
Blast radius
3 production files touched, ~90 LOC (core) + ~200 LOC (tests). Behavior-preserving when `outputFormat` is absent. Reversible by flipping `PI_CAPABILITIES.structuredOutput` back to `false`.
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes