Skip to content

fix(openai): unwrap SELECTION-type setting to a string — TRUE root cause of HTTP 400#39

Merged
ComBba merged 1 commit into
mainfrom
fix/openai-400-unwrap-selection
May 13, 2026
Merged

fix(openai): unwrap SELECTION-type setting to a string — TRUE root cause of HTTP 400#39
ComBba merged 1 commit into
mainfrom
fix/openai-400-unwrap-selection

Conversation

@ComBba
Copy link
Copy Markdown
Contributor

@ComBba ComBba commented May 13, 2026

TRUE ROOT CAUSE

settings.get('openaiModel') on a Devvit SELECTION-type field returns a string array (["gpt-5.4-mini"]), not a string. We were passing the raw array straight into the request body as "model": ["gpt-5.4-mini"]. OpenAI rejected this with the misleading error We could not parse the JSON body — which sounds like structural JSON error but is actually a type mismatch on the model field.

Production proof (v0.0.39 logs)

[vibe-mod] callOpenAI: openaiModel raw = ["gpt-5.4-mini"]

This single bug explains why PR #32-#37 (six rounds of unrelated body-shape and content fixes) all still returned HTTP 400. None of them touched the model field. PR #38 worked around it by hardcoding gpt-5.4-nano (a string literal), at which point callOpenAI started succeeding — but our few-shot was still in the broken flattened format from PR #37, so v0.0.39 returned a different toast: The compiled rule contained an action this app does not support. That's a downstream issue, not the transit bug.

Why probes (b)/(d)/(e)/(f) all returned 200

All probes hardcoded 'gpt-5.4-nano' (a string literal) and so sent valid "model": "gpt-5.4-nano" to OpenAI. Only callOpenAI read through settings.get(...) and got the array-of-string-instead-of-string mismatch.

Fix

  1. Unwrap the SELECTION result: if Array.isArray(raw), take raw[0].
  2. Log both the raw and unwrapped values for diagnostic visibility.
  3. Honour the user-configured model (default gpt-5.4-mini).
  4. Restore inline JSON.stringify(ex.assistant) for each few-shot example — re-teaches the model the correct {action, params} shape that the executor validates.

Tests

  • npm run check 4/4 PASS.
  • After merge: devvit upload → v0.0.40 → Chrome verify → production callOpenAI returns 200 → toast shows compile success.

Postmortem note

The misleading OpenAI error wording ("could not parse the JSON body" for a type mismatch on a single field, rather than e.g. "Invalid type for parameter 'model'") cost us seven PRs and ~40 minutes of bisecting. We tried body shape, escape density, byte vs string, message count, size, Content-Length, JSON-syntax-in-content — none addressed the actual issue. The diagnostic console.log added in PR #38 ("settings.get(openaiModel) = ...") surfaced the array in one line.

🤖 Generated with Claude Code

…use of HTTP 400

# Root cause

`settings.get('openaiModel')` returns the value of a Devvit SELECTION
schema field. SELECTION fields return a `string[]` -- even for single
selection. We were passing the raw array straight into the request body
as `"model": ["gpt-5.4-mini"]`, which OpenAI rejected as unparseable
JSON for the `model` field. OpenAI's error wording masked the underlying
problem: it said `We could not parse the JSON body of your request`,
which sounds like a structural JSON error but is actually a type
mismatch on a single field. Production log proof from v0.0.39:

  [vibe-mod] callOpenAI: openaiModel raw = ["gpt-5.4-mini"]

This single bug accounts for PR #32-#37 all returning HTTP 400 with
seemingly unrelated changes. PR #38 (v0.0.39) worked around it by
hardcoding `gpt-5.4-nano` (a string), which made the request body
valid -- callOpenAI succeeded, the toast became
`The compiled rule contained an action this app does not support`
(a downstream issue, not a transit one).

# Fix

* Unwrap the SELECTION result: if `Array.isArray(raw)`, take `raw[0]`.
* Log both the raw and unwrapped values for diagnostic visibility.
* Honour the user-configured model (default `gpt-5.4-mini`).

# Restore proper few-shot examples

PR #37 had stripped JSON syntax from message content (`flattenExample`,
output as `key=value; key=value`) on a false hypothesis. With JSON
syntax removed, the model learned a non-schema output shape -- v0.0.39
emitted `{"modqueue":{"note":"..."}}` instead of `{"action":"modqueue",
"params":{"note":"..."}}`. Restoring `JSON.stringify(ex.assistant)` for
each example re-teaches the schema.

Single-user composition (PR #32) is preserved -- it's a sensible
packaging choice and was never causing the 400; the model array was.

# Tests

* `npm run check` 4/4 PASS.
* Local POST of the new body to OpenAI returns 200 with a valid
  compiled rule conforming to the SAFE-action schema.
* After merge: `devvit upload` -> v0.0.40 -> Chrome verify -> production
  callOpenAI returns 200 -> toast shows compile success.

# Why probe(b)/(d)/(e)/(f) all worked

The probes hardcoded `'gpt-5.4-nano'` (a string literal) -- so they
sent a valid `model` field and OpenAI parsed their requests fine. Only
callOpenAI used `settings.get(...)` and got hit by the array bug.
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@ComBba has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 6 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f1918fd2-059d-4a94-8f1a-53341c74012b

📥 Commits

Reviewing files that changed from the base of the PR and between b2cc1c1 and 6a59ea2.

📒 Files selected for processing (1)
  • src/server/index.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/openai-400-unwrap-selection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ComBba ComBba merged commit ddad44e into main May 13, 2026
2 checks passed
@ComBba ComBba deleted the fix/openai-400-unwrap-selection branch May 13, 2026 16:01
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug where the openaiModel setting was incorrectly being sent as an array instead of a string, and restores the use of structured JSON in few-shot examples now that the model-array issue is resolved. The reviewer suggests that since the previous 'JSON-syntax-free' strategy has been abandoned, the prompt construction should move away from single-line collapsing and instead use newlines to better delineate sections, providing a code suggestion to implement this improved structure.

Comment thread src/server/index.ts
Comment on lines 1322 to +1334
const collapse = (s: string) => s.replace(/\s+/g, ' ').trim();
// Recursively flatten any value to a JSON-syntax-free string. Keys use
// `key=value`, arrays use `[a; b; c]`-but-with-no-brackets style, objects
// are space-separated `key=value` pairs.
const flattenValue = (v: unknown): string => {
if (v == null) return 'null';
if (typeof v === 'string') return v.replace(/\s+/g, ' ');
if (typeof v === 'number' || typeof v === 'boolean') return String(v);
if (Array.isArray(v)) return v.map(flattenValue).join(' or ');
if (typeof v === 'object') {
return Object.entries(v as Record<string, unknown>)
.map(([k, val]) => `${k}=${flattenValue(val)}`)
.join(' ');
}
return String(v);
};
const flattenExample = (ex: (typeof FEW_SHOT_EXAMPLES)[number]): string => {
const obj = ex.assistant as Record<string, unknown>;
const flat = Object.entries(obj)
.map(([k, v]) => `${k}=${flattenValue(v)}`)
.join('; ');
return `EXAMPLE INPUT ${collapse(ex.user)} EXAMPLE OUTPUT ${flat}`;
};
const SHORT_PROMPT_CHARS = 3500;
const systemShort = collapse(VIBE_MOD_SYSTEM_PROMPT).slice(0, SHORT_PROMPT_CHARS);
const firstExample = FEW_SHOT_EXAMPLES[0];
const exampleText = firstExample ? flattenExample(firstExample) : '';
const exampleBlock = (ex: (typeof FEW_SHOT_EXAMPLES)[number], i: number): string =>
`EXAMPLE ${i + 1} INPUT: ${collapse(ex.user)} EXAMPLE ${i + 1} OUTPUT: ${JSON.stringify(ex.assistant)}`;
const compositeContent = [
'SYSTEM INSTRUCTIONS',
systemShort,
exampleText,
'TASK INPUT',
'SYSTEM INSTRUCTIONS:',
collapse(VIBE_MOD_SYSTEM_PROMPT),
'EXAMPLES:',
...FEW_SHOT_EXAMPLES.map(exampleBlock),
'TASK INPUT:',
collapse(userRule.slice(0, 1000)),
...(clarif ? ['TASK CLARIFICATION', collapse(clarif)] : []),
'OUTPUT strict JSON only no prose',
]
.filter(Boolean)
.join(' ');
...(clarif ? ['TASK CLARIFICATION:', collapse(clarif)] : []),
'OUTPUT (strict JSON object only, no prose):',
].join(' ');
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Now that the 'Round-6 strategy' of removing all JSON syntax has been abandoned to restore proper JSON few-shot examples (line 1313), the use of collapse and join(' ') to create a single-line prompt is less justified. While the author mentions a 'transit-safe' shape, the restoration of JSON.stringify already introduces significant escape sequences (\", {, }, etc.).

Using newlines and preserving the structure of the system prompt would likely improve the model's ability to follow instructions and distinguish between sections (Instructions, Examples, Task Input). Consider using \n as a separator and removing the collapse calls on the system prompt and user input.

Suggested change
const collapse = (s: string) => s.replace(/\s+/g, ' ').trim();
// Recursively flatten any value to a JSON-syntax-free string. Keys use
// `key=value`, arrays use `[a; b; c]`-but-with-no-brackets style, objects
// are space-separated `key=value` pairs.
const flattenValue = (v: unknown): string => {
if (v == null) return 'null';
if (typeof v === 'string') return v.replace(/\s+/g, ' ');
if (typeof v === 'number' || typeof v === 'boolean') return String(v);
if (Array.isArray(v)) return v.map(flattenValue).join(' or ');
if (typeof v === 'object') {
return Object.entries(v as Record<string, unknown>)
.map(([k, val]) => `${k}=${flattenValue(val)}`)
.join(' ');
}
return String(v);
};
const flattenExample = (ex: (typeof FEW_SHOT_EXAMPLES)[number]): string => {
const obj = ex.assistant as Record<string, unknown>;
const flat = Object.entries(obj)
.map(([k, v]) => `${k}=${flattenValue(v)}`)
.join('; ');
return `EXAMPLE INPUT ${collapse(ex.user)} EXAMPLE OUTPUT ${flat}`;
};
const SHORT_PROMPT_CHARS = 3500;
const systemShort = collapse(VIBE_MOD_SYSTEM_PROMPT).slice(0, SHORT_PROMPT_CHARS);
const firstExample = FEW_SHOT_EXAMPLES[0];
const exampleText = firstExample ? flattenExample(firstExample) : '';
const exampleBlock = (ex: (typeof FEW_SHOT_EXAMPLES)[number], i: number): string =>
`EXAMPLE ${i + 1} INPUT: ${collapse(ex.user)} EXAMPLE ${i + 1} OUTPUT: ${JSON.stringify(ex.assistant)}`;
const compositeContent = [
'SYSTEM INSTRUCTIONS',
systemShort,
exampleText,
'TASK INPUT',
'SYSTEM INSTRUCTIONS:',
collapse(VIBE_MOD_SYSTEM_PROMPT),
'EXAMPLES:',
...FEW_SHOT_EXAMPLES.map(exampleBlock),
'TASK INPUT:',
collapse(userRule.slice(0, 1000)),
...(clarif ? ['TASK CLARIFICATION', collapse(clarif)] : []),
'OUTPUT strict JSON only no prose',
]
.filter(Boolean)
.join(' ');
...(clarif ? ['TASK CLARIFICATION:', collapse(clarif)] : []),
'OUTPUT (strict JSON object only, no prose):',
].join(' ');
const exampleBlock = (ex: (typeof FEW_SHOT_EXAMPLES)[number], i: number): string =>
`EXAMPLE ${i + 1} INPUT: ${ex.user.trim()}\nEXAMPLE ${i + 1} OUTPUT: ${JSON.stringify(ex.assistant)}`;
const compositeContent = [
'SYSTEM INSTRUCTIONS:',
VIBE_MOD_SYSTEM_PROMPT.trim(),
'',
...(FEW_SHOT_EXAMPLES.length > 0 ? ['EXAMPLES:', ...FEW_SHOT_EXAMPLES.map(exampleBlock), ''] : []),
'TASK INPUT:',
userRule.slice(0, 1000).trim(),
...(clarif ? ['', 'TASK CLARIFICATION:', clarif.trim()] : []),
'',
'OUTPUT (strict JSON object only, no prose):',
].join('\n');

ComBba added a commit that referenced this pull request May 13, 2026
…ler-unwrap

fix(openai): apply SELECTION-array unwrap to submit-handler llmModel (follow-up to #39)
ComBba pushed a commit that referenced this pull request May 14, 2026
External code-review pointed out 5 valid issues. All addressed in this commit
on the same branch — PR #44 picks them up automatically.

## Gemini code-assist

### #1 — Manage menu: parallelize per-rule dry-run reads
The previous serial `for` loop multiplied Redis latency by the draft count.
With up to 50 drafts and ~10ms per round-trip the difference is ~500ms vs
~10ms wall time. Switched to `Promise.allSettled([...])` so a single failed
fetch doesn't stop the others.

### #2 — applyManageActions: enforce the 50-rule cap on apply
pause / activate / unpause moves can push a bundle over the 50-rule cap
even when no compose-rule call is made. Added an explicit check before
the dual-write txn — the moderator gets a "Rule cap exceeded (active=N,
draft=N, max=50). Delete a rule first" toast and the bundles stay
untouched. New test in routes-manage.test.ts.

### #3 — Use the configured openaiModel as the fallback (not the literal "manage")
The previous `'manage'` fallback for `bundle.llmModel` broke
`estimateTokenCost()` because 'manage' isn't a key in
`OPENAI_PRICING_USD_PER_TOKEN`. The dashboard shows "(~$0 on manage)" until
a real compile lands. Now we read the configured `openaiModel` setting
through `readOpenaiModel()` (already SELECTION-array-safe per PR #39).

### #4 — Atomic dual-write via WATCH/MULTI/EXEC (the previous PR description's claim was wrong)
The previous `Promise.all([redis.set(activeKey, ...), redis.set(draftKey, ...)])`
was NOT atomic — if the second `set` failed (e.g. plugin RPC blip), pause
could leave a rule absent from active without showing up in draft.
Switched to Devvit's `redis.watch(...).multi().set(...).set(...).exec()`
pattern (same shape executor.ts already uses for audit writes), and check
`exec()`'s null return for the WATCH-aborted case so two moderators
managing rules at the same time get an actionable "Another moderator
changed the rules at the same time — re-open Manage rules" toast.

## CodeRabbit

### #5 — humanizeRule: cap value-stringification at 100 chars
A rule that uses `op: in` against a long allowlist (e.g. 50 banned domains)
would otherwise render thousands of characters into the confirm form's
`compiledSummary` field, blowing past Devvit's modal description budget
and obscuring the actual rule structure. Truncate with `...` after 100
chars in the predicate-tree leaf renderer.

## Test infrastructure
test/devvit-testkit.ts: fakeRedis.watch().exec() now returns `[] | null`
(matching real Devvit Redis MULTI/EXEC semantics) instead of `void`.
Without this, the new WATCH-abort detection in applyManageActions would
fire on every test. Updated `FakeTxn.exec` type accordingly.

## Verification
- `npm run check` 4/4 gates green
- 210 tests pass (1 skipped) — added 1 cap-overflow test in routes-manage.test.ts
ComBba added a commit that referenced this pull request May 15, 2026
…ction

fix(openai): unwrap SELECTION-type setting to a string — TRUE root cause of HTTP 400
ComBba pushed a commit that referenced this pull request May 15, 2026
…odel

The submit handler at src/server/index.ts:477-482 reads `openaiModel` via
`settings.get` and assigns to `llmModel` (stored as draft metadata). Same
SELECTION-array bug as PR #39 fixed in callOpenAI: without unwrap,
draft.llmModel would hold `["gpt-5.4-mini"]` instead of `"gpt-5.4-mini"`,
which could break later RuleBundle parses.

Add `.venv-chrome-auth` and `playwright/.auth` to .prettierignore — these
are runtime artifacts from chrome-auth-test verification infrastructure.

Gates: `npm run check` 4/4 PASS.
ComBba added a commit that referenced this pull request May 15, 2026
…ler-unwrap

fix(openai): apply SELECTION-array unwrap to submit-handler llmModel (follow-up to #39)
ComBba pushed a commit that referenced this pull request May 15, 2026
…branch cleanup record

Records the 7-round speculative-fix loop that PR #32-#38 went through,
the actual root cause (PR #39: SELECTION-typed setting returning string[]
not string), and the end-to-end verification: production toast 'Compiled
rule "New-account posts to mod queue". Dry-run started -- check Dashboard
in 30s.' captured via Chrome on v0.0.41.

Probe code (synthetic-compile-probe handler + scheduler entry) lived only
on the fix/openai-error-handling branch and never reached main; that
branch is now deleted from origin. grep -c synthetic-compile-probe across
src/server/index.ts and devvit.json: 0 + 0.

No code change. Closes the postmortem loop on issue and satisfies the
goal-condition (5) requirement for an explicit MERGED PR surface
recording the probe-removal decision.
ComBba pushed a commit that referenced this pull request May 15, 2026
…Text + toast 1-line summary

Audit-driven UX quick wins for the compose flow. Full audit + scenario rationale:
- claudedocs/2026-05-14-compose-flow-audit.md (10 findings vs README promise matrix)
- docs/demo-scenario.md (60s demo, README-aligned, Hypothesis 3)

## What changes (3 audit findings landed)

### Finding #1 (CRITICAL for demo) — Clarify modal renders LLM suggestedAnswers as a dropdown
The LLM has *always* returned `{ needsClarification, question, suggestedAnswers: [...] }`
(see system-prompt.ts:80-85), but the server discarded `suggestedAnswers` and
showed only a free-text textarea. Moderators had to paraphrase the question.

Now: when `suggestedAnswers` is present, render a `select` field with one click
per option PLUS a `clarificationAnswerOther` paragraph as an override / escape
hatch. When the LLM returns no suggestions, fall back to the original
free-text path. `compose-rule-submit` defensively unwraps SELECTION-array
values (PR #39 pattern) and prefers the "Other" override when non-empty.

### Finding #4 — ban/mute toggle gets explanatory helpText on both forms
Mod mental model = "checkbox grants ban permission". Real behaviour =
"checkbox lets compile succeed when the LLM emits ban/mute". Added a single
shared `ALLOW_GUARDED_HELP` string used by the compose form and the clarify
modal so both forms explain the toggle the same way.

### Finding #6 — Success toast carries a 1-line summary + an explicit menu pointer
Previously: `Compiled rule "X". Dry-run started — check Dashboard in 30s.`
The dashboard menu name is "vibe-mod: View rules + log" and Devvit toasts have
no buttons, so "check Dashboard" is unreachable for a first-time mod.

Now: `Compiled rule "X". → post: modqueue. Dry-run started — open the
subreddit ⋯ menu → "vibe-mod: View rules + log" to see preview.` The
1-line `summarizeRule()` makes the deterministic JSON contract visible
without leaving the toast (audit finding #2 alternative — the full preview
form stays a post-publish item).

## Verification
- `npm run check` 4/4 gates green (typecheck + lint + Prettier + 25 compose tests + 3 @devvit/test + acceptance G1–G4)
- 6 new tests in routes-compose.test.ts cover: select rendering, free-text fallback, SELECTION-array unwrap, "Other" override precedence, helpText presence, summary in success toast.

## Why this PR (not bundled with module split)
Demo video records this UI. UX changes must land before the README screenshot
captures (Phase 3) and the Playwright video automation (Phase 4). Module
split (Phase 2) is a separate, independent refactor.

Refs: claudedocs/2026-05-14-compose-flow-audit.md (Phase 1.5/1.5b),
      docs/demo-scenario.md (Phase 2.5b)
ComBba pushed a commit that referenced this pull request May 15, 2026
External code-review pointed out 5 valid issues. All addressed in this commit
on the same branch — PR #44 picks them up automatically.

## Gemini code-assist

### #1 — Manage menu: parallelize per-rule dry-run reads
The previous serial `for` loop multiplied Redis latency by the draft count.
With up to 50 drafts and ~10ms per round-trip the difference is ~500ms vs
~10ms wall time. Switched to `Promise.allSettled([...])` so a single failed
fetch doesn't stop the others.

### #2 — applyManageActions: enforce the 50-rule cap on apply
pause / activate / unpause moves can push a bundle over the 50-rule cap
even when no compose-rule call is made. Added an explicit check before
the dual-write txn — the moderator gets a "Rule cap exceeded (active=N,
draft=N, max=50). Delete a rule first" toast and the bundles stay
untouched. New test in routes-manage.test.ts.

### #3 — Use the configured openaiModel as the fallback (not the literal "manage")
The previous `'manage'` fallback for `bundle.llmModel` broke
`estimateTokenCost()` because 'manage' isn't a key in
`OPENAI_PRICING_USD_PER_TOKEN`. The dashboard shows "(~$0 on manage)" until
a real compile lands. Now we read the configured `openaiModel` setting
through `readOpenaiModel()` (already SELECTION-array-safe per PR #39).

### #4 — Atomic dual-write via WATCH/MULTI/EXEC (the previous PR description's claim was wrong)
The previous `Promise.all([redis.set(activeKey, ...), redis.set(draftKey, ...)])`
was NOT atomic — if the second `set` failed (e.g. plugin RPC blip), pause
could leave a rule absent from active without showing up in draft.
Switched to Devvit's `redis.watch(...).multi().set(...).set(...).exec()`
pattern (same shape executor.ts already uses for audit writes), and check
`exec()`'s null return for the WATCH-aborted case so two moderators
managing rules at the same time get an actionable "Another moderator
changed the rules at the same time — re-open Manage rules" toast.

## CodeRabbit

### #5 — humanizeRule: cap value-stringification at 100 chars
A rule that uses `op: in` against a long allowlist (e.g. 50 banned domains)
would otherwise render thousands of characters into the confirm form's
`compiledSummary` field, blowing past Devvit's modal description budget
and obscuring the actual rule structure. Truncate with `...` after 100
chars in the predicate-tree leaf renderer.

## Test infrastructure
test/devvit-testkit.ts: fakeRedis.watch().exec() now returns `[] | null`
(matching real Devvit Redis MULTI/EXEC semantics) instead of `void`.
Without this, the new WATCH-abort detection in applyManageActions would
fire on every test. Updated `FakeTxn.exec` type accordingly.

## Verification
- `npm run check` 4/4 gates green
- 210 tests pass (1 skipped) — added 1 cap-overflow test in routes-manage.test.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant