feat(openai): tune compile call (reasoning_effort: none, verbosity: low), default gpt-5.4-mini #11
The single LLM call vibe-mod makes (callOpenAI: NL rule → strict JSON, or a
clarification) is mechanical translation, not reasoning. Configure it as such:
- reasoning_effort: 'none' — no hidden reasoning; fast (~1.2–1.4s) and keeps
the token budget from being eaten by reasoning. NB: 'none' is the gpt-5.4
family's value; the gpt-5.0/5.1-era 'minimal' is rejected by 5.4 ("Supported
values are: 'none','low','medium','high','xhigh'"), and gpt-5-mini wants
'minimal' — so this is 5.4-family-specific, which is fine since the model
options are restricted to that family.
- verbosity: 'low' — terse JSON, no commentary.
- max_completion_tokens: 600 (down from 700) — a compiled rule + a clarification
fit comfortably; observed worst case ~150 out tokens.
- still no `temperature` (gpt-5.x only accepts the default).
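A minimal sketch of the request body these settings produce, assuming a Chat Completions-style JSON payload. `buildCompileRequestBody` and `ChatMessage` are hypothetical names, not vibe-mod's actual `callOpenAI` code. The point is which fields are sent and which (`temperature`, `max_tokens`) are deliberately left out.

```typescript
// Hypothetical message shape; the real callOpenAI types are not shown in this PR.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface CompileRequestBody {
  model: string;
  response_format: { type: 'json_object' };
  messages: ChatMessage[];
  max_completion_tokens: number;
  reasoning_effort: 'none';
  verbosity: 'low';
}

// Mirrors the tuning listed above. There is intentionally no `temperature`
// (gpt-5.x accepts only the default) and no `max_tokens` field.
function buildCompileRequestBody(model: string, messages: ChatMessage[]): CompileRequestBody {
  return {
    model,
    response_format: { type: 'json_object' },
    messages,
    max_completion_tokens: 600, // compiled rule + clarification fit comfortably
    reasoning_effort: 'none', // gpt-5.4-family value; older gpt-5 models used 'minimal'
    verbosity: 'low', // terse JSON, no commentary
  };
}

const body = buildCompileRequestBody('gpt-5.4-mini', [
  { role: 'user', content: 'remove posts that are mostly uppercase' },
]);
console.log(JSON.stringify(body));
```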
devvit.json openaiModel options trimmed/relabelled to the three viable picks
with measured numbers, default switched gpt-5.4-nano → gpt-5.4-mini:
gpt-5.4-mini 7/7 median ~1.2s max ~1.8s ← recommended, fastest
gpt-5.4-nano 7/7 median ~1.5s max ~1.7s ← cheapest
gpt-5.4 7/7 median ~2.1s max ~4.2s ← full; slower, more cautious on
ambiguous rules, no quality gain
(index.ts fallback model also updated nano → mini.)
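The fallback change can be pictured as a small resolver. This sketch assumes the setting arrives as an optional string and that anything outside the three `devvit.json` options should fall back to the default; `MODEL_OPTIONS` and `resolveModel` are hypothetical names, and the source only confirms that the fallback moved from nano to mini.

```typescript
// Hypothetical list of the three openaiModel options kept in devvit.json.
const MODEL_OPTIONS = ['gpt-5.4-mini', 'gpt-5.4-nano', 'gpt-5.4'] as const;
type OpenAIModel = (typeof MODEL_OPTIONS)[number];

const DEFAULT_MODEL: OpenAIModel = 'gpt-5.4-mini';

function isSupportedModel(value: string): value is OpenAIModel {
  return (MODEL_OPTIONS as readonly string[]).includes(value);
}

// Returns the configured model if it is one of the options, otherwise the
// default (covers a missing setting and, by assumption, stale values).
function resolveModel(setting: string | undefined): OpenAIModel {
  if (setting !== undefined && isSupportedModel(setting)) return setting;
  return DEFAULT_MODEL;
}

console.log(resolveModel('gpt-4o-mini')); // gpt-5.4-mini
```

Rejecting unknown values instead of passing them through is one way to address the reviewer's concern about stale model settings; whether `index.ts` does this is not stated.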
smoketest: default request config now mirrors callOpenAI (reasoning_effort/
verbosity/max_completion_tokens), with REASONING_EFFORT/VERBOSITY/
MAX_COMPLETION_TOKENS env overrides for experiments (per-call latency and the
OPENAI_MODELS=a,b,c comparison table were added in earlier commits). Tightened one test
case to an explicit threshold ("more than 90% of the letters are uppercase") so
it doesn't penalise the more-cautious model for asking — now 7/7 on all three.
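A sketch of how the smoketest's config and comparison rows could fit together, assuming env vars arrive as a plain string map (for example `process.env` in Node). `smokeConfigFromEnv`, `modelsFromEnv`, and `summarize` are hypothetical helpers; the defaults (`'none'`, `'low'`, `600`) are the `callOpenAI` values above, and the row format imitates the model list.

```typescript
type Env = Record<string, string | undefined>;

interface SmokeConfig {
  reasoning_effort: string;
  verbosity: string;
  max_completion_tokens: number;
}

// Defaults mirror callOpenAI; each env var overrides one field.
function smokeConfigFromEnv(env: Env): SmokeConfig {
  const tokens = Number.parseInt(env['MAX_COMPLETION_TOKENS'] ?? '', 10);
  return {
    reasoning_effort: env['REASONING_EFFORT'] || 'none',
    verbosity: env['VERBOSITY'] || 'low',
    max_completion_tokens: Number.isFinite(tokens) && tokens > 0 ? tokens : 600,
  };
}

// OPENAI_MODELS=a,b,c -> ['a', 'b', 'c']; blank entries are dropped.
function modelsFromEnv(env: Env, fallback: string): string[] {
  const list = (env['OPENAI_MODELS'] ?? '')
    .split(',')
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
  return list.length > 0 ? list : [fallback];
}

// One comparison row: pass count plus median and max per-call latency.
function summarize(model: string, passed: number, total: number, latenciesMs: number[]): string {
  if (latenciesMs.length === 0) return `${model} ${passed}/${total} (no timings)`;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 0 ? (sorted[mid - 1]! + sorted[mid]!) / 2 : sorted[mid]!;
  const max = sorted[sorted.length - 1]!;
  const secs = (ms: number) => (ms / 1000).toFixed(1);
  return `${model} ${passed}/${total} median ~${secs(median)}s max ~${secs(max)}s`;
}
```

In Node this would be called as `smokeConfigFromEnv(process.env)`. An unparseable or non-positive `MAX_COMPLETION_TOKENS` falls back to 600 instead of failing the run, which is a choice of this sketch.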
tsc/lint/format/tests(152)/acceptance(4/4) all green; smoke test 7/7 × 3 models.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request updates the default OpenAI model to gpt-5.4-mini and introduces specific tuning parameters—reasoning_effort, verbosity, and a reduced max_completion_tokens—to optimize for speed and cost. The smoketest script was also updated to allow experimentation via environment variables. Feedback was provided regarding the hardcoding of these new parameters in the production API call, as it may cause errors for older models or non-standard API implementations; a conditional approach is recommended to ensure backward compatibility and API stability.
```diff
   body: JSON.stringify({
-    model,
+    model, // gpt-5.4-mini (default) / gpt-5.4-nano / gpt-5.4 — see devvit.json openaiModel
     response_format: { type: 'json_object' },
     messages,
-    // Newer OpenAI models (gpt-5.x family) require max_completion_tokens (not max_tokens)
-    // and only accept the default temperature, so we don't send `temperature`. Determinism
-    // is carried by response_format: json_object + the strict prompt + few-shot examples.
-    max_completion_tokens: 700,
+    // Tuned for what this call is: a mechanical NL → strict-JSON translation.
+    // reasoning_effort: 'none' — no hidden reasoning needed; keeps it fast and stops the
+    //   token budget being eaten by reasoning (gpt-5.4 family value;
+    //   older models call this 'minimal'). Measured ~1.1–1.4s.
+    // verbosity: 'low' — terse JSON, no commentary.
+    // max_completion_tokens — a compiled rule + a clarification fit well under 600.
+    // (no `temperature` — the gpt-5.x family only accepts the default; max_tokens isn't
+    //  supported on these models, use max_completion_tokens.)
+    reasoning_effort: 'none',
+    verbosity: 'low',
+    max_completion_tokens: 600,
   }),
```
Hardcoding reasoning_effort and verbosity in the request body can lead to API errors (HTTP 400) if the model stored in the user's settings does not support these parameters. This is a significant risk for existing installations where an older model (like gpt-4o-mini or gpt-5-mini) might still be configured in the subreddit settings.
Additionally, verbosity is not a standard parameter in the public OpenAI Chat Completions API. While reasoning_effort is a valid parameter for reasoning models (like the o1/o3 family), verbosity appears to be non-standard and may cause errors if the API does not recognize it.
It is safer to construct the request body conditionally, ensuring these parameters are only sent to models known to support them (the gpt-5.4 family in this context), similar to the defensive implementation in the smoketest script.
```ts
body: JSON.stringify({
  model, // gpt-5.4-mini (default) / gpt-5.4-nano / gpt-5.4 — see devvit.json openaiModel
  response_format: { type: 'json_object' },
  messages,
  // Tuned for what this call is: a mechanical NL → strict-JSON translation.
  // reasoning_effort: 'none' — no hidden reasoning needed; keeps it fast and stops the
  //   token budget being eaten by reasoning (gpt-5.4 family value;
  //   older models call this 'minimal'). Measured ~1.1–1.4s.
  // verbosity: 'low' — terse JSON, no commentary.
  // max_completion_tokens — a compiled rule + a clarification fit well under 600.
  // (no `temperature` — the gpt-5.x family only accepts the default; max_tokens isn't
  //  supported on these models, use max_completion_tokens.)
  max_completion_tokens: 600,
  ...(model.startsWith('gpt-5.4') ? { reasoning_effort: 'none', verbosity: 'low' } : {}),
}),
```
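The suggestion above gates the tuned fields on `model.startsWith('gpt-5.4')`. A minimal alternative, assuming the same three models from `devvit.json`, gates them on an explicit allow-list instead, so a future model that merely shares the prefix is not opted in by accident. `TUNED_MODELS`, `tuningFor`, and `buildBody` are hypothetical names.

```typescript
// Hypothetical allow-list: only these models receive the tuned fields.
const TUNED_MODELS: ReadonlySet<string> = new Set(['gpt-5.4-mini', 'gpt-5.4-nano', 'gpt-5.4']);

type Tuning = { reasoning_effort?: 'none'; verbosity?: 'low' };

// Extra request fields for `model`; empty for anything not on the list
// (for example a stale gpt-4o-mini setting).
function tuningFor(model: string): Tuning {
  return TUNED_MODELS.has(model) ? { reasoning_effort: 'none', verbosity: 'low' } : {};
}

function buildBody(model: string, messages: unknown[]): string {
  return JSON.stringify({
    model,
    response_format: { type: 'json_object' },
    messages,
    max_completion_tokens: 600,
    ...tuningFor(model),
  });
}
```

The trade-off: the allow-list has to be edited whenever the `openaiModel` options change, while the prefix check picks up new family members automatically.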
What
vibe-mod's one LLM call (`callOpenAI`: NL rule → strict JSON, or a clarification) is mechanical translation, not reasoning, so configure it that way:

- `reasoning_effort`: `'none'`. `'none'` is the gpt-5.4-family value; gpt-5.0/5.1 used `'minimal'` (5.4 rejects it: "Supported values are: 'none','low','medium','high','xhigh'") and gpt-5-mini wants `'minimal'`. Fine because the model options are restricted to the 5.4 family.
- `verbosity`: `'low'`
- `max_completion_tokens`: `600` (was 700)
- `max_tokens` / `temperature`: not sent; gpt-5.x takes `max_completion_tokens` and the default temperature only

Model choice: measured (real key, production config)

… `reasoning_effort: minimal`)

`devvit.json` `openaiModel` trimmed to the three gpt-5.4-family models (gpt-5.4-mini, gpt-5.4-nano, gpt-5.4) with the numbers in the labels; default switched gpt-5.4-nano → gpt-5.4-mini; `index.ts` fallback updated too.

smoketest
Default request config now mirrors `callOpenAI` (`reasoning_effort`/`verbosity`/`max_completion_tokens`), with `REASONING_EFFORT`/`VERBOSITY`/`MAX_COMPLETION_TOKENS` env overrides for experiments (earlier commits added per-call latency + the `OPENAI_MODELS=a,b,c` comparison table). Tightened one test case to an explicit threshold ("more than 90% of the letters are uppercase") so it doesn't penalise the cautious model for asking; now 7/7 on all three.

Verify

- `tsc --noEmit` / `eslint --max-warnings 0` / `prettier --check` clean · `vitest` 151 passed (1 skipped) · `acceptance` 4/4 · `doctor` 0 hard · CI green
- `OPENAI_MODELS=gpt-5.4-mini,gpt-5.4-nano,gpt-5.4 npm run openai:smoketest` → 7/7 × 3

🤖 Generated with Claude Code