feat(openai): tune compile call (reasoning_effort: none, verbosity: low), default gpt-5.4-mini #11

Merged: ComBba merged 1 commit into main from feat/openai-reasoning-config on May 12, 2026

@ComBba (Contributor) commented May 12, 2026

What

vibe-mod's one LLM call (callOpenAI: NL rule → strict JSON, or a clarification) is mechanical translation, not reasoning — so configure it that way:

| param | value | why |
| --- | --- | --- |
| reasoning_effort | 'none' | no hidden reasoning → fast (~1.2–1.4s) and the token budget isn't eaten by reasoning. ⚠️ 'none' is the gpt-5.4-family value; gpt-5.0/5.1 used 'minimal' (5.4 rejects it: "Supported values are: 'none','low','medium','high','xhigh'") and gpt-5-mini wants 'minimal'. Fine because the model options are restricted to the 5.4 family. |
| verbosity | 'low' | terse JSON, no commentary |
| max_completion_tokens | 600 (was 700) | a compiled rule + a clarification fit comfortably; observed worst case ~150 out tokens |
| max_tokens / temperature | | not supported on gpt-5.x; use max_completion_tokens, default temperature only |
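
As a rough illustration of the parameters listed above, the request body could be assembled as follows. This is only a sketch: `ChatMessage`, `CompileRequestBody`, and `buildCompileBody` are hypothetical names and the example messages are made up; the real `callOpenAI` in src/server/index.ts may be structured differently.

```typescript
// Hypothetical types and helper for illustration only; not taken from src/server/index.ts.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

interface CompileRequestBody {
  model: string;
  response_format: { type: 'json_object' };
  messages: ChatMessage[];
  reasoning_effort: 'none';
  verbosity: 'low';
  max_completion_tokens: number;
}

// Builds the body with the values from the PR description.
// Note what is deliberately absent: no `temperature` and no `max_tokens`.
function buildCompileBody(model: string, messages: ChatMessage[]): CompileRequestBody {
  return {
    model,
    response_format: { type: 'json_object' },
    messages,
    reasoning_effort: 'none', // gpt-5.4-family value per the PR description
    verbosity: 'low',
    max_completion_tokens: 600, // was 700
  };
}

// Made-up example messages, only to show the shape of the call.
const exampleBody = buildCompileBody('gpt-5.4-mini', [
  { role: 'system', content: 'Translate the moderator rule into strict JSON.' },
  { role: 'user', content: 'Remove posts where more than 90% of the letters are uppercase.' },
]);
console.log(JSON.stringify(exampleBody, null, 2));
```

Typing `reasoning_effort` and `verbosity` as literal types means an accidental change to another value would fail at compile time rather than at request time.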

Model choice — measured (real key, production config)

| model | pass | median | max | avg out tok | notes |
| --- | --- | --- | --- | --- | --- |
| gpt-5.4-mini | 7/7 | ~1.2s | ~1.8s | ~98 | recommended / new default, fastest |
| gpt-5.4-nano | 7/7 | ~1.5s | ~1.7s | ~113 | cheapest |
| gpt-5.4 (full) | 7/7 | ~2.1s | ~4.2s | ~112 | slower, more cautious on ambiguous rules, no quality gain for this task |
| gpt-5-mini | 7/7 (only with reasoning_effort: minimal) | ~1.9s | ~4.2s | | slower + bumpy; not in the options |
| gpt-5-nano / gpt-4.1-mini / gpt-4.1-nano | n/a | | | | 403 (not available to this project) |
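
For readers who want to reproduce the median and max columns, a summary like the following could be computed from per-call latencies. This is a sketch under assumptions: `summarizeLatencies` is a hypothetical helper, the sample values are invented, and the actual smoketest script may aggregate differently.

```typescript
// Hypothetical helper; the real smoketest's aggregation is not shown in this PR.
interface LatencySummary {
  medianMs: number;
  maxMs: number;
}

function summarizeLatencies(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) {
    throw new Error('need at least one latency sample');
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd count: middle element. Even count: mean of the two middle elements.
  const medianMs =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { medianMs, maxMs: sorted[sorted.length - 1] };
}

// Invented latencies for seven test cases (not measurements from this PR).
const madeUpSamplesMs = [1100, 1200, 1150, 1800, 1250, 1200, 1300];
console.log(summarizeLatencies(madeUpSamplesMs));
```

With only seven cases per model, the max is a single outlier call, so the median is the more stable number to compare across models.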

devvit.json: the openaiModel options are trimmed to the three gpt-5.4 models (mini, nano, full), with the measured numbers in the labels; the default is switched gpt-5.4-nano → gpt-5.4-mini, and the index.ts fallback is updated to match.

smoketest

Default request config now mirrors callOpenAI (reasoning_effort/verbosity/max_completion_tokens), with REASONING_EFFORT/VERBOSITY/MAX_COMPLETION_TOKENS env overrides for experiments (earlier commits added per-call latency + the OPENAI_MODELS=a,b,c comparison table). Tightened one test case to an explicit threshold ("more than 90% of the letters are uppercase") so it doesn't penalise the cautious model for asking — now 7/7 on all three.
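
The env overrides described above might be read along these lines. This is a sketch, not the contents of scripts/openai-smoketest.ts: `readSmoketestConfig` and `SmoketestConfig` are hypothetical names, and the fallback behaviour for empty or invalid values is an assumption.

```typescript
// Env is passed in explicitly so the sketch does not depend on Node's process global.
type Env = Record<string, string | undefined>;

interface SmoketestConfig {
  models: string[];
  reasoningEffort: string;
  verbosity: string;
  maxCompletionTokens: number;
}

// Defaults mirror the production values stated in the PR description.
function readSmoketestConfig(env: Env): SmoketestConfig {
  // OPENAI_MODELS=a,b,c -> ['a', 'b', 'c']; blanks and stray spaces are dropped.
  const listed = (env.OPENAI_MODELS ?? '')
    .split(',')
    .map((m) => m.trim())
    .filter((m) => m.length > 0);
  const parsedMax = Number.parseInt(env.MAX_COMPLETION_TOKENS ?? '', 10);
  return {
    models: listed.length > 0 ? listed : ['gpt-5.4-mini'],
    reasoningEffort: env.REASONING_EFFORT ?? 'none',
    verbosity: env.VERBOSITY ?? 'low',
    // Assumption: fall back to 600 when the override is missing or not a positive integer.
    maxCompletionTokens: Number.isFinite(parsedMax) && parsedMax > 0 ? parsedMax : 600,
  };
}

console.log(readSmoketestConfig({ OPENAI_MODELS: 'gpt-5.4-mini,gpt-5.4-nano,gpt-5.4' }));
```

In a Node script this would typically be called as `readSmoketestConfig(process.env)`.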

On data sharing: the free-daily-usage tier is the OpenAI "share API inputs/outputs" program (the help article you linked). For vibe-mod that's benign — it sends only the moderator's natural-language rule + the system prompt, never Reddit content (hard-lock #6). The shared data is the rule text + the compiled JSON.

Verify

  • tsc --noEmit / eslint --max-warnings 0 / prettier --check clean · vitest 151 passed (1 skipped) · acceptance 4/4 · doctor 0 hard · CI green
  • OPENAI_MODELS=gpt-5.4-mini,gpt-5.4-nano,gpt-5.4 npm run openai:smoketest → 7/7 × 3

🤖 Generated with Claude Code

…none, verbosity: low; default gpt-5.4-mini

The single LLM call vibe-mod makes (callOpenAI: NL rule → strict JSON, or a
clarification) is mechanical translation, not reasoning. Configure it as such:
  - reasoning_effort: 'none'  — no hidden reasoning; fast (~1.2–1.4s) and keeps
    the token budget from being eaten by reasoning. NB: 'none' is the gpt-5.4
    family's value; the gpt-5.0/5.1-era 'minimal' is rejected by 5.4 ("Supported
    values are: 'none','low','medium','high','xhigh'"), and gpt-5-mini wants
    'minimal' — so this is 5.4-family-specific, which is fine since the model
    options are restricted to that family.
  - verbosity: 'low'          — terse JSON, no commentary.
  - max_completion_tokens: 600 (down from 700) — a compiled rule + a clarification
    fit comfortably; observed worst case ~150 out tokens.
  - still no `temperature` (gpt-5.x only accepts the default).

devvit.json openaiModel options trimmed/relabelled to the three viable picks
with measured numbers, default switched gpt-5.4-nano → gpt-5.4-mini:
  gpt-5.4-mini  7/7  median ~1.2s  max ~1.8s   ← recommended, fastest
  gpt-5.4-nano  7/7  median ~1.5s  max ~1.7s   ← cheapest
  gpt-5.4       7/7  median ~2.1s  max ~4.2s   ← full; slower, more cautious on
                                                ambiguous rules, no quality gain
(index.ts fallback model also updated nano → mini.)

smoketest: default request config now mirrors callOpenAI (reasoning_effort/
verbosity/max_completion_tokens), with REASONING_EFFORT/VERBOSITY/
MAX_COMPLETION_TOKENS env overrides for experiments; added per-call latency and
the OPENAI_MODELS=a,b,c comparison table in earlier commits. Tightened one test
case to an explicit threshold ("more than 90% of the letters are uppercase") so
it doesn't penalise the more-cautious model for asking — now 7/7 on all three.

tsc/lint/format/tests(152)/acceptance(4/4) all green; smoke test 7/7 × 3 models.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 12, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 3f5fa4d and 301bae3.

📒 Files selected for processing (3)
  • devvit.json
  • scripts/openai-smoketest.ts
  • src/server/index.ts

@ComBba ComBba merged commit 23c8b1b into main May 12, 2026
2 checks passed
@ComBba ComBba deleted the feat/openai-reasoning-config branch May 12, 2026 10:56

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the default OpenAI model to gpt-5.4-mini and introduces specific tuning parameters—reasoning_effort, verbosity, and a reduced max_completion_tokens—to optimize for speed and cost. The smoketest script was also updated to allow experimentation via environment variables. Feedback was provided regarding the hardcoding of these new parameters in the production API call, as it may cause errors for older models or non-standard API implementations; a conditional approach is recommended to ensure backward compatibility and API stability.

Comment thread src/server/index.ts
Comment on lines 722 to 737
      body: JSON.stringify({
    -   model,
    +   model, // gpt-5.4-mini (default) / gpt-5.4-nano / gpt-5.4 — see devvit.json openaiModel
        response_format: { type: 'json_object' },
        messages,
    -   // Newer OpenAI models (gpt-5.x family) require max_completion_tokens (not max_tokens)
    -   // and only accept the default temperature, so we don't send `temperature`. Determinism
    -   // is carried by response_format: json_object + the strict prompt + few-shot examples.
    -   max_completion_tokens: 700,
    +   // Tuned for what this call is: a mechanical NL → strict-JSON translation.
    +   //   reasoning_effort: 'none'  — no hidden reasoning needed; keeps it fast and stops the
    +   //                               token budget being eaten by reasoning (gpt-5.4 family value;
    +   //                               older models call this 'minimal'). Measured ~1.1–1.4s.
    +   //   verbosity: 'low'          — terse JSON, no commentary.
    +   //   max_completion_tokens     — a compiled rule + a clarification fit well under 600.
    +   //   (no `temperature` — the gpt-5.x family only accepts the default; max_tokens isn't
    +   //    supported on these models, use max_completion_tokens.)
    +   reasoning_effort: 'none',
    +   verbosity: 'low',
    +   max_completion_tokens: 600,
      }),

Priority: high

Hardcoding reasoning_effort and verbosity in the request body can lead to API errors (HTTP 400) if the model stored in the user's settings does not support these parameters. This is a significant risk for existing installations where an older model (like gpt-4o-mini or gpt-5-mini) might still be configured in the subreddit settings.

Additionally, verbosity is not a standard parameter in the public OpenAI Chat Completions API. While reasoning_effort is a valid parameter for reasoning models (like the o1/o3 family), verbosity appears to be non-standard and may cause errors if the API does not recognize it.

It is safer to construct the request body conditionally, ensuring these parameters are only sent to models known to support them (the gpt-5.4 family in this context), similar to the defensive implementation in the smoketest script.

    body: JSON.stringify({
      model, // gpt-5.4-mini (default) / gpt-5.4-nano / gpt-5.4 — see devvit.json openaiModel
      response_format: { type: 'json_object' },
      messages,
      // Tuned for what this call is: a mechanical NL → strict-JSON translation.
      //   reasoning_effort: 'none'  — no hidden reasoning needed; keeps it fast and stops the
      //                               token budget being eaten by reasoning (gpt-5.4 family value;
      //                               older models call this 'minimal'). Measured ~1.1–1.4s.
      //   verbosity: 'low'          — terse JSON, no commentary.
      //   max_completion_tokens     — a compiled rule + a clarification fit well under 600.
      //   (no `temperature` — the gpt-5.x family only accepts the default; max_tokens isn't
      //    supported on these models, use max_completion_tokens.)
      max_completion_tokens: 600,
      ...(model.startsWith('gpt-5.4') ? { reasoning_effort: 'none', verbosity: 'low' } : {}),
    }),
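
One way to take the reviewer's suggestion a step further is to isolate the per-family choice in a small function, so it can be unit-tested without making a request. This is a sketch only: `tuningFor` is a hypothetical helper, the `gpt-5-mini` branch rests solely on the PR description's note that it passed with `reasoning_effort: minimal`, and whether other models accept these parameters is not established here.

```typescript
// Hypothetical helper; not part of the merged change.
type Tuning = {
  reasoning_effort?: 'none' | 'minimal';
  verbosity?: 'low';
};

function tuningFor(model: string): Tuning {
  // Match 'gpt-5.4' exactly or with a '-suffix' (mini, nano), rather than any
  // string that merely begins with 'gpt-5.4'.
  if (model === 'gpt-5.4' || model.startsWith('gpt-5.4-')) {
    return { reasoning_effort: 'none', verbosity: 'low' };
  }
  // Assumption based on the PR description: gpt-5-mini wants 'minimal'.
  // Verbosity support on that model is not stated, so it is left out.
  if (model === 'gpt-5-mini') {
    return { reasoning_effort: 'minimal' };
  }
  // Unknown or older models: send no tuning parameters at all.
  return {};
}

// Spreading the result keeps the rest of the body unchanged.
function buildBody(model: string, messages: unknown[]): Record<string, unknown> {
  return {
    model,
    response_format: { type: 'json_object' },
    messages,
    max_completion_tokens: 600,
    ...tuningFor(model),
  };
}

console.log(buildBody('gpt-5.4-mini', []));
console.log(buildBody('gpt-4o-mini', []));
```

The trade-off is that every newly supported model family needs an explicit branch, which is arguably the safer default for installations that still have an older model saved in their settings.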

ComBba added a commit that referenced this pull request May 15, 2026
feat(openai): tune compile call (reasoning_effort: none, verbosity: low), default gpt-5.4-mini