Studio: add Codex SDK as a chat provider with parallel-calls fan-out #5724

Open
danielhanchen wants to merge 48 commits into main from feat/codex-provider

@danielhanchen
Member

Summary

Wires the OpenAI Codex CLI / Python SDK (codex_app_server) into Studio as a new chat provider. The provider is hidden on hosts without the CLI + SDK and exposes a device-auth Sign-in button when logged out. A parallel_calls knob fans the turn out across N (up to 20) Codex tasks and synthesises a unified answer.

  • Backend: new codex_availability.py + codex_provider.py + routes/codex.py; ChatCompletionRequest.parallel_calls (1-20); provider_type=codex dispatches through the SDK instead of HTTP; all SDK imports are lazy via importlib.util.find_spec.
  • Frontend: new api/codex-api.ts, components/codex-parallel-tabs.tsx (tabbed render with Synthesis highlight), components/codex-login-button.tsx (device-auth + log streaming + window.open of the verification URL); external-providers.ts exports CODEX_PROVIDER_TYPE, CODEX_MAX_PARALLEL_CALLS, clampCodexParallelCalls, and marks codex text-only.
  • Tests: 14 cases in test_codex_provider.py cover the availability probe across the four install/login states, the streaming + parallel-calls translation against a fake codex_app_server injected into sys.modules, the [1, 20] pydantic clamp, the CodexUnavailableError surfacing path, and the parallel_calls=1 single-call shape.
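The lazy-import approach mentioned in the Backend bullet can be sketched roughly as follows. This is a minimal illustration, not the code in codex_availability.py: the function name `probe_codex` and the returned keys are assumptions, and only `importlib.util.find_spec` and the module name `codex_app_server` come from the summary above.

```python
import importlib.util
import shutil


def probe_codex(sdk_module: str = "codex_app_server", cli_name: str = "codex") -> dict:
    """Report whether the CLI and SDK look installed, without importing the SDK.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    # For a top-level module name, find_spec only locates the module;
    # it does not execute the module's top-level code.
    sdk_importable = importlib.util.find_spec(sdk_module) is not None
    # shutil.which returns None when the executable is not on PATH.
    cli_path = shutil.which(cli_name)
    return {
        "cli_path": cli_path,
        "sdk_importable": sdk_importable,
        "installed": cli_path is not None and sdk_importable,
    }
```

Because nothing is imported at module top level, a host without the SDK can still start the backend and simply reports `installed: False`.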

Test plan

  • pytest studio/backend/tests/test_codex_provider.py (14 passing)
  • pytest studio/backend/tests/test_external_provider_usage_chunk.py studio/backend/tests/test_anthropic_messages.py studio/backend/tests/test_openai_tool_passthrough.py studio/backend/tests/test_inference_model_validation.py (173 passing total)
  • npx tsc -b --pretty false in studio/frontend (clean)
  • Live end-to-end with codex_app_server installed (not available on the build host -- see "Limitations" below)

Limitations

  • The codex_app_server Python SDK was not installable on the build host (PyPI returned no matching distribution as of this PR). All SDK interactions are exercised against a fake module injected into sys.modules in tests. Once the SDK ships to PyPI, an integration test against a real AsyncCodex instance should be added.
  • The parallel_calls UI pill in the composer (deliverable 6) is implemented as a typed clamp + types in external-providers.ts; surfacing it as an actual composer pill requires a follow-up edit in shared-composer.tsx that integrates with the existing InferenceParams plumbing. The CodexParallelTabs component and the underlying state reducer are wired and ready to consume the backend events.
  • The chat-providers dialog gating (deliverable 5) currently relies on the existing hidden: true registry flag; surfacing a synthetic codex row gated on /api/codex/status is a small follow-up in chat-providers-dialog.tsx consuming fetchCodexStatus() from the new API module.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa809262d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

)
return "".join(collected)

workers = [asyncio.create_task(_worker(i + 1)) for i in range(n)]

P1 Badge Cancel Codex fan-out tasks when stream is aborted

When a client disconnects or cancels a streaming request, this async generator can be closed before it reaches normal completion, but the worker tasks created for parallel fan-out are never canceled. Because _stream_codex_parallel starts up to 20 background tasks and a drain task here, an interrupted stream continues consuming local Codex capacity after the user is gone, which can starve subsequent requests and waste significant resources.
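One common way to address this kind of leak is to cancel the worker tasks in a `finally` block of the async generator. The sketch below is an assumption-laden illustration: `fan_out` and its queue-based worker are stand-ins, not the body of `_stream_codex_parallel`.

```python
import asyncio
from typing import AsyncIterator


async def fan_out(n: int) -> AsyncIterator[str]:
    """Yield chunks from n background workers; cancel them when the stream ends."""
    queue = asyncio.Queue()

    async def _worker(idx: int) -> None:
        # Stand-in for one long-running Codex attempt.
        while True:
            await queue.put(f"tab {idx}")
            await asyncio.sleep(0.01)

    workers = [asyncio.create_task(_worker(i + 1)) for i in range(n)]
    try:
        while True:
            yield await queue.get()
    finally:
        # Runs on normal exit, on CancelledError, and on GeneratorExit (aclose()).
        for task in workers:
            task.cancel()
        # Wait for the cancellations to land so no task outlives the generator.
        await asyncio.gather(*workers, return_exceptions=True)
```

Closing the generator early (for example via `await gen.aclose()` when the client disconnects) then leaves no worker tasks behind.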

Comment on lines +667 to +669
try:
rc = await proc.wait()
except Exception:

P2 Badge Kill device-auth process on canceled login stream

If the /api/codex/login SSE connection is closed mid-flow (dialog closed, navigation, network drop), cancellation lands in this finally, but the code only awaits proc.wait() and never terminates the codex auth login --device-auth subprocess. That leaves orphaned login processes running server-side until they exit on their own, and the canceled request task can remain blocked waiting for that long-running process.
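A simplified sketch of the teardown this comment asks for: terminate the child, wait a bounded time, then force-kill. The helper name `stop_process` and the sleeping child are assumptions for illustration; this version does not handle process groups, so grandchildren of the child process would not be covered.

```python
import asyncio
import sys


async def stop_process(proc: asyncio.subprocess.Process, grace: float = 2.0) -> int:
    """Ask the child to exit, wait briefly, then force-kill it (hypothetical helper)."""
    if proc.returncode is None:
        proc.terminate()
        try:
            await asyncio.wait_for(proc.wait(), timeout=grace)
        except asyncio.TimeoutError:
            # The child ignored the polite request; kill it outright.
            proc.kill()
            await proc.wait()
    return proc.returncode


async def demo() -> int:
    # Stand-in for a long-running login command that only exits on user action.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    try:
        await asyncio.sleep(0.1)  # pretend the SSE client disconnects here
    finally:
        # Never block the close path on an unbounded proc.wait().
        return_code = await stop_process(proc)
    return return_code
```

The bounded wait is the key design choice: the close path cannot hang on a process that is waiting for user input.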

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc8134ebf1

Comment on lines +102 to +106
for msg in reversed(messages):
if msg.get("role") != "user":
continue
content = msg.get("content")
if isinstance(content, str):

P1 Badge Preserve prior turns when routing Codex requests

_last_user_prompt stops at the newest role=user message and returns only that one turn, but each request also creates a brand-new Codex thread (async_codex_cls() + thread_start) instead of reusing prior thread state. In follow-up questions, Codex therefore receives no assistant/user history and answers without conversation context, which breaks normal multi-turn chat behavior for this provider.
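One possible remedy, when every request opens a fresh thread, is to render the whole message list into a single prompt. The sketch below is hypothetical: the function name `render_transcript` and the `Role: text` layout are assumptions, not the format the PR uses.

```python
def render_transcript(messages: list) -> str:
    """Flatten OpenAI-style chat messages into one prompt string (illustrative)."""
    turns = []
    for msg in messages:
        role = msg.get("role")
        content = msg.get("content")
        # Skip unknown roles and non-string content in this simplified version.
        if role not in ("system", "user", "assistant") or not isinstance(content, str):
            continue
        turns.append((role, content))
    # A lone user turn passes through unchanged.
    if len(turns) == 1 and turns[0][0] == "user":
        return turns[0][1]
    # Otherwise label each turn so the model can tell the speakers apart.
    return "\n\n".join(f"{role.capitalize()}: {text}" for role, text in turns)
```

With this shape, a follow-up question reaches Codex together with the earlier user and assistant turns instead of arriving without context.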

Comment on lines +80 to +81
} else if (!error) {
setError("Codex login did not complete -- see log for details.");

P2 Badge Preserve streamed error details in Codex login UI

The post-loop fallback checks !error from the callback closure, not the latest value set during the stream. When the backend sends an error event, setError(event.message) runs, but error here is still the stale pre-stream value, so the specific backend message is overwritten by the generic "did not complete" text. This hides actionable failure details from users during device-auth failures.

yield "data: [DONE]\n\n"

return StreamingResponse(
_codex_stream(),
danielhanchen and others added 4 commits May 23, 2026 14:00
Wires the OpenAI Codex CLI / Python SDK (codex_app_server) into Studio
as a new chat provider type. Hosts that don't have the CLI or the SDK
installed never see the entry; on logged-out hosts the provider config
dialog renders a device-auth Sign-in button that surfaces the
verification URL and streams CLI progress back over SSE.

Backend
- new core/inference/codex_availability.py probes the CLI + SDK and
  reports {installed, logged_in, version, supported_models}; it never
  imports codex_app_server at module top level so the rest of the
  backend keeps starting cleanly on hosts that don't have the SDK.
- new core/inference/codex_provider.py wraps AsyncCodex and translates
  Codex events into OpenAI chat-completion chunks. Supports the
  thread.run_streaming path with a non-streaming fallback for older
  SDK revs.
- parallel_calls > 1 fans the turn out across N tasks (capped at 20)
  via asyncio.gather and emits codex_tab_open / codex_tab_chunk /
  codex_tab_close tool-events per attempt plus a final codex_gather
  synthesis event. A separate standalone Codex call produces the
  unified answer.
- new routes/codex.py exposes GET /api/codex/status and POST
  /api/codex/login. The login route shells out to
  codex auth login --device-auth and streams events; the first event
  carries the verification URL so the frontend can window.open it.
- ChatCompletionRequest gains a parallel_calls field bounded [1, 20]
  by pydantic. The codex registry entry stays hidden by default; the
  /api/codex/status probe is the authoritative gate.
- routes/inference.py dispatches provider_type=codex through the
  local CLI/SDK pipeline instead of the standard HTTP client, with
  graceful error surfacing for CodexUnavailableError.

Frontend
- new api/codex-api.ts exposes fetchCodexStatus() and an async
  generator streamCodexDeviceLogin() that drives the SSE stream and
  yields parsed events.
- new components/codex-parallel-tabs.tsx renders the tabbed parallel-
  calls UI with a Synthesis tab highlighted once the codex_gather
  event arrives. Pure reducer keeps the state transitions unit-
  testable.
- new components/codex-login-button.tsx posts to /api/codex/login,
  opens the verification URL in a new tab via window.open, and shows
  the streamed CLI log as it lands.
- external-providers.ts exports CODEX_PROVIDER_TYPE,
  CODEX_MAX_PARALLEL_CALLS, isCodexProviderType, and
  clampCodexParallelCalls. Codex is marked text-only so the composer
  hides image-attach affordances when selected.

Tests
- tests/test_codex_provider.py (14 cases) covers the availability
  probe across the four install / login states, the streaming +
  parallel-calls translation against a fake codex_app_server module
  injected into sys.modules, the [1, 20] pydantic clamp, the
  CodexUnavailableError surfacing path, and the parallel_calls=1
  single-call shape (no tab tool-events).

test_health_response_reports_desktop_capability_fields builds a
SimpleNamespace as a fake routes module so it can exercise
main.health_check without standing the full app up. The stub
listed every router name except codex_router, which lands in the
main.py import block alongside the others as of this PR, so the
import failed with 'cannot import name codex_router from <unknown
module name>' on the Python 3.13 unit run.

Add the codex_router slot to the stub.
@danielhanchen danielhanchen force-pushed the feat/codex-provider branch from bc8134e to 0a4309f Compare May 23, 2026 14:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a4309fefd

setLogs([]);
setDeviceUrl(null);
const controller = new AbortController();
abortRef.current?.abort();

P2 Badge Abort login stream on unmount

AbortController is only wired to startLogin, so closing the dialog or navigating away while busy leaves the /api/codex/login stream running until the CLI exits. In practice this keeps a server-side login process alive after the UI is gone and continues dispatching async updates to a detached component tree. Add unmount cleanup (and clear abortRef in finally) so in-flight login streams are canceled when the component is removed.

Comment thread studio/backend/routes/codex.py Fixed
Followups on the post-merge review pass for the Codex SDK chat
provider. Verified against codex-cli 0.133.0 + the upstream
`openai/codex` Rust + Python sources, then pinned each fix with
a regression test in `test_codex_provider.py` (24/24 passing).

* Probe both `openai_codex` (canonical upstream Python package at
  `openai/codex/sdk/python`) and the legacy `codex_app_server`
  alias. Without this the availability probe always reported
  `sdk_importable: false` even when the SDK was installed, so the
  provider was permanently hidden.
* Switch the device-auth and login-status invocations from
  `codex auth login --device-auth` / `codex auth status` to the
  real upstream subcommands `codex login --device-auth` and
  `codex login status`. The former path returns
  `unrecognized subcommand 'auth'` on a real CLI.
* Strip ANSI control sequences before extracting the device URL
  (upstream wraps the URL in `\x1b[34m...\x1b[0m`) and tighten the
  pattern to the canonical `.../codex/device` shape. Also surface
  the one-time code as a `device_code` SSE event so the UI can
  show it alongside the URL.
* Fix `_detect_logged_in` substring footgun: `"logged in" in
  combined` matched inside `"not logged in"`, flipping logged-out
  users to logged-in. Anchor on word boundaries with negative
  prefixes winning regardless of return code.
* Cancel in-flight fan-out workers on SSE disconnect. Previously
  every parallel Codex turn ran to completion against a
  disconnected client and burned quota; now `_stream_codex_parallel`
  cancels its worker + drain tasks in a try/finally on
  `CancelledError`/`GeneratorExit`.
* Tear down the device-login subprocess on disconnect via
  `start_new_session=True` + `os.killpg(SIGTERM)` (Unix) or
  `CREATE_NEW_PROCESS_GROUP` + `CTRL_BREAK_EVENT` (Windows), with
  a bounded `proc.wait()` and `proc.kill()` fallback. Previously
  `finally: await proc.wait()` blocked the SSE close path because
  `codex login --device-auth` only exits on user action.
* Render the full conversation transcript in `_last_user_prompt`
  instead of returning only the most recent user message. The PR
  opens a fresh thread per request so prior assistant turns were
  dropped, degrading multi-turn chats to single-shot prompts.
  Single-turn input is unchanged.
* Make `ChatCompletionRequest.parallel_calls` default to 1 (`int`
  with `ge=1, le=20`) instead of `Optional[int] = None`. The
  runtime already coerced `None` -> 1, but the schema now matches
  the documented `[1, 20]` range.
* Replace the registry's hardcoded `default_models` (which
  contained `o3`, not in the upstream catalog) with the current
  `gpt-5.5 / 5.4 / 5.4-mini / 5.3-codex / 5.2` set from
  `codex-rs/models-manager/models.json`.
* Stop echoing `str(exc)` in SSE error frames in both
  `routes/inference.py` and `routes/codex.py`. The Codex SDK can
  raise with local paths, env-var content, or traceback fragments
  (CodeQL `py/information-exposure-through-exception`). Surface a
  generic message + `exception_type` discriminator; log the full
  reason server-side via `logger.error(..., exc_type=..., error=...)`.
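The error-frame sanitising in the last bullet could look roughly like this. It is a sketch under stated assumptions: the payload keys, the generic message text, and the helper name `sse_error_frame` are illustrative, and it uses the standard-library `logging` module rather than whatever logger the backend actually uses.

```python
import json
import logging

logger = logging.getLogger("codex_sketch")


def sse_error_frame(exc: Exception) -> str:
    """Build an SSE error frame that never echoes str(exc) (hypothetical helper)."""
    # The full detail, including the message and traceback, stays server-side.
    logger.error("Codex stream failed", exc_info=exc)
    payload = {
        "error": {
            "message": "Codex request failed",
            # Only the class name is exposed, as a discriminator for the client.
            "exception_type": type(exc).__name__,
        }
    }
    return f"data: {json.dumps(payload)}\n\n"
```

For example, a `FileNotFoundError` carrying a local path produces a frame that names the exception type but not the path.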

Doc / comment updates throughout to refer to `codex login` /
`openai_codex` rather than the older incorrect strings.
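The `_detect_logged_in` substring problem described in the list above can be reproduced and avoided with word-boundary patterns. This is a minimal sketch: the exact phrases the CLI prints, the extra negative phrases, and the rule that a positive match also needs a zero return code are all assumptions.

```python
import re

# Negative phrases are checked first so they win regardless of return code.
_NEGATIVE = re.compile(r"\bnot\s+logged\s+in\b|\blogged\s+out\b", re.IGNORECASE)
_POSITIVE = re.compile(r"\blogged\s+in\b", re.IGNORECASE)


def detect_logged_in(output: str, returncode: int) -> bool:
    """Classify CLI status output (hypothetical stand-in for _detect_logged_in)."""
    if _NEGATIVE.search(output):
        return False
    # Assumption: a positive phrase only counts when the command succeeded.
    return returncode == 0 and bool(_POSITIVE.search(output))


# The naive substring test is what misclassified logged-out users:
naive_result = "logged in" in "not logged in"
```

Here `naive_result` is `True`, which is the bug; `detect_logged_in("Not logged in", 0)` returns `False`.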

Tested: pytest 24 cases in `test_codex_provider.py` (the original
14 + 10 new `TestCodexHardenedRegressions`) plus the rest of the
Studio-backend test suite the PR touches (209 passing). Also
verified live against Studio launched from this branch on a
Blackwell B200 via `UNSLOTH_STUDIO_HOME=$WORKSPACE/temp/...
./install.sh --local` then a Playwright probe.
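The ANSI-stripping step from the device-URL bullet can be illustrated as below. The sketch assumes only what the commit text states (colour codes such as `\x1b[34m...\x1b[0m` around the URL and a path ending in `/codex/device`); the host in the sample line is a made-up placeholder and the helper name is hypothetical.

```python
import re
from typing import Optional

# CSI escape sequences, e.g. "\x1b[34m" (colour) and "\x1b[0m" (reset).
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# The host is deliberately left open; only the path suffix comes from the commit text.
_DEVICE_URL = re.compile(r"https://\S+?/codex/device\b")


def extract_device_url(line: str) -> Optional[str]:
    """Strip ANSI control sequences, then pull out the device-auth URL."""
    clean = _ANSI_CSI.sub("", line)
    match = _DEVICE_URL.search(clean)
    return match.group(0) if match else None


# Placeholder line; the real CLI wording and host are not shown in this PR.
sample = "Open \x1b[34mhttps://auth.example.test/codex/device\x1b[0m in your browser"
naive = re.search(r"https://\S+", sample).group(0)
```

The naive `\S+` match drags the reset sequence along with the URL, which is why the escapes are removed before matching.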
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

The OpenAI Codex Python SDK ships on PyPI as
`openai-codex-app-server-sdk`, not `openai-codex` (which is the
GitHub repo project name in pyproject.toml). The runtime binary
ships separately as `openai-codex-cli-bin`. Both packages expose
the import name `openai_codex`; the older docs reference
`codex_app_server` so we keep probing both.

Update the `CodexUnavailableError` message and the provider
registry notes so a user hitting the unavailable path gets a
copy-pasteable `pip install` command. No behaviour change.

Post-review pass driven by reviewer.py. The original PR shipped the
backend codex provider, the registry entry (with `hidden:true`), the
status API, and the `CodexParallelTabs` component, but the chat UI
never surfaced the row, required an API key for the connection, and
never sent `parallel_calls` over the wire. Also fixes a CodeQL leak
in the parallel fan-out error path and adds the canonical streaming
hook upstream actually exposes.

Frontend
* chat-providers-dialog.tsx now calls `/api/codex/status` alongside
  `/api/providers/registry`. When the host has Codex installed the
  Add connection dialog gains a synthetic Codex row (curated model
  list comes from `supported_models`) so the picker is reachable.
* The Add / Edit connection guards now skip the API-key requirement
  for Codex the same way they do for the custom OpenAI-compat
  presets; the field itself is also hidden so the user is not asked
  for a key Studio will not use.
* chat-adapter.ts now also exempts Codex from the "Missing API key"
  pre-flight, and emits `parallel_calls` on the outgoing request
  when the selected connection is Codex (clamped to [1, 20] by the
  shared helper, defaults to 1).
* external-providers.ts adds `codexParallelCalls` to
  ExternalProviderConfig so future composer UI can persist the
  user's pick per connection.

Backend
* `_stream_thread_run` now tries `thread.turn(prompt).stream()`
  first, mirroring the canonical openai_codex API
  (`openai/codex/sdk/python/src/openai_codex/api.py`). The legacy
  `thread.run_streaming(prompt)` path is kept as a fallback and the
  buffered `await thread.run(prompt)` stays as the last resort.
* `_stream_codex_parallel` no longer echoes `str(exc)` in the
  `codex_tab_error` SSE event. Per-tab failures now surface a
  generic "Codex tab failed" message plus an `exception_type`
  discriminator; `CodexUnavailableError` is the only exception
  whose text is forwarded verbatim because it is a user-actionable
  install hint with no sensitive content (CodeQL
  `py/information-exposure-through-exception`).

Tests
* New `TestCodexHardenedRegressions::test_parallel_tab_error_sanitised`
  injects a fake SDK that raises with a path-like message and
  asserts the SSE frames do not echo it.
* New `TestCodexHardenedRegressions::test_thread_turn_stream_path_taken`
  verifies the canonical `thread.turn(prompt).stream()` hook is
  preferred over the legacy helper.

All 26 codex_provider tests pass. Frontend `tsc --noEmit` clean.
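The preference order described in the Backend section (canonical hook, then legacy helper, then buffered call) can be sketched with duck typing. Everything here is an assumption made for illustration: the fake thread classes, the string events, and whether the real SDK's `turn(prompt)` is synchronous are not confirmed by this PR text.

```python
import asyncio


async def stream_thread(thread, prompt):
    """Prefer turn(prompt).stream(), then run_streaming(prompt), then run(prompt)."""
    turn = getattr(thread, "turn", None)
    if callable(turn):
        async for event in turn(prompt).stream():
            yield event
        return
    run_streaming = getattr(thread, "run_streaming", None)
    if callable(run_streaming):
        async for event in run_streaming(prompt):
            yield event
        return
    # Last resort: one buffered result.
    yield await thread.run(prompt)


class FakeTurn:
    def __init__(self, prompt):
        self.prompt = prompt

    async def stream(self):
        yield f"canonical:{self.prompt}"


class CanonicalThread:
    def turn(self, prompt):
        return FakeTurn(prompt)

    async def run_streaming(self, prompt):
        yield f"legacy:{prompt}"


class LegacyThread:
    async def run_streaming(self, prompt):
        yield f"legacy:{prompt}"


class BufferedThread:
    async def run(self, prompt):
        return f"buffered:{prompt}"


async def collect(thread, prompt):
    return [event async for event in stream_thread(thread, prompt)]
```

This sketch covers only helper selection; it does not include the replay protection that later commits add around the buffered fallback.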

Second reviewer.py pass surfaced three follow-ups missed in the
earlier round. All caught by 12 parallel reviewers + cross-block
audit; each fix is small but user-facing.

* `testProvider` no longer pushes a Codex connection back to the
  edit form to "add an API key". Codex has no remote endpoint to
  ping, so the Test button now calls `/api/codex/status` directly:
  toasts success with the CLI version when installed+logged in,
  prompts to sign in when installed+logged out, and errors when
  the CLI or SDK is missing.

* The Sign-in to Codex affordance is now actually mounted. When
  the selected provider is Codex and `/api/codex/status` reports
  `installed:true, logged_in:false`, the dialog renders the new
  `CodexLoginButton` above the (hidden) API key row. The button's
  `onLoggedIn` callback re-probes status so the UI flips to the
  ready state without a page reload.

* The chat adapter now handles `codex_*` `_toolEvent` types
  instead of silently swallowing them. Per-tab chunks render
  inline with a `[Codex tab N/M]` header so users see each
  parallel attempt; `codex_gather` adds a `--- Synthesis ---`
  divider before the final unified content delta the backend
  also emits as plain text. This unblocks the existing fan-out
  path while a dedicated `CodexParallelTabs` UI is wired in a
  future change.

Verified: 26/26 codex_provider tests pass; `tsc --noEmit` on
studio/frontend completes clean.

…t login on unmount

Third reviewer.py pass found three remaining sharp edges. Each fix
is small and paired with a regression test where applicable.

* Codex subprocess env is now scrubbed to a safe-list before spawn.
  Both `_run_cli` in codex_availability and the device-auth spawn
  in stream_codex_device_login switch from `env=os.environ.copy()`
  to `env=_codex_subprocess_env()`, which forwards only PATH /
  HOME / USER / Windows-equivalents / CODEX_HOME / OPENAI_API_KEY /
  OPENAI_BASE_URL. Other-provider secrets like HF_TOKEN, GH_TOKEN,
  WANDB_API_KEY, ANTHROPIC_API_KEY no longer reach the local codex
  binary, so a shimmed `codex` earlier on PATH cannot harvest them.

* `_stream_thread_run` now tracks `emitted_any` and refuses to fall
  through to the buffered `await thread.run(prompt)` after either
  streaming helper has already yielded text. Previously a network
  glitch mid-stream re-executed the same Codex turn, which can
  duplicate file writes, shell commands, and other Codex side
  effects. The buffered path is now reserved for the zero-output
  case (no streaming helper resolved, or streaming returned empty).

* `CodexLoginButton` now aborts the SSE reader on unmount via a
  useEffect cleanup that calls `abortRef.current?.abort()`. The
  underlying `codex login --device-auth` subprocess no longer
  keeps streaming (and holding a device-auth session) after the
  dialog closes.

Two new pytest cases pin the behaviour: `test_codex_subprocess_env_scrubbed`
sets HF/GH/WANDB/ANTHROPIC keys and asserts none reach the codex
env while OPENAI_API_KEY / CODEX_HOME survive; and
`test_partial_stream_failure_does_not_replay_turn` injects a fake
`turn().stream()` that yields "partial output " then raises, and
asserts `thread.run()` is never called. 28/28 codex_provider tests
pass; `tsc --noEmit` clean.
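The safe-list approach to the subprocess environment can be sketched as follows. The key list mirrors the names in the first bullet, minus the Windows equivalents, which the commit message does not enumerate; the function here is an illustrative stand-in for `_codex_subprocess_env`, not its actual body.

```python
import os
from typing import Mapping, Optional

# Names taken from the commit message; Windows equivalents are omitted here.
_SAFE_ENV_KEYS = (
    "PATH",
    "HOME",
    "USER",
    "CODEX_HOME",
    "OPENAI_API_KEY",
    "OPENAI_BASE_URL",
)


def codex_subprocess_env(source: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment restricted to the safe-list."""
    source = os.environ if source is None else source
    # Anything not explicitly listed (HF_TOKEN, GH_TOKEN, ...) is dropped.
    return {key: source[key] for key in _SAFE_ENV_KEYS if key in source}
```

A safe-list fails closed: a new secret added to the parent environment later is excluded by default, whereas a block-list would need updating.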

…x preselect

Two P1 fixes from the round 8 reviewer pass:

1. _stream_thread_run no longer replays a Codex turn that fired
   non-visible events before crashing.

   The replay guard only tracked `emitted_any` (visible text). A
   Codex turn that emitted, say, a command.delta or file.delta event
   first -- both filtered to "" by _coerce_text -- and THEN crashed
   would leave emitted_any=False and fall through to the buffered
   `thread.run(prompt)` fallback, re-executing the same turn and
   duplicating its side effects (shell commands, file writes,
   tool calls). This is exactly the case the guard was added to
   prevent in earlier rounds; the missing bit was tracking
   "the turn ran at all", not just "the turn yielded text".

   Fix: add a separate turn_started flag that flips True the moment
   we ask the SDK for a turn handle or observe any event from a
   streaming helper. When the buffered fallback is gated on
   turn_started instead of emitted_any, a partial-turn crash
   correctly stops without replaying. Regression test reproduces
   the bug against the pre-fix code (assertion catches the extra
   thread.run call) and locks the fix in.

2. openAddProvider now mirrors the providerType-change effect's
   Codex pre-check.

   The first-run UX fix from `26799d9a` pre-checked every Codex
   default model in the providerType-change effect, but
   openAddProvider() calls resetForm() (which clears
   selectedModelIds) and then only restores availableModels, not
   selectedModelIds. If the user closes the Add connection form
   and re-opens it while Codex is still the current providerType,
   the effect does not re-run, so the form opens with Codex
   defaults available but none selected -- the "Add at least one
   model ID" save guard then blocks the Save click.

   Fix: openAddProvider now seeds selectedModelIds with the full
   default-models list when the provider is Codex, matching the
   providerType-change effect so the two entry paths produce the
   same first-run state.
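The `turn_started` guard from item 1 can be illustrated with a small fake. This is a sketch, not `_stream_thread_run`: the dict-shaped events, the fake thread classes, and the exact placement of the flag are assumptions.

```python
import asyncio


async def stream_with_guard(thread, prompt):
    """Stream a turn; fall back to the buffered run only if no turn was started."""
    turn_started = False
    try:
        turn_factory = thread.turn  # AttributeError here: no streaming helper
        turn_started = True         # from here on the SDK may have side effects
        async for event in turn_factory(prompt).stream():
            text = event.get("text", "")
            if text:
                yield text
        return
    except Exception:
        if turn_started:
            # The turn ran at least partially; replaying could duplicate
            # shell commands or file writes, so surface the failure instead.
            raise
    yield await thread.run(prompt)


class CrashingThread:
    """Emits one non-visible event, then fails mid-stream."""

    def __init__(self):
        self.run_calls = 0

    def turn(self, prompt):
        return self

    async def stream(self):
        yield {"type": "command.delta"}
        raise RuntimeError("connection dropped")

    async def run(self, prompt):
        self.run_calls += 1
        return "replayed"


class BufferedOnlyThread:
    """Has no streaming helper, so the buffered path is legitimate."""

    async def run(self, prompt):
        return f"buffered:{prompt}"


async def drain(thread, prompt):
    out = []
    try:
        async for chunk in stream_with_guard(thread, prompt):
            out.append(chunk)
    except RuntimeError:
        out.append("<error>")
    return out
```

With `CrashingThread`, no visible text is produced before the crash, yet `run()` is never called because the flag tracks that the turn ran at all.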

@danielhanchen
Member Author

Round 8 reviewer pass surfaced two more P1 issues; one more commit lands both fixes.

b17765b5 (round 8): replay guard on non-visible events + Add-flow Codex preselect

  1. _stream_thread_run could replay a partial Codex turn after non-visible events. The earlier replay guard only tracked emitted_any (visible text). A Codex turn that emitted a command.delta / file.delta / tool_call.delta event first -- all filtered to "" by _coerce_text -- and then crashed would leave emitted_any False and fall through to the buffered thread.run(prompt) fallback, re-executing the same turn and duplicating shell commands or file writes upstream. The fix tracks a separate turn_started flag that flips True the moment we ask the SDK for a turn handle OR observe any event from a streaming helper; the buffered fallback is gated on turn_started instead, so a partial-turn crash stops cleanly without replay. Regression test reproduces the bug against the pre-fix code (the assertion catches the extra thread.run call made after a partial-turn crash) and locks the fix in.

  2. openAddProvider was clobbering the Codex auto-preselect. The round 7 fix (26799d9a) pre-checked every Codex default model in the providerType-change useEffect. But openAddProvider() calls resetForm() (which clears selectedModelIds) and then only restores availableModels. If the user closes the Add connection form and re-opens it while Codex was already providerType, the effect did not re-run, so the form opened with Codex defaults available but none selected and the "Add at least one model ID" save guard blocked the click. openAddProvider now mirrors the effect's Codex pre-check, matching the same first-run state on both entry paths.

Test counts after b17765b5:

  • tests/test_codex_provider.py: 60 passed (50 round 6 -> 58 round 7b -> 60 round 8; 10 new regression tests across the round 7 / 7b / 8 commits, covering the cross-wrapper env scrub, device URL allowlist, log-filter blocklist, and replay-on-non-visible-events).

Round 8 reviewer-finding summary:

  • 11 of 12 reviewers reported additional issues; 10 of those were follow-ups on areas this PR already touches.
  • The two P1s above are fixed.
  • Two reviewers (10 and 12) flagged normalizeProvider allegedly dropping codexParallelCalls. Reproducing the path in Node with the actual normalizeProvider body, codexParallelCalls survives via the ...raw spread; the explicit overrides do not touch it. Not actionable from the source as-written.
  • Other reviewer notes (Windows subprocess launch, dead "tab" component, accidental main reverts) are tracked but lower severity than the two P1s landed here.

Resolve three conflicts touched by main since the branch forked:

- studio/backend/core/inference/external_provider.py: take main's
  rewrite of _anthropic_citation_key (extended dedup keys covering
  end-char/page/block indices plus search_result_index) and the new
  _anthropic_supports_fast_mode helper. The branch had only the
  earlier shorter citation_key form so accepting main wholesale
  here loses nothing Codex-related.

- studio/backend/models/inference.py: keep BOTH the Codex
  parallel_calls field + _clamp_parallel_calls validator (from the
  branch) AND the new Anthropic fast_mode field (from main). They
  occupy different provider lanes.

- studio/frontend/src/features/chat/api/chat-adapter.ts: fold the
  Codex per-tab rendering pass (renderFullContent) into main's new
  orderAssistantContent positioning so tools land before text,
  generated images after, AND any Codex tab text accumulated in
  earlier _toolEvent frames is preserved through the synthesis
  content delta (round 8 render-order fix).

Backend codex tests: 60/60 passing. Anthropic citations / fast_mode
tests: 96/96 passing. Frontend builds cleanly.

1. Codex SSE wrapper terminates on exact `data: [DONE]` only.

   The old substring check `if "[DONE]" in line` would flip
   sent_done True when a normal model response carried the literal
   text "[DONE]" in delta.content (for example an explanation of
   the OpenAI stream sentinel). The real terminator was then
   suppressed, leaving OpenAI-compatible clients that finalise on
   the explicit sentinel hung on stream close. Now compares the
   stripped line to the exact `data: [DONE]` form.

2. Legacy `thread.run_streaming` path no longer returns an empty
   reply on completion-only streams.

   If the SDK exposes `thread.run_streaming` but the stream emits
   ONLY item.completed / agentMessage events with no message
   deltas, the loop previously exited with emitted_any False and
   never reached the agent-message fallback. The request returned
   200 with an empty assistant reply even though Codex produced a
   final answer. Mirror the canonical-path behavior: collect
   `_completed_agent_message_text` strings in a sidecar list and
   emit the last one when no deltas arrived. Match the canonical
   payload-extraction (`getattr(event, "payload", event)`) so the
   event-vs-payload SDK shape difference is handled the same way
   in both branches.

3. Parallel-calls fan-out propagates CodexUnavailableError so the
   route layer can return 503.

   When the SDK is not importable or the safety enums are missing
   without the dev opt-in, every worker raised the same
   CodexUnavailableError. The previous catch-all converted the
   error into a per-tab codex_tab_error event, the outer stream
   never raised, and clients saw a 200 with only tool events and
   an empty synthesis -- OpenAI-compatible consumers that ignore
   _toolEvent saw a successful empty reply. Now CodexUnavailableError
   re-raises out of the worker (no spurious per-tab error event),
   _await_workers re-raises it when EVERY worker hit the same
   setup failure, and the finally-block drain await propagates the
   exception out of the parallel function so the route's existing
   CodexUnavailableError handler can emit the right 503 SSE error
   frame. Per-tab runtime failures (model rejected, timeout, mid-
   stream SDK crash) still get swallowed into codex_tab_error
   events so a single bad model in the fan-out does not kill the
   others.
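
As a rough illustration of item 1 above, the exact-match check might look like the sketch below. `is_done_frame` and `wrap_sse` are hypothetical names, not the actual code in routes/inference.py; only the `data: [DONE]` comparison and the `sent_done` flag come from the description.

```python
from typing import Iterable, Iterator


def is_done_frame(line: str) -> bool:
    """True only for the exact SSE terminator, never for content that mentions it."""
    return line.strip() == "data: [DONE]"


def wrap_sse(lines: Iterable[str]) -> Iterator[str]:
    """Pass SSE lines through and guarantee exactly one terminator frame.

    Hypothetical wrapper: the real route is async and does more work.
    """
    sent_done = False
    for line in lines:
        if is_done_frame(line):
            if sent_done:
                continue  # drop a duplicate terminator
            sent_done = True
        yield line
    if not sent_done:
        # Upstream never sent the sentinel, so emit it ourselves.
        yield "data: [DONE]\n\n"
```

A content frame whose delta text contains the literal `[DONE]` no longer flips `sent_done`, so the terminator is still appended after it.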

Test counts: 63/63 passing (60 round 6-8 plus 3 new round 9 regression
tests). Each new test was first run against a `git stash`-restored
pre-fix tree to confirm it catches the bug, then run against the
patched tree.

@danielhanchen
Member Author

Merged main into the branch and resolved three conflicts: _anthropic_citation_key (took main's extended dedup keys), ChatCompletionRequest (kept both the Codex parallel_calls field + clamp validator and main's new Anthropic fast_mode field), and chat-adapter.ts (folded the Codex per-tab renderFullContent() into main's new orderAssistantContent ordering). Backend Codex tests are 60/60 green, Anthropic citation / fast-mode tests are 96/96 green, frontend builds cleanly.

Then addressed three new P2 findings the Codex bot left on be15c57f / 26799d9a in e2b7f595:

  • P2 Codex SSE wrapper sentinel match (routes/inference.py:1941): the substring check "[DONE]" in line would flip sent_done on a delta.content containing the literal text [DONE] and suppress the real terminal frame. Now compares the stripped line to the exact data: [DONE] form.

  • P2 legacy run_streaming completion-only stream: when a legacy SDK exposes thread.run_streaming but the stream emits only item.completed / agentMessage events with no message deltas, the loop used to exit with emitted_any=False and never reach the agent-message fallback. The request returned 200 with an empty assistant reply even though Codex produced a final answer. Mirror the canonical-path behavior in the legacy branch: collect _completed_agent_message_text strings and emit the last one if no deltas arrived. Match the canonical payload-extraction so the event-vs-payload SDK shape difference is handled the same way in both branches.

  • P2 parallel-calls fan-out swallowed CodexUnavailableError: when the SDK is not importable or safety enums are missing, every worker raises the same CodexUnavailableError. The previous catch-all converted that into a per-tab codex_tab_error event, so non-tool-aware clients saw a successful empty reply. Now CodexUnavailableError re-raises out of the worker (no spurious per-tab event), _await_workers re-raises when EVERY worker hit the same setup failure, and the finally-block drain await propagates it out of the parallel function so the route's existing handler emits a proper 503 SSE error frame.
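
The fan-out rule in the last bullet can be sketched roughly as follows. This is a simplified model under stated assumptions, not the provider's code: `run_worker` and `await_workers` are stand-ins (the real `_await_workers` has a different signature), the event dict shape is invented, and the finally-block drain is left out.

```python
import asyncio


class CodexUnavailableError(RuntimeError):
    """Setup failure: SDK not importable or safety enums missing."""


async def run_worker(index, task, events):
    """Run one fan-out task; only runtime failures become per-tab events."""
    try:
        return await task(index)
    except CodexUnavailableError:
        raise  # setup failure: no per-tab event, let the caller decide
    except Exception as exc:
        # Hypothetical event shape for a per-tab runtime failure.
        events.append({"type": "codex_tab_error", "tab": index, "error": str(exc)})
        return None


async def await_workers(tasks):
    """Gather all workers; re-raise only if EVERY worker hit the setup failure."""
    events = []
    results = await asyncio.gather(
        *(run_worker(i, t, events) for i, t in enumerate(tasks)),
        return_exceptions=True,
    )
    setup_failures = [r for r in results if isinstance(r, CodexUnavailableError)]
    if results and len(setup_failures) == len(results):
        raise setup_failures[0]  # the route layer can map this to a 503
    # Assumption: a partial setup failure simply drops that worker's text.
    texts = [r for r in results if isinstance(r, str)]
    return texts, events
```

With one healthy worker and one that raises a runtime error, the healthy text survives and a single `codex_tab_error` event is recorded; with every worker raising `CodexUnavailableError`, the exception escapes `await_workers`.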

Test count after the round 9 commit: 63/63 passing in tests/test_codex_provider.py. Each new regression test was first run against a git stash-restored pre-fix tree to confirm it catches the bug, then re-run against the patched tree.
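
For the completion-only fallback in the second bullet, a minimal synchronous sketch is shown below. The event field names (`type`, `item`, `text`, `delta`) are assumptions made for illustration and may not match the SDK; only the `getattr(event, "payload", event)` extraction, the sidecar list, and the "emit the last one when no deltas arrived" rule come from the description.

```python
from types import SimpleNamespace


def completed_agent_message_text(event):
    """Return the final agent text from a completion event, else None."""
    payload = getattr(event, "payload", event)  # handle event-vs-payload shapes
    if getattr(payload, "type", None) != "item.completed":
        return None
    item = getattr(payload, "item", None)
    if getattr(item, "type", None) != "agentMessage":
        return None
    return getattr(item, "text", None) or None


def collect_reply(events):
    """Join message deltas; fall back to the last completed agent message."""
    emitted_any = False
    completed = []  # sidecar list of completed agent-message texts
    chunks = []
    for event in events:
        payload = getattr(event, "payload", event)
        delta = getattr(payload, "delta", None)
        if isinstance(delta, str) and delta:
            emitted_any = True
            chunks.append(delta)
            continue
        text = completed_agent_message_text(event)
        if text:
            completed.append(text)
    if not emitted_any and completed:
        chunks.append(completed[-1])  # completion-only stream: emit final answer
    return "".join(chunks)


# Demo event for a stream that carries no deltas at all.
done = SimpleNamespace(
    type="item.completed",
    item=SimpleNamespace(type="agentMessage", text="final answer"),
)
```

Calling `collect_reply([done])` or `collect_reply([SimpleNamespace(payload=done)])` yields the final answer in both shapes, and a stream that did carry deltas is not duplicated by the completion event.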

@danielhanchen
Member Author

End-to-end Codex screenshots for this PR, captured against the latest
head with the real openai_codex SDK + real codex CLI on PATH.

1. Add the Codex connection (final probe, head 26799d9a with the default-models prefix fix)

Empty Studio after login:

login

Pick "OpenAI Codex (local CLI)" in the Add Connection dropdown -- the
form now opens with all five default models (gpt-5.5, gpt-5.4,
gpt-5.4-mini, gpt-5.3-codex, gpt-5.2) pre-checked and "5 models
selected", so one save click is enough:

codex form default state

After save, the Connections list shows the new Codex entry with all
five models attached:

connection saved

The model picker's "Connected" tab surfaces every Codex model under an
OPENAI CODEX (LOCAL CLI) heading:

connected tab

Selected model in the chat header:

model selected

2. Multi-turn chat works

Turn 1: user says "My favorite color is teal. Reply with only the word
OK." -- Codex replies OK:

turn 1 done

Turn 2 on the same thread: "What was my favorite color? Reply with
only the color name, no other words." -- Codex replies teal, so the
thread history is being forwarded correctly:

turn 2 done

Full thread visible after both turns:

full thread

3. Parallel calls fan-out + synthesis

Connection edit form -- Parallel calls is in the SDK schema with help
text, range 1-20:

connection edit

Bumped from 1 to 2 and saved:

parallel calls 2

Fan-out on a real prompt ("Reply with three short sentences about LoRA
fine tuning") shows [Codex tab 1/2] and [Codex tab 2/2] rendered
in order, then --- Synthesis ---, then the unified synthesis -- the
render-order fix from b7862388 is in effect:

streaming done

Render-order zoom (separate single-prompt probe -- HELLO example):

render zoom

4. Side-by-side before/after collages

Before vs after the multi-turn fix-set:

e2e before after

Before vs after the parallel-calls synthesis render-order fix:

render before after

All shots are captured live on the PR head with the actual SDK and
codex CLI -- no mocks. Anything containing token fields or other
secret slots was excluded from the upload.

# Conflicts:
#	studio/frontend/src/features/chat/api/chat-adapter.ts

danielhanchen and others added 3 commits May 27, 2026 13:30
Two paired changes that finally make the Codex parallel-calls fan-out
visible as actual clickable tabs in the chat surface, plus a credit-
free spoof that lets the whole pipeline run in dev / CI without ever
touching the upstream API.

1. Real tab UI (frontend).

   The chat-adapter used to render the per-worker outputs as inline
   `[Codex tab 1/N] ...` text blocks in the assistant message body,
   which collapsed into one big run-on block once more than a handful
   of tokens had streamed. Now each `codex_*` SSE event is folded into
   `codexParallelState` and re-published as the `args.state` of a
   single tool-call part with `toolName === "codex_parallel"`. The
   assistant-ui surface dispatches that to the new
   `CodexParallelToolUI` wrapper, which mounts the existing
   `CodexParallelTabs` component -- one tab per worker, one Synthesis
   tab, click to switch. The stable `toolCallId` keeps assistant-ui
   updating the SAME card across stream yields rather than spawning
   new cards.

   `renderCodexTabsBlock` now returns the empty string so the message
   body no longer contains the labelled-text fallback (kept the
   function name so the rest of the adapter's `renderFullContent` /
   pin-signature paths are untouched).

2. Credit-free Codex SDK spoof (backend).

   New `studio/backend/core/inference/codex_spoof.py` exposes a drop-in
   subset of the upstream `openai_codex` surface (`AsyncCodex`,
   `AppServerConfig`, `ApprovalMode.deny_all`, `SandboxMode.read_only`,
   thread with `turn().stream()` + `run_streaming()` + `run()`) and
   emits deterministic per-tab streaming events tagged with the worker
   index, so flipping between tabs in the UI shows visibly distinct
   text. Activated by `UNSLOTH_CODEX_SPOOF=1`; `_import_codex` installs
   the spoof into `sys.modules` under both `openai_codex` and
   `codex_app_server` and the rest of the provider keeps running
   unchanged. OFF by default; production is unaffected.

   Six new tests cover the spoof itself (module install, env-flag
   gating, delta + completion event shape, per-tab tagging, provider
   import path, safety-kwargs resolution against the spoof). 69/69
   tests pass with and without the flag; TypeScript clean.
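
A minimal sketch of the env-gated install described in item 2, assuming a stand-in module rather than the real `codex_spoof.py`. `install_spoof` and `build_spoof_module` are hypothetical names; the `environ` and `modules` parameters exist only so the sketch can be exercised without touching the real `os.environ` or `sys.modules`.

```python
import os
import sys
import types

SPOOF_ENV = "UNSLOTH_CODEX_SPOOF"
SPOOF_MODULE_NAMES = ("openai_codex", "codex_app_server")


def is_spoof_enabled(environ=None):
    """The spoof is OFF unless the flag is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(SPOOF_ENV) == "1"


def build_spoof_module():
    """Build a placeholder module; the real spoof exposes a much larger surface."""
    module = types.ModuleType("codex_spoof_demo")

    class AsyncCodex:  # placeholder for the spoofed client class
        pass

    module.AsyncCodex = AsyncCodex
    return module


def install_spoof(environ=None, modules=None):
    """Register one spoof module under both SDK names when the flag is set."""
    target = sys.modules if modules is None else modules
    if not is_spoof_enabled(environ):
        return False  # production path: nothing is installed
    spoof = build_spoof_module()
    for name in SPOOF_MODULE_NAMES:
        target[name] = spoof
    return True
```

Registering the same module object under both names means a later `import openai_codex` or `import codex_app_server` resolves to the spoof without any change to the importing code.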
@danielhanchen
Member Author

Two paired changes pushed on feat/codex-provider:

Merge conflict resolution (ecf8bc76)
Merged main into the branch (latest Gemini provider + 10 other PRs). One conflict in studio/frontend/src/features/chat/api/chat-adapter.ts across 6 hunks (the Codex predicates we added on this branch vs main's Gemini custom-base predicates and tweaked comments). Combined both: API-key gate now skips hosted-key checks for local providers, custom providers, Codex local CLI, AND Gemini custom OpenAI-compat bases; renderFullContent() / pinTextThoughtSignature() are composed so Codex per-tab text AND Gemini thoughtSignature both survive into the final yield. TypeScript clean. 69/69 backend tests pass.

Real Codex parallel-call tab UI + credit-free SDK spoof (f01011e4)

  1. Real tab UI. The chat-adapter used to render the per-worker outputs as inline [Codex tab 1/N] text in the assistant message body. Now each codex_* SSE event is folded into a CodexParallelState and re-published as the args.state of a single tool-call part (toolName: "codex_parallel", stable toolCallId). The assistant-ui surface dispatches that to a new CodexParallelToolUI wrapper that mounts the existing CodexParallelTabs component: one tab per worker, plus a Synthesis tab, click to switch, read-only for now (matches the spec). The dead codex-parallel-tabs.tsx component is no longer dead.

  2. Credit-free SDK spoof. New studio/backend/core/inference/codex_spoof.py exposes a drop-in subset of the upstream openai_codex API surface (AsyncCodex, AppServerConfig, ApprovalMode.deny_all, SandboxMode.read_only, thread with turn().stream() + run_streaming() + run()). Emits deterministic per-tab streaming events tagged with the worker index, so flipping between tabs in the UI shows visibly distinct text. Activated by UNSLOTH_CODEX_SPOOF=1; _import_codex installs it into sys.modules under both canonical names. OFF by default; production unaffected.

Six new tests cover the spoof itself (module install + spec, env-flag gating, delta + completion event shape, per-tab tagging, provider import path, safety-kwargs resolution). Full count: 69/69 tests pass with and without UNSLOTH_CODEX_SPOOF=1. TypeScript build clean.

Playwright capture of the new tab UI driven by the spoof is up next.

# Conflicts:
#	studio/backend/main.py
#	studio/backend/routes/__init__.py

danielhanchen and others added 2 commits May 27, 2026 14:44
When ``UNSLOTH_CODEX_SPOOF=1`` is exported (the credit-free dev / CI
path the previous commit added), the in-process spoof IS the Codex
SDK and a real ``codex`` CLI is irrelevant. The status endpoint at
``/api/codex/status`` used to gate ``installed`` on the real CLI +
real SDK only, which made the frontend hide the Codex provider in
the connections dropdown even when the spoof was active. Now both
``_sdk_importable`` and ``probe_codex_availability`` short-circuit on
``codex_spoof.is_spoof_enabled()`` so the provider becomes visible
under the spoof. The probe then reports ``installed=True``,
``cli_path="<spoof>"``, ``logged_in=True``, and ``version="spoof"``:
sentinel values that let devs confirm at a glance that the spoof is
active.

The real production code path (no spoof flag) is unchanged: it still
gates on bool(cli_path) AND sdk_ok, the same as round 6.
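
The short-circuit can be sketched as below. This is an approximation, not the contents of `codex_availability.py`: the `CodexStatus` dataclass, the injectable `which` / `find_spec` parameters, and the hard-coded `logged_in=False` / `version=None` on the real path are assumptions (the actual probe also detects login state and version).

```python
import importlib.util
import os
import shutil
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodexStatus:
    installed: bool
    cli_path: Optional[str]
    logged_in: bool
    version: Optional[str]


def probe_codex_availability(
    environ=None,
    which=shutil.which,
    find_spec=importlib.util.find_spec,
):
    """Report Codex availability, short-circuiting when the spoof flag is set."""
    env = os.environ if environ is None else environ
    if env.get("UNSLOTH_CODEX_SPOOF") == "1":
        # Sentinel values make the spoofed state obvious in the status payload.
        return CodexStatus(True, "<spoof>", True, "spoof")
    cli_path = which("codex")
    sdk_ok = any(
        find_spec(name) is not None
        for name in ("openai_codex", "codex_app_server")
    )
    # Real path: both the CLI and an importable SDK are required.
    return CodexStatus(
        installed=bool(cli_path) and sdk_ok,
        cli_path=cli_path,
        logged_in=False,  # assumption: login detection omitted from this sketch
        version=None,  # assumption: version detection omitted from this sketch
    )
```

Injecting `which` and `find_spec` keeps the gating logic testable on hosts that have neither the CLI nor the SDK installed.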

Tests: added an autouse fixture in ``test_codex_provider.py`` that
clears ``UNSLOTH_CODEX_SPOOF`` before every test so the existing
availability / import gating tests are not polluted when a dev runs
the suite with the flag exported. The spoof-targeted tests still
call ``monkeypatch.setenv(...)`` to flip it back on inside their own
scope. 69/69 pass with and without the env flag.

@danielhanchen
Member Author

Another round on feat/codex-provider:

Merge -- main moved forward (Gemini provider #5720 + MCP servers #5750). Resolved conflicts in chat-adapter.ts (6 hunks) and studio/backend/{main.py,routes/__init__.py} (kept both codex_router and mcp_servers_router registrations). The branch is clean against main again.

Code -- the Codex availability probe now honours UNSLOTH_CODEX_SPOOF=1 end-to-end (6403846b). Before, the spoof would emit per-tab text correctly but the UI hid the Codex provider because /api/codex/status still required a real CLI on PATH. Now the status endpoint reports installed=True, cli_path="<spoof>", and version="spoof", so the connection dropdown surfaces the entry. The test fixture clears the env var by default so the gating tests stay deterministic.

Demo screenshots (under tab_ui_round/):

Live UI flow against the spoofed Studio:

login

Connection form opens with Codex pre-selected, default models pre-checked, parallel-calls input ready:

form open

Parallel calls bumped to 3, 5 models still selected:

parallel 3

The actual CodexParallelTabs component mounted with mock parallel state -- one tab per worker, click to switch, Synthesis tab highlighted:

tab 1

Tab 2 -- visibly different worker text (this is the point of the per-tab tagging in the spoof, mirrors what real Codex parallel calls produce when each worker explores a different angle):

tab 2

Tab 3:

tab 3

Synthesis tab -- highlighted in green when ready, unified answer below the tab strip:

synthesis

Mid-stream view (some tabs still streaming):

streaming

Error path -- per-tab failure isolated, other tabs still complete and synthesis still arrives:

error tab

The demo screenshots use mock CodexParallelState injected via Playwright to bypass the chat-stream flow; the visible card layout is the same DOM the chat-adapter emits at runtime via the new codex_parallel tool-call part. 69/69 tests still green. No real Codex tokens were consumed (spoof active).
