Studio: add Codex SDK as a chat provider with parallel-calls fan-out by danielhanchen · Pull Request #5724 · unslothai/unsloth

danielhanchen · 2026-05-22T16:46:31Z

Summary

Wires the OpenAI Codex CLI / Python SDK (codex_app_server) into Studio as a new chat provider. The provider is hidden on hosts without the CLI + SDK and exposes a device-auth Sign-in button when logged out. A parallel_calls knob fans the turn out across N (up to 20) Codex tasks and synthesises a unified answer.

Backend: new codex_availability.py + codex_provider.py + routes/codex.py; ChatCompletionRequest.parallel_calls (1-20); provider_type=codex dispatches through the SDK instead of HTTP; all SDK imports are lazy via importlib.util.find_spec.
Frontend: new api/codex-api.ts, components/codex-parallel-tabs.tsx (tabbed render with Synthesis highlight), components/codex-login-button.tsx (device-auth + log streaming + window.open of the verification URL); external-providers.ts exports CODEX_PROVIDER_TYPE, CODEX_MAX_PARALLEL_CALLS, clampCodexParallelCalls, and marks codex text-only.
Tests: 14 cases in test_codex_provider.py cover the availability probe across the four install/login states, the streaming + parallel-calls translation against a fake codex_app_server injected into sys.modules, the [1, 20] pydantic clamp, the CodexUnavailableError surfacing path, and the parallel_calls=1 single-call shape.
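
The lazy availability probe described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's codex_availability.py: the function names and the returned dictionary keys are assumptions, and only importlib.util.find_spec and shutil.which are real APIs.

```python
import importlib.util
import shutil
from typing import Iterable, Optional


def find_importable(candidates: Iterable[str]) -> Optional[str]:
    """Return the first module name that can be located, without importing it."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # find_spec can raise for broken packages or odd sys.modules entries.
            continue
    return None


def probe(cli_name: str = "codex",
          sdk_modules: Iterable[str] = ("codex_app_server",)) -> dict:
    """Hypothetical status shape: report the CLI path and the SDK module found."""
    cli_path = shutil.which(cli_name)
    sdk_module = find_importable(sdk_modules)
    return {
        "cli_path": cli_path,
        "sdk_module": sdk_module,
        "installed": cli_path is not None and sdk_module is not None,
    }
```

Because nothing is imported at probe time, a host without the SDK pays no import cost and cannot fail at module load.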

Test plan

pytest studio/backend/tests/test_codex_provider.py (14 passing)
pytest studio/backend/tests/test_external_provider_usage_chunk.py studio/backend/tests/test_anthropic_messages.py studio/backend/tests/test_openai_tool_passthrough.py studio/backend/tests/test_inference_model_validation.py (173 passing total)
npx tsc -b --pretty false in studio/frontend (clean)
Live end-to-end with codex_app_server installed (not available on the build host -- see "Limitations" below)

Limitations

The codex_app_server Python SDK was not installable on the build host (PyPI returned no matching distribution as of this PR). All SDK interactions are exercised against a fake module injected into sys.modules in tests. Once the SDK ships to PyPI, an integration test against a real AsyncCodex instance should be added.
The parallel_calls UI pill in the composer (deliverable 6) is implemented as a typed clamp + types in external-providers.ts; surfacing it as an actual composer pill requires a follow-up edit in shared-composer.tsx that integrates with the existing InferenceParams plumbing. The CodexParallelTabs component and the underlying state reducer are wired and ready to consume the backend events.
The chat-providers dialog gating (deliverable 5) currently relies on the existing hidden: true registry flag; surfacing a synthetic codex row gated on /api/codex/status is a small follow-up in chat-providers-dialog.tsx consuming fetchCodexStatus() from the new API module.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa809262d9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T16:51:20Z

+            )
+        return "".join(collected)
+
+    workers = [asyncio.create_task(_worker(i + 1)) for i in range(n)]


Cancel Codex fan-out tasks when stream is aborted

When a client disconnects or cancels a streaming request, this async generator can be closed before it reaches normal completion, but the worker tasks created for parallel fan-out are never canceled. Because _stream_codex_parallel starts up to 20 background tasks and a drain task here, an interrupted stream continues consuming local Codex capacity after the user is gone, which can starve subsequent requests and waste significant resources.
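
One way to get the cancellation behaviour this comment asks for is a try/finally around the generator's yield loop. The sketch below is an assumption about the shape of the fix, not the code in _stream_codex_parallel: fan_out, the queue, and the one-hour sleep standing in for a Codex turn are all hypothetical.

```python
import asyncio
from typing import AsyncIterator, List


async def fan_out(n: int, started: List[int], cancelled: List[int]) -> AsyncIterator[str]:
    """Yield chunks from n workers; cancel every worker when the stream closes."""
    queue: asyncio.Queue = asyncio.Queue()

    async def worker(idx: int) -> None:
        started.append(idx)
        try:
            await queue.put(f"tab {idx}: first chunk")
            await asyncio.sleep(3600)  # stand-in for a long-running Codex turn
        except asyncio.CancelledError:
            cancelled.append(idx)
            raise

    workers = [asyncio.create_task(worker(i + 1)) for i in range(n)]
    try:
        while True:
            yield await queue.get()
    finally:
        # Runs on aclose() (GeneratorExit) and on cancellation of the consumer.
        for task in workers:
            task.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


async def demo():
    started: List[int] = []
    cancelled: List[int] = []
    stream = fan_out(3, started, cancelled)
    first = await stream.__anext__()
    await stream.aclose()  # simulates the client going away mid-stream
    return first, sorted(started), sorted(cancelled)
```

Awaiting the cancelled tasks inside the finally block matters: it lets each worker unwind before the generator finishes closing.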


chatgpt-codex-connector · 2026-05-22T16:51:20Z

+        try:
+            rc = await proc.wait()
+        except Exception:


Kill device-auth process on canceled login stream

If the /api/codex/login SSE connection is closed mid-flow (dialog closed, navigation, network drop), cancellation lands in this finally, but the code only awaits proc.wait() and never terminates the codex auth login --device-auth subprocess. That leaves orphaned login processes running server-side until they exit on their own, and the canceled request task can remain blocked waiting for that long-running process.
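
A bounded teardown for the login subprocess could look roughly like this. It is a simplified sketch that uses terminate() followed by kill() on the single child rather than process groups; terminate_process and the grace period are assumptions, not names from the PR.

```python
import asyncio
import sys


async def terminate_process(proc: asyncio.subprocess.Process, grace: float = 2.0) -> int:
    """Ask the child to exit, wait a bounded time, then force-kill it."""
    if proc.returncode is None:
        try:
            proc.terminate()
        except ProcessLookupError:
            pass  # the child exited between the check and the signal
        try:
            await asyncio.wait_for(proc.wait(), timeout=grace)
        except asyncio.TimeoutError:
            proc.kill()
            await proc.wait()
    return proc.returncode


async def demo() -> int:
    # A child that would otherwise outlive the request by a minute.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await terminate_process(proc)
```

Calling this from the finally block of the SSE generator keeps the close path from blocking on a process that only exits on user action.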


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc8134ebf1

chatgpt-codex-connector · 2026-05-22T18:39:05Z

+    for msg in reversed(messages):
+        if msg.get("role") != "user":
+            continue
+        content = msg.get("content")
+        if isinstance(content, str):


Preserve prior turns when routing Codex requests

_last_user_prompt stops at the newest role=user message and returns only that one turn, but each request also creates a brand-new Codex thread (async_codex_cls() + thread_start) instead of reusing prior thread state. In follow-up questions, Codex therefore receives no assistant/user history and answers without conversation context, which breaks normal multi-turn chat behavior for this provider.
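
If every request keeps opening a fresh thread, one way to preserve context is to fold the whole message list into the prompt. The sketch below assumes OpenAI-style messages (role plus content as a string or a list of text parts); render_transcript and the "Role: text" layout are illustrative choices, not the PR's format.

```python
from typing import Any, Dict, List


def _text_of(content: Any) -> str:
    """Extract plain text from a string or a list of {'type': 'text'} parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return ""


def render_transcript(messages: List[Dict[str, Any]]) -> str:
    """Render all turns; a lone user message is passed through unchanged."""
    turns = [(m.get("role", "user"), _text_of(m.get("content")).strip()) for m in messages]
    turns = [(role, text) for role, text in turns if text]
    if len(turns) == 1 and turns[0][0] == "user":
        return turns[0][1]
    return "\n\n".join(f"{role.capitalize()}: {text}" for role, text in turns)
```

Single-turn input stays byte-for-byte the same, so existing single-shot behaviour is unaffected.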


chatgpt-codex-connector · 2026-05-22T18:39:05Z

+      } else if (!error) {
+        setError("Codex login did not complete -- see log for details.");


Preserve streamed error details in Codex login UI

The post-loop fallback checks !error from the callback closure, not the latest value set during the stream. When the backend sends an error event, setError(event.message) runs, but error here is still the stale pre-stream value, so the specific backend message is overwritten by the generic "did not complete" text. This hides actionable failure details from users during device-auth failures.


+                yield "data: [DONE]\n\n"
+
+        return StreamingResponse(
+            _codex_stream(),


Wires the OpenAI Codex CLI / Python SDK (codex_app_server) into Studio as a new chat provider type. Hosts that don't have the CLI or the SDK installed never see the entry; on logged-out hosts the provider config dialog renders a device-auth Sign-in button that surfaces the verification URL and streams CLI progress back over SSE.

Backend

- new core/inference/codex_availability.py probes the CLI + SDK and reports {installed, logged_in, version, supported_models}; it never imports codex_app_server at module top level so the rest of the backend keeps starting cleanly on hosts that don't have the SDK.
- new core/inference/codex_provider.py wraps AsyncCodex and translates Codex events into OpenAI chat-completion chunks. Supports the thread.run_streaming path with a non-streaming fallback for older SDK revs.
- parallel_calls > 1 fans the turn out across N tasks (capped at 20) via asyncio.gather and emits codex_tab_open / codex_tab_chunk / codex_tab_close tool-events per attempt plus a final codex_gather synthesis event. A separate standalone Codex call produces the unified answer.
- new routes/codex.py exposes GET /api/codex/status and POST /api/codex/login. The login route shells out to codex auth login --device-auth and streams events; the first event carries the verification URL so the frontend can window.open it.
- ChatCompletionRequest gains a parallel_calls field bounded [1, 20] by pydantic. The codex registry entry stays hidden by default; the /api/codex/status probe is the authoritative gate.
- routes/inference.py dispatches provider_type=codex through the local CLI/SDK pipeline instead of the standard HTTP client, with graceful error surfacing for CodexUnavailableError.

Frontend

- new api/codex-api.ts exposes fetchCodexStatus() and an async generator streamCodexDeviceLogin() that drives the SSE stream and yields parsed events.
- new components/codex-parallel-tabs.tsx renders the tabbed parallel-calls UI with a Synthesis tab highlighted once the codex_gather event arrives. Pure reducer keeps the state transitions unit-testable.
- new components/codex-login-button.tsx posts to /api/codex/login, opens the verification URL in a new tab via window.open, and shows the streamed CLI log as it lands.
- external-providers.ts exports CODEX_PROVIDER_TYPE, CODEX_MAX_PARALLEL_CALLS, isCodexProviderType, and clampCodexParallelCalls. Codex is marked text-only so the composer hides image-attach affordances when selected.

Tests

- tests/test_codex_provider.py (14 cases) covers the availability probe across the four install / login states, the streaming + parallel-calls translation against a fake codex_app_server module injected into sys.modules, the [1, 20] pydantic clamp, the CodexUnavailableError surfacing path, and the parallel_calls=1 single-call shape (no tab tool-events).

for more information, see https://pre-commit.ci

test_health_response_reports_desktop_capability_fields builds a SimpleNamespace as a fake routes module so it can exercise main.health_check without standing the full app up. The stub listed every router name except codex_router, which lands in the main.py import block alongside the others as of this PR, so the import failed with 'cannot import name codex_router from <unknown module name>' on the Python 3.13 unit run. Add the codex_router slot to the stub.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a4309fefd

chatgpt-codex-connector · 2026-05-23T14:05:00Z

+    setLogs([]);
+    setDeviceUrl(null);
+    const controller = new AbortController();
+    abortRef.current?.abort();


Abort login stream on unmount

AbortController is only wired to startLogin, so closing the dialog or navigating away while busy leaves the /api/codex/login stream running until the CLI exits. In practice this keeps a server-side login process alive after the UI is gone and continues dispatching async updates to a detached component tree. Add unmount cleanup (and clear abortRef in finally) so in-flight login streams are canceled when the component is removed.


Followups on the post-merge review pass for the Codex SDK chat provider. Verified against codex-cli 0.133.0 + the upstream `openai/codex` Rust + Python sources, then pinned each fix with a regression test in `test_codex_provider.py` (24/24 passing).

* Probe both `openai_codex` (canonical upstream Python package at `openai/codex/sdk/python`) and the legacy `codex_app_server` alias. Without this the availability probe always reported `sdk_importable: false` even when the SDK was installed, so the provider was permanently hidden.
* Switch the device-auth and login-status invocations from `codex auth login --device-auth` / `codex auth status` to the real upstream subcommands `codex login --device-auth` and `codex login status`. The former path returns `unrecognized subcommand 'auth'` on a real CLI.
* Strip ANSI control sequences before extracting the device URL (upstream wraps the URL in `\x1b[34m...\x1b[0m`) and tighten the pattern to the canonical `.../codex/device` shape. Also surface the one-time code as a `device_code` SSE event so the UI can show it alongside the URL.
* Fix `_detect_logged_in` substring footgun: `"logged in" in combined` matched inside `"not logged in"`, flipping logged-out users to logged-in. Anchor on word boundaries with negative prefixes winning regardless of return code.
* Cancel in-flight fan-out workers on SSE disconnect. Previously every parallel Codex turn ran to completion against a disconnected client and burned quota; now `_stream_codex_parallel` cancels its worker + drain tasks in a try/finally on `CancelledError`/`GeneratorExit`.
* Tear down the device-login subprocess on disconnect via `start_new_session=True` + `os.killpg(SIGTERM)` (Unix) or `CREATE_NEW_PROCESS_GROUP` + `CTRL_BREAK_EVENT` (Windows), with a bounded `proc.wait()` and `proc.kill()` fallback. Previously `finally: await proc.wait()` blocked the SSE close path because `codex login --device-auth` only exits on user action.
* Render the full conversation transcript in `_last_user_prompt` instead of returning only the most recent user message. The PR opens a fresh thread per request so prior assistant turns were dropped, degrading multi-turn chats to single-shot prompts. Single-turn input is unchanged.
* Make `ChatCompletionRequest.parallel_calls` default to 1 (`int` with `ge=1, le=20`) instead of `Optional[int] = None`. The runtime already coerced `None` -> 1, but the schema now matches the documented `[1, 20]` range.
* Replace the registry's hardcoded `default_models` (which contained `o3`, not in the upstream catalog) with the current `gpt-5.5 / 5.4 / 5.4-mini / 5.3-codex / 5.2` set from `codex-rs/models-manager/models.json`.
* Stop echoing `str(exc)` in SSE error frames in both `routes/inference.py` and `routes/codex.py`. The Codex SDK can raise with local paths, env-var content, or traceback fragments (CodeQL `py/information-exposure-through-exception`). Surface a generic message + `exception_type` discriminator; log the full reason server-side via `logger.error(..., exc_type=..., error=...)`.

Doc / comment updates throughout to refer to `codex login` / `openai_codex` rather than the older incorrect strings.

Tested: pytest 24 cases in `test_codex_provider.py` (the original 14 + 10 new `TestCodexHardenedRegressions`) plus the rest of the Studio-backend test suite the PR touches (209 passing). Also verified live against Studio launched from this branch on a Blackwell B200 via `UNSLOTH_STUDIO_HOME=$WORKSPACE/temp/... ./install.sh --local` then a Playwright probe.
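
The `_detect_logged_in` substring footgun from the list above is easy to reproduce in isolation. The sketch below shows one way to anchor on word boundaries and let negative phrases win; the exact phrases matched and the rule that a positive match also needs a zero return code are assumptions for illustration, not the PR's patterns.

```python
import re

# Assumed phrasings; the real CLI output may differ.
_NEGATIVE = re.compile(r"\b(?:not logged in|logged out|not authenticated)\b", re.IGNORECASE)
_POSITIVE = re.compile(r"\blogged in\b", re.IGNORECASE)


def detect_logged_in(output: str, returncode: int = 0) -> bool:
    """Negative phrases win regardless of return code; positives need rc == 0."""
    if _NEGATIVE.search(output):
        return False
    return returncode == 0 and bool(_POSITIVE.search(output))


def naive_detect(output: str) -> bool:
    """The buggy form: a plain substring test."""
    return "logged in" in output.lower()
```

Checking the negative pattern first is what removes the ambiguity, since every "not logged in" string also contains "logged in".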

chatgpt-codex-connector · 2026-05-24T14:13:17Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

The OpenAI Codex Python SDK ships on PyPI as `openai-codex-app-server-sdk`, not `openai-codex` (which is the GitHub repo project name in pyproject.toml). The runtime binary ships separately as `openai-codex-cli-bin`. Both packages expose the import name `openai_codex`; the older docs reference `codex_app_server` so we keep probing both. Update the `CodexUnavailableError` message and the provider registry notes so a user hitting the unavailable path gets a copy-pasteable `pip install` command. No behaviour change.

Post-review pass driven by reviewer.py. The original PR shipped the backend codex provider, the registry entry (with `hidden:true`), the status API, and the `CodexParallelTabs` component, but the chat UI never surfaced the row, required an API key for the connection, and never sent `parallel_calls` over the wire. Also fixes a CodeQL leak in the parallel fan-out error path and adds the canonical streaming hook upstream actually exposes.

Frontend

* chat-providers-dialog.tsx now calls `/api/codex/status` alongside `/api/providers/registry`. When the host has Codex installed the Add connection dialog gains a synthetic Codex row (curated model list comes from `supported_models`) so the picker is reachable.
* The Add / Edit connection guards now skip the API-key requirement for Codex the same way they do for the custom OpenAI-compat presets; the field itself is also hidden so the user is not asked for a key Studio will not use.
* chat-adapter.ts now also exempts Codex from the "Missing API key" pre-flight, and emits `parallel_calls` on the outgoing request when the selected connection is Codex (clamped to [1, 20] by the shared helper, defaults to 1).
* external-providers.ts adds `codexParallelCalls` to ExternalProviderConfig so future composer UI can persist the user's pick per connection.

Backend

* `_stream_thread_run` now tries `thread.turn(prompt).stream()` first, mirroring the canonical openai_codex API (`openai/codex/sdk/python/src/openai_codex/api.py`). The legacy `thread.run_streaming(prompt)` path is kept as a fallback and the buffered `await thread.run(prompt)` stays as the last resort.
* `_stream_codex_parallel` no longer echoes `str(exc)` in the `codex_tab_error` SSE event. Per-tab failures now surface a generic "Codex tab failed" message plus an `exception_type` discriminator; `CodexUnavailableError` is the only exception whose text is forwarded verbatim because it is a user-actionable install hint with no sensitive content (CodeQL `py/information-exposure-through-exception`).

Tests

* New `TestCodexHardenedRegressions::test_parallel_tab_error_sanitised` injects a fake SDK that raises with a path-like message and asserts the SSE frames do not echo it.
* New `TestCodexHardenedRegressions::test_thread_turn_stream_path_taken` verifies the canonical `thread.turn(prompt).stream()` hook is preferred over the legacy helper.

All 26 codex_provider tests pass. Frontend `tsc --noEmit` clean.
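
The error-sanitisation rule described above can be sketched as a small frame builder. The JSON payload shape, the logger name, and the local `CodexUnavailableError` stand-in are assumptions for illustration; only the policy (generic message plus `exception_type`, verbatim text for the install hint alone) comes from the commit message.

```python
import json
import logging

logger = logging.getLogger("codex_sketch")


class CodexUnavailableError(RuntimeError):
    """Stand-in for the PR's exception; its text is a user-actionable install hint."""


def error_frame(exc: BaseException, generic: str = "Codex tab failed") -> str:
    """Build an SSE error frame that never echoes arbitrary exception text."""
    if isinstance(exc, CodexUnavailableError):
        message = str(exc)  # safe by construction: no paths or secrets
    else:
        # Full detail stays server-side only.
        logger.error("codex failure: %s: %s", type(exc).__name__, exc)
        message = generic
    payload = {"error": {"message": message, "exception_type": type(exc).__name__}}
    return f"data: {json.dumps(payload)}\n\n"
```

The `exception_type` discriminator keeps failures distinguishable for clients without exposing local paths or environment content.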

Second reviewer.py pass surfaced three follow-ups missed in the earlier round. All caught by 12 parallel reviewers + cross-block audit; each fix is small but user-facing.

* `testProvider` no longer pushes a Codex connection back to the edit form to "add an API key". Codex has no remote endpoint to ping, so the Test button now calls `/api/codex/status` directly: toasts success with the CLI version when installed+logged in, prompts to sign in when installed+logged out, and errors when the CLI or SDK is missing.
* The Sign-in to Codex affordance is now actually mounted. When the selected provider is Codex and `/api/codex/status` reports `installed:true, logged_in:false`, the dialog renders the new `CodexLoginButton` above the (hidden) API key row. The button's `onLoggedIn` callback re-probes status so the UI flips to the ready state without a page reload.
* The chat adapter now handles `codex_*` `_toolEvent` types instead of silently swallowing them. Per-tab chunks render inline with a `[Codex tab N/M]` header so users see each parallel attempt; `codex_gather` adds a `--- Synthesis ---` divider before the final unified content delta the backend also emits as plain text. This unblocks the existing fan-out path while a dedicated `CodexParallelTabs` UI is wired in a future change.

Verified: 26/26 codex_provider tests pass; `tsc --noEmit` on studio/frontend completes clean.

…sloth into feat/codex-provider

…t login on unmount

Third reviewer.py pass found three remaining sharp edges. Each fix is small and paired with a regression test where applicable.

* Codex subprocess env is now scrubbed to a safe-list before spawn. Both `_run_cli` in codex_availability and the device-auth spawn in stream_codex_device_login switch from `env=os.environ.copy()` to `env=_codex_subprocess_env()`, which forwards only PATH / HOME / USER / Windows-equivalents / CODEX_HOME / OPENAI_API_KEY / OPENAI_BASE_URL. Other-provider secrets like HF_TOKEN, GH_TOKEN, WANDB_API_KEY, ANTHROPIC_API_KEY no longer reach the local codex binary, so a shimmed `codex` earlier on PATH cannot harvest them.
* `_stream_thread_run` now tracks `emitted_any` and refuses to fall through to the buffered `await thread.run(prompt)` after either streaming helper has already yielded text. Previously a network glitch mid-stream re-executed the same Codex turn, which can duplicate file writes, shell commands, and other Codex side effects. The buffered path is now reserved for the zero-output case (no streaming helper resolved, or streaming returned empty).
* `CodexLoginButton` now aborts the SSE reader on unmount via a useEffect cleanup that calls `abortRef.current?.abort()`. The underlying `codex login --device-auth` subprocess no longer keeps streaming (and holding a device-auth session) after the dialog closes.

Two new pytest cases pin the behaviour: `test_codex_subprocess_env_scrubbed` sets HF/GH/WANDB/ANTHROPIC keys and asserts none reach the codex env while OPENAI_API_KEY / CODEX_HOME survive; and `test_partial_stream_failure_does_not_replay_turn` injects a fake `turn().stream()` that yields "partial output " then raises, and asserts `thread.run()` is never called.

28/28 codex_provider tests pass; `tsc --noEmit` clean.
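
The safe-list env scrub can be illustrated with a short helper. This is a sketch, not the PR's `_codex_subprocess_env`: the function name is hypothetical, and the Windows variable names in particular are a guess at what "Windows-equivalents" covers.

```python
import os
from typing import Dict, Mapping, Optional

# Names from the commit message, plus assumed Windows counterparts.
_SAFE_ENV_KEYS = (
    "PATH", "HOME", "USER",
    "USERPROFILE", "USERNAME", "APPDATA", "LOCALAPPDATA", "SYSTEMROOT",  # assumption
    "CODEX_HOME", "OPENAI_API_KEY", "OPENAI_BASE_URL",
)


def scrubbed_env(source: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment limited to the safe-list."""
    source = os.environ if source is None else source
    return {key: source[key] for key in _SAFE_ENV_KEYS if key in source}
```

Passing the result as `env=` to the subprocess call means anything not explicitly listed, including tokens added to the host environment later, is dropped by default.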

…x preselect

Two P1 fixes from the round 8 reviewer pass:

1. _stream_thread_run no longer replays a Codex turn that fired non-visible events before crashing.

   The replay guard only tracked `emitted_any` (visible text). A Codex turn that emitted, say, a command.delta or file.delta event first -- both filtered to "" by _coerce_text -- and THEN crashed would leave emitted_any=False and fall through to the buffered `thread.run(prompt)` fallback, re-executing the same turn and duplicating its side effects (shell commands, file writes, tool calls). This is exactly the case the guard was added to prevent in earlier rounds; the missing bit was tracking "the turn ran at all", not just "the turn yielded text".

   Fix: add a separate turn_started flag that flips True the moment we ask the SDK for a turn handle or observe any event from a streaming helper. When the buffered fallback is gated on turn_started instead of emitted_any, a partial-turn crash correctly stops without replaying. Regression test reproduces the bug against the pre-fix code (assertion catches the extra thread.run call) and locks the fix in.

2. openAddProvider now mirrors the providerType-change effect's Codex pre-check.

   The first-run UX fix from `26799d9a` pre-checked every Codex default model in the providerType-change effect, but openAddProvider() calls resetForm() (which clears selectedModelIds) and then only restores availableModels, not selectedModelIds. If the user closes the Add connection form and re-opens it while Codex is still the current providerType, the effect does not re-run, so the form opens with Codex defaults available but none selected -- the "Add at least one model ID" save guard then blocks the Save click.

   Fix: openAddProvider now seeds selectedModelIds with the full default-models list when the provider is Codex, matching the providerType-change effect so the two entry paths produce the same first-run state.

danielhanchen · 2026-05-25T15:35:35Z

Round 8 reviewer pass surfaced two more P1 issues; one more commit lands both fixes.

b17765b5 (round 8): replay guard on non-visible events + Add-flow Codex preselect

_stream_thread_run could replay a partial Codex turn after non-visible events. The earlier replay guard only tracked emitted_any (visible text). A Codex turn that emitted a command.delta / file.delta / tool_call.delta event first -- all filtered to "" by _coerce_text -- and then crashed would leave emitted_any False and fall through to the buffered thread.run(prompt) fallback, re-executing the same turn and duplicating shell commands or file writes upstream. The fix tracks a separate turn_started flag that flips True the moment we ask the SDK for a turn handle OR observe any event from a streaming helper; the buffered fallback is gated on turn_started instead, so a partial-turn crash stops cleanly without replay. The regression test reproduces the bug against the pre-fix code (its assertion catches the extra thread.run call made after a partial-turn crash) and locks the fix in.
openAddProvider was clobbering the Codex auto-preselect. The round 7 fix (26799d9a) pre-checked every Codex default model in the providerType-change useEffect. But openAddProvider() calls resetForm() (which clears selectedModelIds) and then only restores availableModels. If the user closes the Add connection form and re-opens it while Codex was already providerType, the effect did not re-run, so the form opened with Codex defaults available but none selected and the "Add at least one model ID" save guard blocked the click. openAddProvider now mirrors the effect's Codex pre-check, matching the same first-run state on both entry paths.
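
The difference between gating on emitted_any and gating on turn_started is easiest to see in a reduced model. The sketch below is a stand-in for _stream_thread_run under stated assumptions: events are plain dicts, `open_stream` and `buffered_run` are hypothetical callables, and only `message.delta` events carry visible text.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Optional


def visible_text(event: dict) -> str:
    """Only message deltas are user-visible; everything else maps to ''."""
    return event.get("text", "") if event.get("type") == "message.delta" else ""


async def run_turn(
    open_stream: Optional[Callable[[], AsyncIterator[dict]]],
    buffered_run: Callable[[], Awaitable[str]],
) -> AsyncIterator[str]:
    turn_started = False
    if open_stream is not None:
        turn_started = True  # asking for a turn handle already counts
        try:
            async for event in open_stream():
                text = visible_text(event)
                if text:
                    yield text
        except Exception:
            # Partial turn: side effects may have run, so never replay.
            return
    if not turn_started:
        # Reserved for the case where no streaming helper resolved at all.
        yield await buffered_run()


async def collect(agen: AsyncIterator[str]) -> list:
    return [chunk async for chunk in agen]
```

A guard keyed on visible text alone would treat a stream that crashed after a command event as untouched and run the turn a second time.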

Test counts after b17765b5:

tests/test_codex_provider.py: 60 passed (50 round 6 -> 58 round 7b -> 60 round 8; 10 new regression tests across the round 7 / 7b / 8 commits, covering the cross-wrapper env scrub, device URL allowlist, log-filter blocklist, and replay-on-non-visible-events).

Round 8 reviewer-finding summary:

11 of 12 reviewers reported additional issues; 10 of those were follow-ups on areas this PR already touches.
The two P1s above are fixed.
Two reviewers (10 and 12) flagged normalizeProvider allegedly dropping codexParallelCalls. Reproducing the path in Node with the actual normalizeProvider body, codexParallelCalls survives via the ...raw spread; the explicit overrides do not touch it. Not actionable from the source as-written.
Other reviewer notes (Windows subprocess launch, dead "tab" component, accidental main reverts) are tracked but lower severity than the two P1s landed here.

Resolve three conflicts touched by main since the branch forked:

- studio/backend/core/inference/external_provider.py: take main's rewrite of _anthropic_citation_key (extended dedup keys covering end-char/page/block indices plus search_result_index) and the new _anthropic_supports_fast_mode helper. The branch had only the earlier shorter citation_key form so accepting main wholesale here loses nothing Codex-related.
- studio/backend/models/inference.py: keep BOTH the Codex parallel_calls field + _clamp_parallel_calls validator (from the branch) AND the new Anthropic fast_mode field (from main). They occupy different provider lanes.
- studio/frontend/src/features/chat/api/chat-adapter.ts: fold the Codex per-tab rendering pass (renderFullContent) into main's new orderAssistantContent positioning so tools land before text, generated images after, AND any Codex tab text accumulated in earlier _toolEvent frames is preserved through the synthesis content delta (round 8 render-order fix).

Backend codex tests: 60/60 passing. Anthropic citations / fast_mode tests: 96/96 passing. Frontend builds cleanly.

…sloth into feat/codex-provider

1. Codex SSE wrapper terminates on exact `data: [DONE]` only.

   The old substring check `if "[DONE]" in line` would flip sent_done True when a normal model response carried the literal text "[DONE]" in delta.content (for example an explanation of the OpenAI stream sentinel). The real terminator was then suppressed, leaving OpenAI-compatible clients that finalise on the explicit sentinel hung on stream close. Now compares the stripped line to the exact `data: [DONE]` form.

2. Legacy `thread.run_streaming` path no longer returns an empty reply on completion-only streams.

   If the SDK exposes `thread.run_streaming` but the stream emits ONLY item.completed / agentMessage events with no message deltas, the loop previously exited with emitted_any False and never reached the agent-message fallback. The request returned 200 with an empty assistant reply even though Codex produced a final answer. Mirror the canonical-path behavior: collect `_completed_agent_message_text` strings in a sidecar list and emit the last one when no deltas arrived. Match the canonical payload-extraction (`getattr(event, "payload", event)`) so the event-vs-payload SDK shape difference is handled the same way in both branches.

3. Parallel-calls fan-out propagates CodexUnavailableError so the route layer can return 503.

   When the SDK is not importable or the safety enums are missing without the dev opt-in, every worker raised the same CodexUnavailableError. The previous catch-all converted the error into a per-tab codex_tab_error event, the outer stream never raised, and clients saw a 200 with only tool events and an empty synthesis -- OpenAI-compatible consumers that ignore _toolEvent saw a successful empty reply. Now CodexUnavailableError re-raises out of the worker (no spurious per-tab error event), _await_workers re-raises it when EVERY worker hit the same setup failure, and the finally-block drain await propagates the exception out of the parallel function so the route's existing CodexUnavailableError handler can emit the right 503 SSE error frame. Per-tab runtime failures (model rejected, timeout, mid-stream SDK crash) still get swallowed into codex_tab_error events so a single bad model in the fan-out does not kill the others.

Test counts: 63/63 passing (60 round 6-8 plus 3 new round 9 regression tests). Each new test was first run against a `git stash`-restored pre-fix tree to confirm it catches the bug, then run against the patched tree.
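
The sentinel fix in item 1 comes down to an exact comparison instead of a substring test. The sketch below is a simplified wrapper, assuming frames arrive as complete SSE strings; `with_single_done` and `is_done_frame` are illustrative names, not the route's code.

```python
from typing import Iterable, Iterator

DONE_LINE = "data: [DONE]"


def is_done_frame(frame: str) -> bool:
    """True only for the terminator itself, not for content that mentions it."""
    return frame.strip() == DONE_LINE


def with_single_done(frames: Iterable[str]) -> Iterator[str]:
    """Pass frames through and guarantee exactly one terminal sentinel."""
    sent_done = False
    for frame in frames:
        if is_done_frame(frame):
            if sent_done:
                continue  # drop duplicate terminators
            sent_done = True
        yield frame
    if not sent_done:
        yield f"{DONE_LINE}\n\n"
```

With the substring check, a delta whose content merely mentions the sentinel would mark the stream as finished and the real terminator would never be appended.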

danielhanchen · 2026-05-27T06:58:37Z

Merged main into the branch and resolved three conflicts: _anthropic_citation_key (took main's extended dedup keys), ChatCompletionRequest (kept both the Codex parallel_calls field + clamp validator and main's new Anthropic fast_mode field), and chat-adapter.ts (folded the Codex per-tab renderFullContent() into main's new orderAssistantContent ordering). Backend Codex tests are 60/60 green, Anthropic citation / fast-mode tests are 96/96 green, frontend builds cleanly.

Then addressed three new P2 findings the Codex bot left on be15c57f / 26799d9a in e2b7f595:

P2 Codex SSE wrapper sentinel match (routes/inference.py:1941): the substring check "[DONE]" in line would flip sent_done on a delta.content containing the literal text [DONE] and suppress the real terminal frame. Now compares the stripped line to the exact data: [DONE] form.
P2 legacy run_streaming completion-only stream: when a legacy SDK exposes thread.run_streaming but the stream emits only item.completed / agentMessage events with no message deltas, the loop used to exit with emitted_any=False and never reach the agent-message fallback. The request returned 200 with an empty assistant reply even though Codex produced a final answer. Mirror the canonical-path behavior in the legacy branch: collect _completed_agent_message_text strings and emit the last one if no deltas arrived. Match the canonical payload-extraction so the event-vs-payload SDK shape difference is handled the same way in both branches.
P2 parallel-calls fan-out swallowed CodexUnavailableError: when the SDK is not importable or safety enums are missing, every worker raises the same CodexUnavailableError. The previous catch-all converted that into a per-tab codex_tab_error event, so non-tool-aware clients saw a successful empty reply. Now CodexUnavailableError re-raises out of the worker (no spurious per-tab event), _await_workers re-raises when EVERY worker hit the same setup failure, and the finally-block drain await propagates it out of the parallel function so the route's existing handler emits a proper 503 SSE error frame.
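
The propagation rule in the third finding can be sketched with plain asyncio. This is an assumption-laden illustration, not the provider's code: `run_worker`, `await_workers`, and the event dictionary shape are hypothetical, and `CodexUnavailableError` is redefined locally as a stand-in.

```python
import asyncio


class CodexUnavailableError(RuntimeError):
    """Stand-in for the provider's setup-failure exception."""


async def run_worker(index, task, events):
    """Run one tab; swallow runtime failures but let setup failures escape."""
    try:
        return await task(index)
    except CodexUnavailableError:
        raise  # no per-tab event for a setup failure
    except Exception as exc:
        events.append({"type": "codex_tab_error", "tab": index, "error": str(exc)})
        return None


async def await_workers(tasks):
    """Fan out, then re-raise only if every worker hit the setup failure."""
    events = []
    results = await asyncio.gather(
        *(run_worker(i, task, events) for i, task in enumerate(tasks)),
        return_exceptions=True,
    )
    setup_failures = [r for r in results if isinstance(r, CodexUnavailableError)]
    if results and len(setup_failures) == len(results):
        raise setup_failures[0]
    return [r for r in results if isinstance(r, str)], events


async def ok(index):
    return f"tab {index} text"


async def rejected(index):
    raise ValueError("model rejected")


async def unavailable(index):
    raise CodexUnavailableError("SDK not importable")


# One healthy tab and one runtime failure: the fan-out still returns text.
texts, events = asyncio.run(await_workers([ok, rejected]))
```

Calling `await_workers([unavailable, unavailable])` raises `CodexUnavailableError` instead of returning an empty result, which is what lets a route-level handler answer with a 503.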

Test count after the round 9 commit: 63/63 passing in tests/test_codex_provider.py. Each new regression test was first run against a git stash-restored pre-fix tree to confirm it catches the bug, then re-run against the patched tree.
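
For the completion-only stream case covered by one of those regression tests, a rough sketch of the fallback could look like the following. The event objects, the `delta` attribute, and the `completed_agent_message_text` helper are hypothetical stand-ins; only the `getattr(event, "payload", event)` extraction and the "emit the last completed text when no deltas arrived" rule come from the description.

```python
import asyncio
from types import SimpleNamespace


def completed_agent_message_text(event):
    """Hypothetical extractor: text of a completed agentMessage item, else None."""
    payload = getattr(event, "payload", event)  # event-vs-payload shape difference
    if getattr(payload, "type", None) != "item.completed":
        return None
    item = getattr(payload, "item", None)
    if getattr(item, "type", None) != "agentMessage":
        return None
    return getattr(item, "text", None)


async def stream_reply(events):
    """Yield deltas as they arrive; fall back to the last completed message."""
    emitted_any = False
    completed = []  # sidecar list of completed agent-message texts
    async for event in events:
        payload = getattr(event, "payload", event)
        delta = getattr(payload, "delta", None)
        if delta:
            emitted_any = True
            yield delta
            continue
        text = completed_agent_message_text(event)
        if text:
            completed.append(text)
    if not emitted_any and completed:
        yield completed[-1]


async def completion_only():
    """A stream with no deltas, only a completed agent message."""
    item = SimpleNamespace(type="agentMessage", text="final answer")
    yield SimpleNamespace(payload=SimpleNamespace(type="item.completed", item=item))


async def collect(events):
    return [chunk async for chunk in stream_reply(events)]


reply = asyncio.run(collect(completion_only()))
```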

for more information, see https://pre-commit.ci

danielhanchen · 2026-05-27T13:02:43Z

End-to-end Codex screenshots for this PR, captured against the latest
head with the real openai_codex SDK + real codex CLI on PATH.

1. Add the Codex connection (final probe, head `26799d9a` with the default-models prefix fix)

Empty Studio after login:

Pick "OpenAI Codex (local CLI)" in the Add Connection dropdown -- the
form now opens with all five default models (gpt-5.5, gpt-5.4,
gpt-5.4-mini, gpt-5.3-codex, gpt-5.2) pre-checked and "5 models
selected", so one save click is enough:

After save, the Connections list shows the new Codex entry with all
five models attached:

The model picker's "Connected" tab surfaces every Codex model under an
OPENAI CODEX (LOCAL CLI) heading:

Selected model in the chat header:

2. Multi-turn chat works

Turn 1: user says "My favorite color is teal. Reply with only the word
OK." -- Codex replies OK:

Turn 2 on the same thread: "What was my favorite color? Reply with
only the color name, no other words." -- Codex replies teal, so the
thread history is being forwarded correctly:

Full thread visible after both turns:

3. Parallel calls fan-out + synthesis

Connection edit form -- Parallel calls is in the SDK schema with help
text, range 1-20:

Bumped from 1 to 2 and saved:

Fan-out on a real prompt ("Reply with three short sentences about LoRA
fine tuning") shows [Codex tab 1/2] and [Codex tab 2/2] rendered
in order, then --- Synthesis ---, then the unified synthesis -- the
render-order fix from b7862388 is in effect:

Render-order zoom (separate single-prompt probe -- HELLO example):

4. Side-by-side before/after collages

Before vs after the multi-turn fix-set:

Before vs after the parallel-calls synthesis render-order fix:

All shots are captured live on the PR head with the actual SDK and
codex CLI -- no mocks. Anything containing token fields or other
secret slots was excluded from the upload.

# Conflicts: # studio/frontend/src/features/chat/api/chat-adapter.ts

Two paired changes that finally make the Codex parallel-calls fan-out visible as actual clickable tabs in the chat surface, plus a credit-free spoof that lets the whole pipeline run in dev / CI without ever touching the upstream API.

1. Real tab UI (frontend).

The chat-adapter used to render the per-worker outputs as inline `[Codex tab 1/N] ...` text blocks in the assistant message body, which collapsed into one big run-on block once more than a handful of tokens had streamed. Now each `codex_*` SSE event is folded into `codexParallelState` and re-published as the `args.state` of a single tool-call part with `toolName === "codex_parallel"`. The assistant-ui surface dispatches that to the new `CodexParallelToolUI` wrapper, which mounts the existing `CodexParallelTabs` component -- one tab per worker, one Synthesis tab, click to switch. The stable `toolCallId` keeps assistant-ui updating the SAME card across stream yields rather than spawning new cards. `renderCodexTabsBlock` now returns the empty string so the message body no longer contains the labelled-text fallback (kept the function name so the rest of the adapter's `renderFullContent` / pin-signature paths are untouched).

2. Credit-free Codex SDK spoof (backend).

New `studio/backend/core/inference/codex_spoof.py` exposes a drop-in subset of the upstream `openai_codex` surface (`AsyncCodex`, `AppServerConfig`, `ApprovalMode.deny_all`, `SandboxMode.read_only`, thread with `turn().stream()` + `run_streaming()` + `run()`) and emits deterministic per-tab streaming events tagged with the worker index, so flipping between tabs in the UI shows visibly distinct text. Activated by `UNSLOTH_CODEX_SPOOF=1`; `_import_codex` installs the spoof into `sys.modules` under both `openai_codex` and `codex_app_server` and the rest of the provider keeps running unchanged. OFF by default; production is unaffected.

Six new tests cover the spoof itself (module install, env-flag gating, delta + completion event shape, per-tab tagging, provider import path, safety-kwargs resolution against the spoof). 69/69 tests pass with and without the flag; TypeScript clean.
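
The env-gated install can be pictured roughly as below. This is a simplified sketch, not the contents of `codex_spoof.py`: `build_spoof_module` and `install_spoof_if_enabled` are hypothetical names, and the stand-in module carries only an empty `AsyncCodex` class rather than the full surface listed above.

```python
import os
import sys
import types


def is_spoof_enabled() -> bool:
    """The spoof is active only when the flag is set to exactly 1."""
    return os.environ.get("UNSLOTH_CODEX_SPOOF") == "1"


def build_spoof_module() -> types.ModuleType:
    """Build a tiny stand-in module; the real spoof exposes far more."""
    module = types.ModuleType("openai_codex")

    class AsyncCodex:  # placeholder for the spoofed client class
        pass

    module.AsyncCodex = AsyncCodex
    return module


def install_spoof_if_enabled() -> bool:
    """Register the spoof under both SDK names; do nothing when the flag is off."""
    if not is_spoof_enabled():
        return False
    spoof = build_spoof_module()
    for name in ("openai_codex", "codex_app_server"):
        sys.modules[name] = spoof
    return True
```

Because both names point at the same module object, a later `import openai_codex` or `import codex_app_server` resolves to the spoof without any change to the importing code.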

danielhanchen · 2026-05-27T13:31:27Z

Two paired changes pushed on feat/codex-provider:

Merge conflict resolution (ecf8bc76)
Merged main into the branch (latest Gemini provider + 10 other PRs). One conflict in studio/frontend/src/features/chat/api/chat-adapter.ts across 6 hunks (the Codex predicates we added on this branch vs main's Gemini custom-base predicates and tweaked comments). Combined both: API-key gate now skips hosted-key checks for local providers, custom providers, Codex local CLI, AND Gemini custom OpenAI-compat bases; renderFullContent() / pinTextThoughtSignature() are composed so Codex per-tab text AND Gemini thoughtSignature both survive into the final yield. TypeScript clean. 69/69 backend tests pass.

Real Codex parallel-call tab UI + credit-free SDK spoof (f01011e4)

Real tab UI. The chat-adapter used to render the per-worker outputs as inline [Codex tab 1/N] text in the assistant message body. Now each codex_* SSE event is folded into a CodexParallelState and re-published as the args.state of a single tool-call part (toolName: "codex_parallel", stable toolCallId). The assistant-ui surface dispatches that to a new CodexParallelToolUI wrapper that mounts the existing CodexParallelTabs component: one tab per worker, plus a Synthesis tab, click to switch, read-only for now (matches the spec). The dead codex-parallel-tabs.tsx component is no longer dead.
Credit-free SDK spoof. New studio/backend/core/inference/codex_spoof.py exposes a drop-in subset of the upstream openai_codex API surface (AsyncCodex, AppServerConfig, ApprovalMode.deny_all, SandboxMode.read_only, thread with turn().stream() + run_streaming() + run()). Emits deterministic per-tab streaming events tagged with the worker index, so flipping between tabs in the UI shows visibly distinct text. Activated by UNSLOTH_CODEX_SPOOF=1; _import_codex installs it into sys.modules under both canonical names. OFF by default; production unaffected.

Six new tests cover the spoof itself (module install + spec, env-flag gating, delta + completion event shape, per-tab tagging, provider import path, safety-kwargs resolution). Full count: 69/69 tests pass with and without UNSLOTH_CODEX_SPOOF=1. TypeScript build clean.
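
To make "deterministic per-tab streaming events tagged with the worker index" concrete, here is a small sketch of what such a generator might look like. The dictionary event shape and the `spoof_stream` name are assumptions for illustration; the real spoof mirrors the SDK's own event objects.

```python
import asyncio


async def spoof_stream(worker_index: int, chunks: int = 3):
    """Yield delta events tagged with the worker index, then one completion event."""
    parts = []
    for n in range(chunks):
        text = f"[tab {worker_index}] chunk {n} "
        parts.append(text)
        yield {"type": "delta", "tab": worker_index, "delta": text}
    yield {"type": "item.completed", "tab": worker_index, "text": "".join(parts)}


async def collect(worker_index: int):
    return [event async for event in spoof_stream(worker_index)]


tab0 = asyncio.run(collect(0))
tab1 = asyncio.run(collect(1))
```

Running the same worker index twice gives identical events, while different indices give different text, which is the property that makes tab switching visibly meaningful in a demo.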

Playwright capture of the new tab UI driven by the spoof is up next.

# Conflicts: # studio/backend/main.py # studio/backend/routes/__init__.py

When ``UNSLOTH_CODEX_SPOOF=1`` is exported (the credit-free dev / CI path the previous commit added), the in-process spoof IS the Codex SDK and a real ``codex`` CLI is irrelevant. The status endpoint at ``/api/codex/status`` used to gate ``installed`` on the real CLI + real SDK only, which made the frontend hide the Codex provider in the connections dropdown even when the spoof was active.

Now both ``_sdk_importable`` and ``probe_codex_availability`` short-circuit on ``codex_spoof.is_spoof_enabled()`` so the provider becomes visible under the spoof. ``installed=True``, ``cli_path="<spoof>"``, ``logged_in=True``, ``version="spoof"`` -- a sentinel that lets devs read off "yes I am under the spoof" at a glance.

Real production code path (no spoof flag) is unchanged: still gates on bool(cli_path) AND sdk_ok the same as round 6.

Tests: added an autouse fixture in ``test_codex_provider.py`` that clears ``UNSLOTH_CODEX_SPOOF`` before every test so the existing availability / import gating tests are not polluted when a dev runs the suite with the flag exported. The spoof-targeted tests still call ``monkeypatch.setenv(...)`` to flip it back on inside their own scope. 69/69 pass with and without the env flag.
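
A rough sketch of the short-circuit follows. The `CodexStatus` dataclass and the helper bodies are assumptions written for illustration; only the sentinel values and the "CLI on PATH and SDK importable" gate come from the commit message, and login / version detection is left out.

```python
import os
import shutil
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Optional


@dataclass
class CodexStatus:
    installed: bool
    cli_path: Optional[str]
    logged_in: bool
    version: Optional[str]


def is_spoof_enabled() -> bool:
    return os.environ.get("UNSLOTH_CODEX_SPOOF") == "1"


def _has_spec(name: str) -> bool:
    """Lazy importability check that never imports the module itself."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def sdk_importable() -> bool:
    if is_spoof_enabled():
        return True  # the in-process spoof stands in for the SDK
    return any(_has_spec(name) for name in ("openai_codex", "codex_app_server"))


def probe_codex_availability() -> CodexStatus:
    if is_spoof_enabled():
        # Sentinel values make it obvious the spoof is active.
        return CodexStatus(True, "<spoof>", True, "spoof")
    cli_path = shutil.which("codex")
    installed = bool(cli_path) and sdk_importable()
    # Login and version detection are omitted from this sketch.
    return CodexStatus(installed, cli_path, False, None)
```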

danielhanchen · 2026-05-27T14:46:48Z

Another round on feat/codex-provider:

Merge -- main moved forward (Gemini provider #5720 + MCP servers #5750). Resolved conflicts in chat-adapter.ts (6 hunks) and studio/backend/{main.py,routes/__init__.py} (kept both codex_router and mcp_servers_router registrations). Branch is back to clean against main.

Code -- the Codex availability probe now honours UNSLOTH_CODEX_SPOOF=1 end-to-end (6403846b). Before, the spoof would emit per-tab text correctly but the UI hid the Codex provider because /api/codex/status still required a real CLI on PATH. Now installed=True + cli_path="<spoof>" + version="spoof" so the connection dropdown surfaces the entry. Test fixture clears the env var by default so the gating tests stay deterministic.

Demo screenshots (under tab_ui_round/):

Live UI flow against the spoofed Studio:

Connection form opens with Codex pre-selected, default models pre-checked, parallel-calls input ready:

Parallel calls bumped to 3, 5 models still selected:

The actual CodexParallelTabs component mounted with mock parallel state -- one tab per worker, click to switch, Synthesis tab highlighted:

Tab 2 -- visibly different worker text (this is the point of the per-tab tagging in the spoof, mirrors what real Codex parallel calls produce when each worker explores a different angle):

Tab 3:

Synthesis tab -- highlighted in green when ready, unified answer below the tab strip:

Mid-stream view (some tabs still streaming):

Error path -- per-tab failure isolated, other tabs still complete and synthesis still arrives:

The demo screenshots use mock CodexParallelState injected via Playwright to bypass the chat-stream flow; the visible card layout is the same DOM the chat-adapter emits at runtime via the new codex_parallel tool-call part. 69/69 tests still green. No real Codex tokens were consumed (spoof active).

danielhanchen requested a review from rolandtannous as a code owner May 22, 2026 16:46

chatgpt-codex-connector Bot reviewed May 22, 2026

github-advanced-security AI found potential problems May 22, 2026

Comment thread studio/backend/routes/inference.py

yield "data: [DONE]\n\n"

return StreamingResponse(

_codex_stream(),

danielhanchen and others added 4 commits May 23, 2026 14:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

ea7ae85

for more information, see https://pre-commit.ci

wip: anthropic citation helper

4188f91

danielhanchen force-pushed the feat/codex-provider branch from bc8134e to 0a4309f Compare May 23, 2026 14:00

chatgpt-codex-connector Bot reviewed May 23, 2026

github-advanced-security AI found potential problems May 23, 2026

Comment thread studio/backend/routes/codex.py Fixed

[pre-commit.ci] auto fixes from pre-commit.com hooks

861da31

for more information, see https://pre-commit.ci

danielhanchen added 2 commits May 24, 2026 14:25

Merge branch 'main' into feat/codex-provider

0d904d6

[pre-commit.ci] auto fixes from pre-commit.com hooks

e2ac490

for more information, see https://pre-commit.ci

danielhanchen added 2 commits May 24, 2026 15:01

Merge branch 'feat/codex-provider' of https://github.com/unslothai/un…

1f7cad4

…sloth into feat/codex-provider

[pre-commit.ci] auto fixes from pre-commit.com hooks

4b4c855

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

5185032

for more information, see https://pre-commit.ci

danielhanchen added 2 commits May 27, 2026 06:52

Merge branch 'feat/codex-provider' of https://github.com/unslothai/un…

3bbbd41

…sloth into feat/codex-provider

[pre-commit.ci] auto fixes from pre-commit.com hooks

dd0aeec

for more information, see https://pre-commit.ci

Merge remote-tracking branch 'origin/main' into feat/codex-provider

ecf8bc7

# Conflicts: # studio/frontend/src/features/chat/api/chat-adapter.ts

[pre-commit.ci] auto fixes from pre-commit.com hooks

3d1075f

for more information, see https://pre-commit.ci

danielhanchen and others added 3 commits May 27, 2026 13:30

Merge branch 'feat/codex-provider' of https://github.com/unslothai/un…

10d6b69

…sloth into feat/codex-provider

[pre-commit.ci] auto fixes from pre-commit.com hooks

cd8284d

for more information, see https://pre-commit.ci

Merge remote-tracking branch 'origin/main' into feat/codex-provider

6867cfb

# Conflicts: # studio/backend/main.py # studio/backend/routes/__init__.py

danielhanchen and others added 2 commits May 27, 2026 14:44

[pre-commit.ci] auto fixes from pre-commit.com hooks

c716af4

for more information, see https://pre-commit.ci

		} else if (!error) {
		setError("Codex login did not complete -- see log for details.");
