Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation by danielhanchen · Pull Request #5720 · unslothai/unsloth

danielhanchen · 2026-05-22T16:36:52Z

Summary

Adds Google Gemini as a first-class chat provider in Studio.

New native translator _stream_gemini in studio/backend/core/inference/external_provider.py that converts OpenAI Chat Completions request/response to and from Gemini's contents / parts REST shape on POST /v1beta/models/{model}:streamGenerateContent?alt=sse.
Provider registry uses the native base URL https://generativelanguage.googleapis.com/v1beta with x-goog-api-key authentication. Legacy /v1beta/openai Gemini providers saved by older Studio builds are auto-rewritten to the native path (only when the host is generativelanguage.googleapis.com; custom proxy paths ending in /openai are left untouched).
Surfaces the current Gemini lineup: 3.5 Flash, 3.5 Pro, 3.1 Pro Preview, 3.1 Flash Lite, 3-pro / 3-flash previews plus the *-latest aliases, and the Nano Banana family (2.5-flash-image, 3.1-flash-image-preview = "Nano Banana 2", 3-pro-image-preview = "Nano Banana Pro").
Thinking control split by family per the Gemini docs:
- Gemini 3.x family uses thinkingConfig.thinkingLevel (MINIMAL / LOW / MEDIUM / HIGH). Pro tier rejects MINIMAL, so "off" coerces to LOW; Flash tier coerces "off" to MINIMAL (Gemini 3 cannot fully disable thinking).
- Gemini 2.5 keeps the integer thinkingBudget ladder. 2.5 Pro coerces 0 to a small positive budget because the API 400s with "only works in thinking mode". 2.5 Flash-Lite uses 512 as its lowest positive budget per the API floor.
- Image models skip thinking config entirely.
Server tools wired natively:
- tools: [{googleSearch: {}}] for web search, with grounding citations surfaced through the same tool_start / tool_end + sources envelope OpenAI / Anthropic use.
- tools: [{codeExecution: {}}] for sandboxed Python; executableCode and codeExecutionResult parts emit as code_execution tool events with the native id and thoughtSignature stowed on google.native_part for follow-up replay.
- tools: [{functionDeclarations: [...]}] for caller-supplied OpenAI-shape tools + tool_choice translated to toolConfig.functionCallingConfig.mode / allowedFunctionNames.
- Google Search grounding is allowed on the documented Gemini 3 image models (gemini-3-pro-image-preview, gemini-3.1-flash-image-preview, nano-banana-pro). Older image ids continue to drop text tools because the API rejects them with "Search as tool is not enabled for this model".
Multimodal:
- Inline image_url parts on user messages map to inlineData (base64) or fileData with MIME guessed from the URL path so remote PNG / WebP / GIF inputs are not relabeled as JPEG.
- Gemini 3 image-editing replay groundwork: text deltas and inline image tool_end events carry extra_content.google.thought_signature so future multi-turn editing can echo the signature back per the Gemini contract.
- Nano Banana inline image bytes surface as the existing image_b64 / image_mime tool_end envelope so the chat UI renders them inline with no extra plumbing.
functionCall parts translate to OpenAI-shape delta.tool_calls with a distinct index per call, so parallel calls in one turn reassemble cleanly. finishReason="STOP" is rewritten to tool_calls when any function call was emitted so OAI clients trigger tool execution. The chat-adapter consumes these tool_calls deltas so user-supplied function calls render even when no text is emitted.
usageMetadata translation: promptTokenCount + toolUsePromptTokenCount -> input tokens; candidatesTokenCount + thoughtsTokenCount -> output tokens; thoughtsTokenCount surfaces as completion_tokens_details.reasoning_tokens; cachedContentTokenCount -> cached input detail.
Prompt caching: when enable_prompt_caching is a string, it is forwarded as cachedContent so callers that create CachedContent resources out of band can reference them by name. A boolean value is silently ignored on Gemini (no cache is created implicitly).
Prompt-level safety blocks (promptFeedback.blockReason) surface as a content_filter error event. If a synthetic web_search tool_start was already emitted, a matching tool_end is paired before the error so the UI does not leave a "searching..." spinner stuck on screen.
Stream cleanup uses a single try / finally that always closes both the response and the manual aiter_lines() iterator on normal, prompt-block, and cancellation exits (no more RuntimeWarning: coroutine method 'aclose' of 'Response.aiter_lines' was never awaited).
Composer pills mirror the backend image-mode gate: when Gemini image generation is active on a text model, Search and Code pills are disabled so the request, builder, and active chips agree with what the backend actually sends.
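
As an illustration of the usageMetadata translation listed above, here is a minimal sketch. The function name `translate_usage` and the OpenAI-style key names (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`) are assumptions for this example; the PR's own `_build_usage_chunk` may be shaped differently.

```python
from typing import Any, Dict


def translate_usage(usage_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Gemini usageMetadata dict onto an OpenAI-style usage dict.

    Hypothetical helper: only the field arithmetic follows the PR summary.
    """
    prompt = usage_metadata.get("promptTokenCount") or 0
    tool_prompt = usage_metadata.get("toolUsePromptTokenCount") or 0
    candidates = usage_metadata.get("candidatesTokenCount") or 0
    thoughts = usage_metadata.get("thoughtsTokenCount") or 0
    cached = usage_metadata.get("cachedContentTokenCount") or 0

    # Input = prompt tokens plus tokens spent on tool-use prompts.
    input_tokens = prompt + tool_prompt
    # Output = visible candidate tokens plus hidden reasoning tokens.
    output_tokens = candidates + thoughts

    return {
        "prompt_tokens": input_tokens,
        "completion_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        "prompt_tokens_details": {"cached_tokens": cached},
        "completion_tokens_details": {"reasoning_tokens": thoughts},
    }
```

Rolling thoughtsTokenCount into the output total keeps total_tokens in line with the billable spend, while the details field still lets a client separate reasoning from visible output.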

Test plan

PYTHONPATH=studio/backend python -m pytest studio/backend/tests/test_gemini_provider.py -q (62 tests, all green)
Studio backend regression: 1828 passed, 46 skipped (the 2 pre-existing failures - test_help_output, qwen3.5 flash-attn tilelang - are unrelated to this PR)
cd studio/frontend && npx tsc -b --pretty false clean
Live e2e against https://generativelanguage.googleapis.com/v1beta covering single-turn, multi-turn, googleSearch grounding, codeExecution, image generation, function calling, thinkingLevel (Gemini 3) and thinkingBudget (Gemini 2.5) variants
Frontend Sources panel rendering for Gemini groundingMetadata citations: follow-up

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68379b6534

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T16:40:34Z

+                async for line in self._stream_gemini(
+                    messages,
+                    model,
+                    temperature,
+                    top_p,
+                    max_tokens,
+                    top_k,
+                    enabled_tools,
+                    enable_prompt_caching,


Forward Gemini presence penalty from chat request

presence_penalty is accepted on the external chat request path and Gemini is marked as supporting it in provider-capabilities.ts, but the Gemini dispatch call drops that argument and _stream_gemini never writes generationConfig.presencePenalty. As a result, changing the Presence Penalty control has no effect for Gemini runs, which makes tuning and reproducibility misleading for users who expect that knob to be applied.

chatgpt-codex-connector · 2026-05-22T16:40:34Z

+                tool_name = msg.get("name") or msg.get("tool_name") or ""
+                response_payload: Any
+                if isinstance(content, str):
+                    try:
+                        response_payload = _json.loads(content)
+                    except Exception:
+                        response_payload = {"result": content}
+                else:
+                    response_payload = content or {}
+                parts = [
+                    {
+                        "functionResponse": {
+                            "name": tool_name,


Derive functionResponse name when tool message omits name

This branch builds Gemini functionResponse using msg.name/msg.tool_name and falls back to an empty string, but tool-result messages are commonly keyed by tool_call_id without a required name field. In that case we emit functionResponse.name: "", which can break function-calling follow-up turns against Gemini because the tool result is no longer tied to a valid function name.
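
One way to address this is to look the name up on the preceding assistant turn. The sketch below assumes OpenAI-shape messages (`tool_calls[].id` and `tool_calls[].function.name`); `recover_tool_name` is a hypothetical helper, not the function used in the PR.

```python
from typing import Any, Dict, List


def recover_tool_name(messages: List[Dict[str, Any]], tool_index: int) -> str:
    """Return the function name for the tool message at `tool_index`.

    Prefers an explicit name, then falls back to matching `tool_call_id`
    against tool calls on earlier assistant turns. Returns "" if no match.
    """
    msg = messages[tool_index]
    name = msg.get("name") or msg.get("tool_name")
    if name:
        return name

    call_id = msg.get("tool_call_id")
    if not call_id:
        return ""

    # Walk backwards so the most recent matching assistant turn wins.
    for prior in reversed(messages[:tool_index]):
        if prior.get("role") != "assistant":
            continue
        for call in prior.get("tool_calls") or []:
            if call.get("id") == call_id:
                return (call.get("function") or {}).get("name") or ""
    return ""
```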

chatgpt-codex-connector · 2026-05-22T16:40:34Z

+        if isinstance(enable_prompt_caching, str) and enable_prompt_caching:
+            body["cachedContent"] = enable_prompt_caching


Don’t advertise Gemini caching without wiring a cache id

Gemini prompt caching is now exposed in the UI, but this backend path only sets cachedContent when enable_prompt_caching is a non-empty string. The normal external request model and chat adapter send enable_prompt_caching as a boolean, so toggling caching for Gemini never changes the outbound request and silently does nothing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd582df889

chatgpt-codex-connector · 2026-05-22T17:12:33Z

+        # OpenAI Chat Completions on this base. Requests/responses are
+        # translated in `_stream_gemini` in external_provider.py.
+        # API reference: https://ai.google.dev/gemini-api/docs
+        "base_url": "https://generativelanguage.googleapis.com/v1beta",


Parse Gemini native model catalog before switching base URL

Changing Gemini to the native https://generativelanguage.googleapis.com/v1beta endpoint routes /models to Gemini’s native schema, but our model discovery path still only consumes OpenAI-style data[].id in ExternalProviderClient.list_models(). That leaves Gemini model fetches empty in /api/providers/models (and provider test counts at 0), so users lose dynamic model discovery and rely only on hardcoded defaults. Add native parsing (models[] with name/baseModelId) or mark Gemini as curated-only to avoid silently empty catalogs.
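
A rough sketch of the native parsing suggested here, assuming the `models[]` entries carry `name` (for example `models/<id>`), an optional `baseModelId`, and `supportedGenerationMethods`. The helper name and the choice to keep only models that advertise `generateContent` are assumptions for illustration.

```python
from typing import Any, Dict


def gemini_models_to_openai(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a Gemini ListModels payload into an OpenAI-style model list."""
    data = []
    for model in payload.get("models") or []:
        methods = model.get("supportedGenerationMethods") or []
        # Skip models that cannot serve chat, such as embedding-only ids.
        if "generateContent" not in methods:
            continue
        name = model.get("name") or ""
        # Strip the "models/" resource prefix to get a bare model id.
        model_id = name[len("models/"):] if name.startswith("models/") else name
        model_id = model_id or model.get("baseModelId") or ""
        if model_id:
            data.append({"id": model_id, "object": "model"})
    return {"object": "list", "data": data}
```

The output keeps the `data[].id` shape that the existing OpenAI-style discovery path already consumes, so callers downstream would not need to change.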

chatgpt-codex-connector · 2026-05-22T17:12:33Z

+                            "functionCall": {
+                                "name": fn_name,
+                                "args": args,
+                            }


Preserve Gemini function call IDs across tool round-trips

The Gemini translation strips call identifiers when converting OpenAI tool calls into functionCall parts, so follow-up role=tool responses can only be matched by function name. This becomes ambiguous when one assistant turn issues multiple calls to the same function (different args but same name), which can mis-associate tool outputs in subsequent model context. Forward the OpenAI tool-call id into Gemini functionCall.id and mirror tool_call_id into functionResponse.id.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2ed444404

chatgpt-codex-connector · 2026-05-22T17:20:47Z

+                                                    "delta": {
+                                                        "tool_calls": [
+                                                            {
+                                                                "index": 0,


Emit distinct tool_call indices for each Gemini function call

Do not hardcode tool_calls[0].index to 0 for every emitted Gemini functionCall. When one assistant turn contains multiple function calls, all streamed chunks collapse into the same slot in consumers that reassemble by index (including our own delta.tool_calls assemblers), so only one call survives or arguments get merged incorrectly. This breaks parallel/multi-tool execution for Gemini even when unique ids are present; each call needs a stable, distinct index per turn.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b05fa73ba

chatgpt-codex-connector · 2026-05-22T17:43:42Z

+                                        fc_name = fc.get("name") or ""
+                                        fc_args = fc.get("args") or {}
+                                        fc_id = (
+                                            fc.get("id")
+                                            or f"call_{fc_name}_{time.time_ns()}"


Forward thoughtSignature in Gemini function calls

When Gemini thinking models emit tool calls, the functionCall part can include thoughtSignature that must be replayed on the next turn with the corresponding tool result; this translation only reads name/args/id and drops the signature, so the follow-up request cannot reconstruct a valid functionCall history and can fail with Gemini 400 errors on tool round-trips. This impacts multi-turn function-calling flows on Gemini 2.5/3-class models even when IDs are preserved.

…aching, and Nano Banana image generation

Wires Google's native Gemini API into Studio's external-provider stack so users can pick gemini-2.5-pro / gemini-2.5-flash / gemini-2.5-flash-image (Nano Banana) alongside the existing OpenAI / Anthropic / OpenRouter providers. Gemini does not speak OpenAI Chat Completions on its primary endpoint; the new `_stream_gemini` async generator translates between the two shapes the same way `_stream_anthropic` handles the Messages API.

Backend:
- New `_stream_gemini` translator in external_provider.py. Converts OpenAI messages -> Gemini `contents` + `systemInstruction`; maps generationConfig (temperature / topP / topK / maxOutputTokens); forwards `tools: [{googleSearch: {}}]` for web_search and `{codeExecution: {}}` for code_execution; passes `cachedContent` through for prompt caching; sets `responseModalities=[TEXT, IMAGE]` for Nano Banana image generation.
- Translates streamed `GenerateContentResponse` SSE frames back into OpenAI chat.completion.chunk frames (text deltas, function_call -> tool_calls deltas, inlineData -> image_b64 tool_end envelope, usage chunk before [DONE]).
- Registry entry switched to native base URL `https://generativelanguage.googleapis.com/v1beta` with `openai_compatible: False` and the `x-goog-api-key` auth header. Model lineup curated to current 2.5 / 2.0 family + Nano Banana.

Frontend:
- Provider-capability matrix: Gemini supports temperature, top_p, top_k, presence_penalty (matches generationConfig); min_p / repetition_penalty hidden because the API does not accept them.
- `providerSupportsBuiltinWebSearch` / `providerSupportsBuiltinCodeExecution` / `providerSupportsBuiltinImageGeneration` extended for Gemini.
- Prompt caching toggle now also lit on Gemini.

Tests:
- 21 new tests in `test_gemini_provider.py` using httpx.MockTransport. Cover request body shape conversion, URL/header wiring, web_search forwarded as googleSearch, function-call translation both directions, prompt caching passthrough, image generation emitting image_b64, grounded-search citations -> tool_end, finish_reason mapping, and vision data URL -> inlineData translation.

for more information, see https://pre-commit.ci

…from tool_call_id

Two follow-up fixes for the Gemini provider:

* Thread presence_penalty into _stream_gemini and set generationConfig.presencePenalty when non-zero. The OpenAI-side capability matrix already exposes the slider for Gemini, so the value was being collected and silently dropped on the way out.
* When an OpenAI role=tool message omits 'name' and only carries 'tool_call_id', recover the function name from the matching functionCall on the prior assistant turn. Gemini 400s on an empty functionResponse name.

…ents

The Gemini stream parser only handled text/functionCall/inlineData parts, so when the user toggled the Code pill on a Gemini model the sandbox output (executableCode + codeExecutionResult parts) was dropped on the floor while adjacent text reached the UI. Reviewers flagged this as the headline feature being silently broken.

Translate both parts into the existing code_execution tool envelope that CodeExecutionToolUI already consumes for OpenAI / Anthropic:

* executableCode -> tool_start with kind=code_execution and the source code under arguments.code. We mint a tool_call_id and stash it so the matching result block can pair to it.
* codeExecutionResult -> tool_end on that id with the stdout under result. Non-OK outcomes (OUTCOME_FAILED / OUTCOME_DEADLINE_EXCEEDED) are prefixed onto the text so the failure is visible.

…che claim

Three follow-ups to the Gemini provider PR after the codex pass:

* list_models() now translates Gemini's native /v1beta/models payload ({models[{name, baseModelId, displayName, supportedGenerationMethods}]}) into the OpenAI-compatible shape Studio expects. Without this the picker stayed empty for Gemini and fell back to hardcoded defaults. Embedding-only models are filtered out.
* Forward the OpenAI tool_call id into Gemini's functionCall.id and mirror it onto functionResponse.id. Two parallel calls to the same function name can now be paired unambiguously on the follow-up turn.
* Drop Gemini from the prompt-caching capability set. The wire flow requires a separate cachedContents POST first and the boolean Studio emits today is a no-op; the toggle should not advertise a feature it cannot apply. Leaves a pointer to the docs for the eventual two-step orchestration.

Codex flagged that the Gemini stream parser hardcoded tool_calls[0].index to 0 on every emitted functionCall. OpenAI reassemblers key tool_calls by index when joining deltas, so two parallel function calls in one assistant turn collapsed onto a single slot and the second call's arguments overwrote the first. Track the running count via len(emitted_function_call_ids) - 1 and emit it as the per-call index. The dedupe guard above (skip when fc_id already in the set) means the index is monotonic and stable for the lifetime of the stream. Regression test asserts [0, 1] across two parallel calls in one candidate parts list.
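
The indexing scheme described in this commit message can be sketched as follows. This is an illustrative reconstruction, not the PR's code: `function_calls_to_deltas` is a hypothetical name, and the fallback id uses a counter rather than `time.time_ns()` so the example stays deterministic.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def function_calls_to_deltas(
    parts: Iterable[Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield OpenAI-shape tool_calls deltas with one distinct index per call."""
    emitted_ids: List[str] = []
    for part in parts:
        fc = part.get("functionCall")
        if not fc:
            continue  # text, inlineData, and other parts are handled elsewhere
        name = fc.get("name") or ""
        call_id = fc.get("id") or f"call_{name}_{len(emitted_ids)}"
        if call_id in emitted_ids:
            continue  # dedupe guard: a repeated id must not claim a new slot
        emitted_ids.append(call_id)
        yield {
            "tool_calls": [
                {
                    # Running count of unique calls seen so far in the stream.
                    "index": len(emitted_ids) - 1,
                    "id": call_id,
                    "type": "function",
                    "function": {
                        "name": name,
                        "arguments": json.dumps(fc.get("args") or {}),
                    },
                }
            ]
        }
```

Because the index is derived from the set of ids already emitted, it only ever grows, which is what lets index-keyed reassemblers keep parallel calls apart.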

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 696728d53f

chatgpt-codex-connector · 2026-05-23T14:07:18Z

+    // every 2.x model. Backend translation lives in `_stream_gemini`,
+    // citations surface on the response's `groundingMetadata`. See
+    // https://ai.google.dev/gemini-api/docs/grounding.
+    providerType === "gemini"


Gate Gemini web search by model capability

providerSupportsBuiltinWebSearch now enables the Search pill for all Gemini models, including gemini-2.5-flash-image that this commit also makes selectable; when that pill is on, the backend forwards tools: [{googleSearch:{}}], which is an unsupported config for the image model and can turn normal image requests into provider 4xx failures. Add model-level gating for Gemini web search (for example excluding -image variants), mirroring the existing Gemini code-execution guard.

…ng budget

`gemini-2.0-flash` / `gemini-2.0-flash-exp` were retired by Google in 2026 (`/v1beta/models/gemini-2.0-flash:streamGenerateContent` returns HTTP 404 "no longer available to new users"), and the picker had nothing past the 2.x family. Verified against the live ListModels catalog: drop the retired ids from `default_models` + allowlist and surface the chat-capable 3.5 / 3.1 / 3 families plus the Nano Banana image trio.

Also plumb `enable_thinking` / `reasoning_effort` into Gemini's `generationConfig.thinkingConfig`. Without this, Gemini 3.5 Flash, gemini-pro-latest, and the 3.x previews silently spend the caller's `max_tokens` budget on hidden "thoughts" before emitting any visible answer -- the chat shows a truncated stub like "The capital of" and streams stop. Mapping:
- enable_thinking=False / reasoning_effort=none -> thinkingBudget=0 (Flash tier; Pro tier coerces to a small positive budget because the API 400s on 0 with "This model only works in thinking mode")
- minimal/low/medium/high -> 512/2048/8192/24576 budget tokens
- max/xhigh -> -1 (dynamic)
- default (neither knob set) -> thinkingConfig omitted, model decides

Frontend `getExternalReasoningCapabilities` now surfaces a `reasoning_effort` picker for every Gemini chat id (Pro tier hides the "none" option; image-tier ids stay knob-less).

Adds 6 unit tests covering Flash/Pro effort mapping, the off-toggle coercion on Pro, default omission, and the nano-banana-pro-preview alias routing through the image modalities path. 28 -> 34 tests in `test_gemini_provider.py`, all green; full backend suite still passes (1459/1460; the unrelated test_help_output flake is pre-existing and not in any file this PR touches).

Live verification against generativelanguage.googleapis.com on 2026-05-24 with `_stream_gemini` directly:

text   gemini-3.5-flash           single PASS  multi PASS
text   gemini-3.1-pro-preview     single PASS  multi PASS
text   gemini-3.1-flash-lite      single PASS  multi PASS
text   gemini-3-pro-preview       single PASS  multi PASS
text   gemini-3-flash-preview     single PASS  multi PASS
text   gemini-2.5-pro             single PASS  multi PASS
text   gemini-2.5-flash           single PASS  multi PASS
text   gemini-2.5-flash-lite      single PASS  multi PASS
text   gemini-flash-latest        single PASS  multi PASS
text   gemini-flash-lite-latest   single PASS  multi PASS
text   gemini-pro-latest          single PASS  multi PASS
image  gemini-2.5-flash-image          PASS (1082 KB png returned)
image  gemini-3.1-flash-image-preview  PASS (Nano Banana 2)
image  gemini-3-pro-image-preview      PASS (Nano Banana Pro)
tool   web_search                      PASS
tool   code_execution                  PASS

-> 16/16 e2e through the actual ExternalProviderClient code path.

danielhanchen · 2026-05-24T14:13:13Z

Pushed c6724dbd after end-to-end testing every Gemini model against the live API through the actual _stream_gemini code path.

What was broken before this commit

gemini-2.0-flash and gemini-2.0-flash-exp were in default_models and the allowlist, but Google retired them in 2026. Picking either returns HTTP 404 ("no longer available to new users") and the chat fails silently.
The picker had no Gemini 3.x/3.5 ids. The live ListModels catalog now serves gemini-3.5-flash (GA May 2026 flagship), gemini-3.1-pro-preview, gemini-3.1-flash-lite (GA), gemini-3.1-flash-image-preview (Nano Banana 2), gemini-3-pro-image-preview (Nano Banana Pro), gemini-3-flash-preview, etc. — the allowlist hid all of them.
enable_thinking and reasoning_effort were never forwarded into Gemini's request body. On every thinking-capable id (gemini-3.5-flash, gemini-pro-latest, gemini-flash-latest, all 3.x previews, gemini-2.5-pro) the model spent the caller's max_tokens budget on hidden thoughts before emitting visible text. With Studio's default knob the chat streamed 'The capital of' and stopped.

What's in this commit

Registry (providers.py)

Drop gemini-2.0-flash / gemini-2.0-flash-exp.
Add gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-3-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-image-preview (Nano Banana 2), gemini-3-pro-image-preview (Nano Banana Pro).
Allowlist regex now covers 3.5 / 3.1 / 3 / 2.5 + the rolling *-latest aliases + nano-banana-pro-preview.

Backend (external_provider.py)

_stream_gemini accepts enable_thinking and reasoning_effort, translates to generationConfig.thinkingConfig.thinkingBudget:
- enable_thinking=False / reasoning_effort='none' → 0 (Flash tier) or 128 (Pro tier — the API 400s on 0 with "only works in thinking mode")
- minimal/low/medium/high → 512 / 2048 / 8192 / 24576 budget tokens
- max/xhigh → -1 (dynamic)
- default (neither knob set) → thinkingConfig omitted, Google decides
is_image_model now also matches nano-banana (covers the nano-banana-pro-preview alias Google ships).
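
The budget mapping listed above could look roughly like this. It is a sketch under stated assumptions: `thinking_budget` is a hypothetical helper, the Pro/Flash decision is passed in as a flag rather than derived from the model id, and letting `enable_thinking=False` take precedence over any `reasoning_effort` is a choice made for this example.

```python
from typing import Optional

# Effort levels and budgets as listed in this comment.
_EFFORT_BUDGETS = {"minimal": 512, "low": 2048, "medium": 8192, "high": 24576}
_PRO_MIN_BUDGET = 128  # Pro tier cannot use 0; the API returns a 400


def thinking_budget(
    enable_thinking: Optional[bool],
    reasoning_effort: Optional[str],
    is_pro_tier: bool,
) -> Optional[int]:
    """Return a thinkingBudget value, or None to omit thinkingConfig."""
    effort = (reasoning_effort or "").lower()
    if enable_thinking is False or effort == "none":
        return _PRO_MIN_BUDGET if is_pro_tier else 0
    if effort in ("max", "xhigh"):
        return -1  # dynamic budget
    if effort in _EFFORT_BUDGETS:
        return _EFFORT_BUDGETS[effort]
    return None  # neither knob set: leave the decision to the model
```

A caller would then add `generationConfig["thinkingConfig"] = {"thinkingBudget": budget}` only when the result is not None.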

Frontend (provider-capabilities.ts)

getExternalReasoningCapabilities now resolves per Gemini model: Pro tier exposes low/medium/high/max (no off switch — the API rejects budget=0); Flash tier exposes none/low/medium/high/max; image-tier ids stay knob-less.

Tests (test_gemini_provider.py) — 28 → 34, all green

test_thinking_disabled_sets_budget_zero_on_flash
test_thinking_disabled_pro_tier_uses_small_budget
test_reasoning_effort_levels_map_to_budgets
test_reasoning_effort_none_disables_on_flash
test_thinking_default_omits_thinking_config
test_nano_banana_alias_routes_through_image_modalities

Full backend suite: 1459/1460 (the unrelated test_help_output flake doesn't touch any file in this PR).
Frontend: npx tsc -b --pretty false clean.

E2E verified against live Gemini API on 2026-05-24

Each row drives the actual installed ExternalProviderClient.stream_chat_completion (no mocks), one model per row, single-turn + multi-turn for text, image-bytes inspection for image models:

text   gemini-3.5-flash           single PASS  multi PASS
text   gemini-3.1-pro-preview     single PASS  multi PASS
text   gemini-3.1-flash-lite      single PASS  multi PASS
text   gemini-3-pro-preview       single PASS  multi PASS
text   gemini-3-flash-preview     single PASS  multi PASS
text   gemini-2.5-pro             single PASS  multi PASS
text   gemini-2.5-flash           single PASS  multi PASS
text   gemini-2.5-flash-lite      single PASS  multi PASS
text   gemini-flash-latest        single PASS  multi PASS
text   gemini-flash-lite-latest   single PASS  multi PASS
text   gemini-pro-latest          single PASS  multi PASS
image  gemini-2.5-flash-image          PASS (1082 KB PNG)
image  gemini-3.1-flash-image-preview  PASS (Nano Banana 2, 405 KB JPEG)
image  gemini-3-pro-image-preview      PASS (Nano Banana Pro, 401 KB JPEG)
tool   web_search                      PASS
tool   code_execution                  PASS

16 / 16 passing. OpenAI baseline (gpt-5.5, gpt-5.4, gpt-5.4-mini) also passes the same single-turn + multi-turn flow.

Live /api/providers/models against the patched backend with the regression key now surfaces 17 ids (was 8): the 11 chat-capable + 3 image + gemini-3.1-flash-lite-preview + gemini-3.1-pro-preview-customtools + nano-banana-pro-preview.

Fixes a batch of bugs surfaced by a second-pass review on top of the 3.5/3.1/3 + Nano Banana 2/Pro additions in c6724db.

Backend (external_provider.py):
- Constructor normalises legacy /v1beta/openai base URLs to /v1beta so Gemini providers saved before the native switch keep working without a manual re-config.
- Skip thinkingConfig, googleSearch, and codeExecution on image-tier models (-image / nano-banana). The image responseModalities path is mutually exclusive with text-tool wiring and stale UI state would otherwise 400 the turn.
- _PRO_THINKING_PREFIXES now includes gemini-3.5-pro and uses anchored prefix matching (exact id or "<prefix>-...") so the image-tier gemini-3-pro-image-preview cannot accidentally match the pro guard.
- Gemini 3 functionCall thoughtSignature is round-tripped through the tool_calls envelope via extra_content.google.thought_signature on emit, and replayed as a sibling of functionCall on the next request.
- finishReason swaps STOP -> tool_calls when any functionCall was emitted on the same turn so OAI clients trigger tool execution (matches the OpenAI Chat Completions contract).
- usageMetadata.thoughtsTokenCount is rolled into output_tokens and surfaced on output_tokens_details.reasoning_tokens so total_tokens reflects the full billable spend instead of dropping the hidden reasoning slice.

Registry (providers.py):
- Drop gemini-3-pro-preview from default_models. Google shut it down on 2026-03-09 and auto-redirects to gemini-3.1-pro-preview; we surface the canonical id only.
- Add model_id_deny_exact = ("gemini-3-pro-preview",) so the live ListModels fetch does not re-surface the redirect alias.

Route schema (models/inference.py):
- enable_prompt_caching widened to Optional[Union[bool, str]] so the /v1/chat/completions caller can pass a Gemini cachedContent resource name (e.g. cachedContents/abc123). Without this widening, _stream_gemini's string cachedContent passthrough was unreachable from the public route (bool_parsing 422). The stream_chat_completion signature mirrors this change.

Frontend (provider-capabilities.ts, chat-page.tsx, chat-adapter.ts):
- providerSupportsBuiltinImageGeneration now also recognises nano-banana ids (nano-banana-pro-preview was hidden from the image pill before).
- providerSupportsBuiltinWebSearch takes the model id so Gemini image models hide the Search pill (mirrors the backend skip).
- providerSupportsBuiltinCodeExecution uses the same isGeminiImageModel guard for nano-banana ids.
- GEMINI_THINKING_PRO_PREFIXES gains gemini-3.5-pro; gemini-3-pro tightened to gemini-3-pro-preview to avoid the image-id overlap.
- Updated 3 callers of providerSupportsBuiltinWebSearch to thread the selected model id through.

Tests (test_gemini_provider.py): 34 -> 42, all green
- test_image_models_skip_thinking_config
- test_image_models_drop_text_only_tools
- test_gemini_35_pro_recognized_as_pro_thinking
- test_legacy_openai_base_url_normalized
- test_finish_reason_swaps_to_tool_calls_when_function_call_emitted
- test_thought_signature_round_trips_into_gemini_function_call
- test_thought_signature_emitted_in_tool_call_delta
- test_usage_chunk_includes_thoughts_tokens

Verification:
- Backend pytest 1518/1519 passing (one unrelated Qwen3.5 flash-attn test fails on main as well; nothing in this PR touches that path).
- Frontend npx tsc -b clean.
- Live e2e 16/16 against generativelanguage.googleapis.com through the patched _stream_gemini code path (all 11 chat models single + multi turn, all 3 image models returned image bytes, web_search and code_execution tools both emit the expected envelope).
- Live /api/providers/models against the patched backend surfaces 16 ids (gemini-3-pro-preview correctly filtered via deny_exact).
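
To illustrate the legacy base-URL normalisation mentioned in this commit, here is a minimal sketch. `normalize_gemini_base_url` is a hypothetical name, and the check on the Google host follows the restriction described in the later round-3 follow-up rather than anything stated in this commit message.

```python
from urllib.parse import urlsplit, urlunsplit

GEMINI_HOST = "generativelanguage.googleapis.com"


def normalize_gemini_base_url(base_url: str) -> str:
    """Rewrite the legacy Google-hosted /v1beta/openai base URL to /v1beta.

    Any other URL, including custom proxies whose path ends in /openai,
    is returned unchanged.
    """
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if parts.hostname == GEMINI_HOST and path == "/v1beta/openai":
        return urlunsplit(
            (parts.scheme, parts.netloc, "/v1beta", parts.query, parts.fragment)
        )
    return base_url
```

Comparing the parsed hostname and path, instead of string-matching on the whole URL, avoids rewriting a proxy that merely contains the Google host name somewhere in its path.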

chatgpt-codex-connector · 2026-05-24T15:02:37Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Round-2 reviewer.py flagged a phantom web_search card on image turns (12/12 reviewers), route-layer stripping of tool_calls / tool_call_id / name, an over-narrow image-mode tool guard, and silent safety blocks. This patch fixes all four.

Backend (external_provider.py):
- web_search_active is now derived from the outbound tools_array (whether googleSearch was actually forwarded), not the raw enabled_tools intent. Image-mode turns dropped the tool above so the inbound stream no longer emits a phantom "search complete" tool_start / tool_end on those turns.
- text_tools_allowed now uses is_image_model (covers both `-image` / `nano-banana` picker models AND text models that requested `image_generation` via enabled_tools). Verified against the live Gemini API which rejects both googleSearch and codeExecution alongside responseModalities=["TEXT","IMAGE"] with explicit 400s ("Search as tool is not enabled for this model", "Code execution is not enabled for this model").
- promptFeedback.blockReason is surfaced as a 400 content-filter error chunk instead of returning an empty successful assistant response. The streaming loop closes the response before exiting.

Route (routes/inference.py):
- _build_external_messages now propagates tool_calls (assistant), tool_call_id, and name (tool result) through every code path (string content, multimodal content, non-vision fallback). Without this Gemini 3 function-call round trips lost their thoughtSignature + tool_call_id at the route boundary, and functionResponse.name arrived empty on the second turn.
- Assistant messages with content=None and tool_calls populated are preserved as a synthetic empty-string content turn so the Gemini translator can rebuild the functionCall part.

Tests (test_gemini_provider.py): 42 -> 45, all green
- test_image_models_suppress_phantom_web_search_card
- test_image_generation_tool_drops_text_tools
- test_prompt_feedback_block_reason_surfaces_as_error

Verification:
- Backend pytest 1736 / 1736 (the two pre-existing unrelated fails on main, test_help_output and Qwen3.5 flash-attn pin, are skipped).
- Frontend npx tsc -b clean.
- Live e2e 16/16 against generativelanguage.googleapis.com: 11 chat models single + multi turn, 3 image models returning image bytes, web_search and code_execution both PASS.
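
The web_search_active change above can be pictured roughly as follows. This is a minimal sketch, not the code in external_provider.py: `build_tools_array` and its arguments are hypothetical names, and the image-mode rule is simplified to "drop both text tools".

```python
from typing import Any


def build_tools_array(enabled_tools: list[str], is_image_model: bool) -> list[dict[str, Any]]:
    """Hypothetical helper: build the outbound Gemini tools array.

    Simplification: image-mode turns forward no text-only tools at all.
    """
    tools: list[dict[str, Any]] = []
    if is_image_model:
        return tools
    if "web_search" in enabled_tools:
        tools.append({"googleSearch": {}})
    if "code_execution" in enabled_tools:
        tools.append({"codeExecution": {}})
    return tools


def web_search_active(tools_array: list[dict[str, Any]]) -> bool:
    """True only when googleSearch was actually forwarded on the wire."""
    return any("googleSearch" in tool for tool in tools_array)
```

Deriving the flag from the outbound array rather than from enabled_tools means the UI card and the request cannot disagree: if the tool was dropped, no synthetic tool_start / tool_end is emitted.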

Round 3 review follow-ups:

Backend (studio/backend/core/inference/external_provider.py):
- Close response AND aiter_lines iterator in a finally so normal, prompt-block, and cancellation exits all clean up (eliminates the RuntimeWarning about aclose never being awaited).
- Pair the synthetic web_search tool_start with a tool_end on the promptFeedback.blockReason path so the UI does not leave a stuck "searching..." spinner after the error toast.
- Preserve native id and thoughtSignature on executableCode and codeExecutionResult tool events under google.native_part, and pair the tool_end on the code-exec id so multi-turn code-execution replays do not lose Gemini-required history.
- Carry part-level thoughtSignature on text deltas via delta.extra_content.google.thought_signature and on inline image tool_end via google.thought_signature so Gemini 3 image editing and tool turns round-trip the signature on the next request.
- Guess remote image_url MIME from the URL path so PNG / WebP / GIF inputs are not silently relabeled as JPEG.
- Roll usageMetadata.toolUsePromptTokenCount into translated input tokens and surface thoughtsTokenCount as completion_tokens_details.reasoning_tokens in _build_usage_chunk.
- Only normalize the Google-hosted /v1beta/openai legacy base URL; custom proxies whose paths happen to end in /openai are left untouched.
- Forward ChatCompletionRequest.tools and tool_choice through stream_chat_completion into _stream_gemini, translating to tools[].functionDeclarations and toolConfig.functionCallingConfig.

Frontend:
- chat-adapter: when Gemini image-generation is enabled for the turn, also disable Search and Code so the request, builder, and active pills agree with what the backend actually sends (the backend already strips text tools when image_generation is in enabled_tools).
- chat-adapter: consume OpenAI-shape delta.tool_calls chunks so Gemini function-call deltas without text surface as tool-call parts.
- shared-composer: disable Search and Code pills while Gemini image mode is active so the UI matches the request.

Tests (studio/backend/tests/test_gemini_provider.py): adds coverage for proxy base-url gating, remote image MIME inference, toolUsePromptTokenCount, reasoning_tokens propagation, prompt-block web_search tool_end pairing, native code-exec id/thoughtSignature metadata, inline image thoughtSignature, text-chunk extra_content, OpenAI tools/tool_choice translation, and image-model tool drop.
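
The usage translation described in the backend list can be sketched as below. This is an illustration under stated assumptions, not the body of _build_usage_chunk: `build_usage` is a hypothetical name, and counting thoughts tokens inside completion_tokens is an assumption the source does not confirm.

```python
from typing import Any


def build_usage(usage_metadata: dict[str, Any]) -> dict[str, Any]:
    """Translate Gemini usageMetadata into an OpenAI-style usage dict (sketch)."""
    prompt = int(usage_metadata.get("promptTokenCount") or 0)
    tool_prompt = int(usage_metadata.get("toolUsePromptTokenCount") or 0)
    candidates = int(usage_metadata.get("candidatesTokenCount") or 0)
    thoughts = int(usage_metadata.get("thoughtsTokenCount") or 0)

    # Tool-use prompt tokens are rolled into the translated input tokens.
    prompt_tokens = prompt + tool_prompt
    # Assumption: reasoning tokens are billed as output, so they are included here.
    completion_tokens = candidates + thoughts

    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
        "completion_tokens_details": {"reasoning_tokens": thoughts},
    }
```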

After a Gemini chat that ran code_execution / image_generation, switching the same thread to a local GGUF model used to forward the synthetic provider-side tool_calls (tagged with `args._server_tool` or carrying a Gemini `args.google.native_part` payload) and the message-level `extra_content` to llama-server. The receiving backend has no tool declaration for those names and no use for Gemini thoughtSignature metadata; in the worst case it can produce an orphan tool_call_id and a confused continuation.

Add `_strip_provider_synthetic_tool_history()` and wire it through the two local message builders:
- `_openai_messages_for_passthrough` (OAI-compat passthrough)
- `_openai_messages_for_gguf_chat` (standard GGUF chat path)

Real user-function `tool_calls` and their matching `role="tool"` replies survive unchanged; only synthetic provider-side cards and Gemini-only `extra_content` are stripped. If the synthetic call was the assistant turn's only payload, the now-empty turn is dropped too so llama-server does not reject the request.

Adds 2 regression tests:
- test_strip_provider_synthetic_tool_history_drops_synthetic_only
- test_strip_provider_synthetic_tool_history_drops_empty_assistant

142 existing backend tests still pass.
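
A minimal sketch of the stripping idea, assuming OpenAI-shape messages whose `function.arguments` is a JSON string. The function names here are hypothetical stand-ins and the real `_strip_provider_synthetic_tool_history()` may differ; this version also strips message-level `extra_content` from every assistant turn.

```python
import json
from typing import Any


def _is_synthetic(tool_call: dict[str, Any]) -> bool:
    """A call is synthetic if its args carry _server_tool or google.native_part."""
    raw = (tool_call.get("function") or {}).get("arguments") or "{}"
    try:
        args = json.loads(raw) if isinstance(raw, str) else raw
    except json.JSONDecodeError:
        return False
    if not isinstance(args, dict):
        return False
    google = args.get("google")
    return bool(args.get("_server_tool")) or (
        isinstance(google, dict) and "native_part" in google
    )


def strip_synthetic_tool_history(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    dropped_ids: set[str] = set()
    out: list[dict[str, Any]] = []
    for message in messages:
        role = message.get("role")
        if role == "assistant":
            # Copy without the Gemini-only message-level metadata.
            message = {k: v for k, v in message.items() if k != "extra_content"}
            calls = message.get("tool_calls")
            if isinstance(calls, list) and calls:
                kept = [tc for tc in calls if not _is_synthetic(tc)]
                dropped_ids.update(
                    tc["id"] for tc in calls if _is_synthetic(tc) and tc.get("id")
                )
                if kept:
                    message["tool_calls"] = kept
                else:
                    message.pop("tool_calls")
                    if not message.get("content"):
                        continue  # the synthetic call was the turn's only payload
        elif role == "tool" and message.get("tool_call_id") in dropped_ids:
            continue  # reply to a dropped synthetic call
        out.append(message)
    return out
```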

For external Gemini image-tier models (gemini-2.5-flash-image, gemini-3.x-image-preview, etc.), the backend unconditionally strips code_execution and strips web_search on older image ids. Search is still allowed on Gemini 3.x Pro/Flash image models, which supportsBuiltinWebSearch already encodes per model.

Before this commit the composer pill gates were:

    searchDisabled = !modelLoaded || !(supportsTools || supportsBuiltinWebSearch)
    codeDisabled = !modelLoaded || !(supportsTools || supportsBuiltinCodeExecution) || imageModeDisablesCode

`supportsTools` here is a local-runtime fallback that becomes true when any tool-capable local model has been loaded in the session. With a local tool-capable runtime active, switching the chat to an external Gemini image-tier model used to leave Search/Code clickable, even though the backend will silently drop the tool on the wire.

Detect "external provider is Gemini AND the model is image-tier" (via supportsBuiltinImageGeneration) and gate the two pills strictly on the provider's own builtin support in that case. Non-Gemini paths and non-image Gemini models keep the supportsTools fallback unchanged.

danielhanchen · 2026-05-25T14:24:56Z

Round 22 review pass landed. Twelve parallel codex reviewers, five APPROVE and seven REQUEST_CHANGES, no security findings. Convergent P1s plus the stale-rebase issue are now addressed:

Anthropic / OpenRouter / Kimi did not honour tool_choice={"type":"function", ...} as a hosted-tool opt-out, only tool_choice="none". The Gemini path already did. Mirror the gate symmetrically so a forced-function pin suppresses Anthropic web_search/web_fetch/code_execution, OpenRouter plugins:[{id:"web"}] and its synthetic SSE web_search card, and Kimi _stream_kimi_web_search. Backend tests for all four cases added (feature/gemini-provider commit fe1c5c1).
Synthetic provider-side tool history (Gemini code_execution / image_generation cards tagged with args._server_tool or carrying args.google.native_part) used to be forwarded verbatim into llama-server when a thread was switched from Gemini to a local GGUF backend. Add _strip_provider_synthetic_tool_history() and call it from _openai_messages_for_passthrough plus _openai_messages_for_gguf_chat. Real user-function tool_calls + their matching role="tool" replies survive; only the synthetic Gemini-only cards are dropped (commit 1890d85).
The Search and Code composer pills could be re-enabled by the local supportsTools fallback even on Gemini image-tier models, which the backend will silently strip. Detect "external provider is Gemini AND model is image-tier" and gate strictly on the provider builtin support in that case (commit 9d91db0).
The PR head was stale and would have accidentally reverted three current-main fixes when merged (studio/backend/core/export/export.py MLX save_method="merged_16bit", unsloth/chat_templates.py placeholder validation, and the matching tests/python/test_construct_chat_template_validation.py). Merged origin/main into the branch so those land cleanly with this PR (commit 68fd4d6).

The Gemini 3 Pro thinkingLevel: "medium" finding (Review 12) was checked against both Google AI docs and Vertex AI docs and medium is in fact a supported level for Gemini 3 Pro (low, medium, high). No change made there.

All 142 existing Gemini-provider backend tests still pass; six new regression tests added to lock the new behaviour in.

Round 22 added the gate for Gemini / Anthropic / OpenRouter / Kimi but missed the OpenAI Responses translator. When a caller pinned a user function via `tool_choice={"type":"function","function":{"name":...}}` plus `enabled_tools=["web_search","code_execution","image_generation"]`, the Responses body still attached `{"type":"web_search"}`, `{"type":"shell"}`, and `{"type":"image_generation"}` server tools. The function pin should suppress those for the same privacy + billing reason the other provider paths now do. Compute `_responses_tool_choice_forced_function` next to `_responses_tool_choice_none` and gate each hosted-tool append on `_responses_hosted_builtins_allowed = not none and not forced_function`. The fix has to be applied in TWO places: the initial body builder and `_build_body()` (called by the container-expiry retry path). User function declarations still flow through so the pin has something to target, and the Responses-shape `{type:"function", name:"..."}` `tool_choice` is forwarded unchanged. Adds regression test `test_openai_responses_forced_function_tool_choice_drops_hosted_tools`. All 166 existing backend tests across Gemini + Responses + image-gen + code-exec suites still pass.
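
The gate can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the mapping of `code_execution` to the `{"type": "shell"}` server tool is inferred from the order of the lists above rather than stated by the source.

```python
from typing import Any


def hosted_builtins_allowed(tool_choice: Any) -> bool:
    """Hosted tools are suppressed by tool_choice="none" or a forced-function pin."""
    is_none = tool_choice == "none"
    forced_function = isinstance(tool_choice, dict) and tool_choice.get("type") == "function"
    return not is_none and not forced_function


def build_responses_tools(
    enabled_tools: list[str],
    tool_choice: Any,
    user_functions: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    # User function declarations always flow through so the pin has a target.
    tools = list(user_functions)
    if hosted_builtins_allowed(tool_choice):
        if "web_search" in enabled_tools:
            tools.append({"type": "web_search"})
        if "code_execution" in enabled_tools:
            tools.append({"type": "shell"})  # assumed mapping
        if "image_generation" in enabled_tools:
            tools.append({"type": "image_generation"})
    return tools
```

Routing both the initial body builder and the retry builder through one helper like this is one way to avoid the two-places asymmetry the commit describes.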

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de046986ba

chatgpt-codex-connector · 2026-05-25T14:47:55Z

+        if not isinstance(tool_calls, list) or not tool_calls:
+            sanitized_assistant.append(m)


Strip Gemini extra_content from assistant turns without tools

_strip_provider_synthetic_tool_history returns assistant messages unchanged when tool_calls is missing/empty, so Gemini-only extra_content (e.g., text thought signatures) is still forwarded on provider-switch paths. That leaks a non-standard message key into /v1/chat/completions passthrough/local GGUF requests, where strict OpenAI-compatible backends may reject the request or receive irrelevant provider metadata. This regression appears when a thread with Gemini assistant text turns is continued on a non-Gemini backend.

…+ custom-Gemini model list

Three convergent P1s from round 24 review:

1. SSRF: the shared SSRF validator in `tools._validate_and_resolve_host` used a denylist (is_private / loopback / link_local / multicast / reserved / unspecified). Python classifies shared address space (100.64.0.0/10 carrier-grade NAT, plus 240.0.0.0/4, benchmarking ranges, etc.) with `is_private=False` AND `is_global=False`. The new Gemini server-side image fetcher therefore accepts URLs whose hostname resolves to 100.64.0.1 in cloud/VPC deployments. Add `not ip.is_global` as the primary gate -- a single source of truth that covers every current and future non-global range.
2. _strip_provider_synthetic_tool_history previously only stripped message-level `extra_content` when the assistant turn had tool_calls. A plain text Gemini reply carrying `extra_content.google.thought_signature` flowed through to llama-server when the thread was switched to a local GGUF backend. Always strip message-level `extra_content` on assistant turns.
3. routes/providers.list_provider_models applied Gemini's native `model_id_allowlist` regex to every Gemini provider, including custom OAI-compatible bases (LiteLLM, deployment gateways). IDs like `google/gemini-2.5-flash` and team-prefixed deployment aliases got filtered out even though the chat-dispatch path now routes them via the OpenAI-compatible client. Skip registry-level model-id filters when the configured Gemini base_url host is not the canonical `generativelanguage.googleapis.com`, mirroring the chat-dispatch gate.

Three regression tests added:
- test_validate_and_resolve_host_blocks_shared_address_space
- test_strip_provider_synthetic_tool_history_drops_text_only_extra_content
- test_gemini_custom_oai_compat_base_skips_native_allowlist
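
The SSRF gate from item 1 can be illustrated with the standard `ipaddress` module. This is a sketch, not the body of `tools._validate_and_resolve_host`: `is_blocked_address` is a hypothetical helper, and it keeps the older denylist checks next to the `is_global` gate as defence in depth.

```python
import ipaddress


def is_blocked_address(host_ip: str) -> bool:
    """Return True if a resolved IP must not be fetched server-side."""
    ip = ipaddress.ip_address(host_ip)
    # Primary gate: anything that is not globally routable is rejected.
    if not ip.is_global:
        return True
    # Original denylist, retained alongside the primary gate.
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

With this shape, 100.64.0.1 (carrier-grade NAT) is rejected even though `is_private` is False for it, while a public resolver address such as 8.8.8.8 still passes.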

…nto Gemini schema

Two convergent reviewer findings on the native Gemini path:

1. _stream_gemini's tool_calls replay loop falls through to a generic functionCall emission whenever it sees an assistant tool_call. Marked server-side builtin cards (web_search / web_fetch tagged with _server_tool or args.google.native_part) hit that fallthrough with no replayable native_part, which produces an outbound functionCall whose name is not a declared user function. The Gemini turn 400s on the undeclared name. Guard the loop to drop those entries instead, while keeping the existing code_execution / image_generation native-part replay branch intact.
2. _sanitize_gemini_schema uses a strict allowlist that drops local $ref / $defs references. Pydantic-generated tool schemas hoist nested object shapes into $defs and reference them via {"$ref": "#/$defs/X"}, so a property like address: {"$ref": "#/$defs/Address"} collapsed to {} on the wire and the model lost the nested fields, types, and required keys. Resolve local #/... pointers against the schema root and inline the referenced subtree, with local siblings overriding the reference (normal JSON Schema composition) and a seen-ref guard for self-referential schemas.

Added regression coverage:
- test_gemini_native_skips_synthetic_server_builtin_replay
- test_function_declarations_inline_local_refs_into_gemini_schema
- test_function_declarations_inline_local_refs_in_anyof_and_items
- test_function_declarations_self_referential_schema_terminates

All 145 Gemini provider tests pass; touched provider regression set (OpenAI Responses, code execution, image generation, Anthropic code execution, Anthropic web_fetch) also 43/43 green.
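
The $ref inlining in item 2 could look roughly like this. It is a sketch under assumptions: `inline_local_refs` and `inline_schema` are hypothetical names, the allowlist filtering that _sanitize_gemini_schema also performs is omitted, and a repeated reference on the same path collapses to its siblings only.

```python
from typing import Any


def inline_local_refs(node: Any, root: dict[str, Any], seen: frozenset = frozenset()) -> Any:
    """Recursively replace local "#/..." references with the referenced subtree."""
    if isinstance(node, list):
        return [inline_local_refs(item, root, seen) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/"):
        siblings = {k: v for k, v in node.items() if k != "$ref"}
        if ref in seen:
            # Self-referential schema: stop expanding to guarantee termination.
            return inline_local_refs(siblings, root, seen)
        target: Any = root
        for part in ref[2:].split("/"):
            part = part.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
            if not isinstance(target, dict) or part not in target:
                return inline_local_refs(siblings, root, seen)  # unresolvable pointer
            target = target[part]
        # Local siblings override keys from the referenced subtree.
        merged = {**target, **siblings} if isinstance(target, dict) else siblings
        return inline_local_refs(merged, root, seen | {ref})
    return {key: inline_local_refs(value, root, seen) for key, value in node.items()}


def inline_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Inline every local reference and drop the top-level $defs table."""
    body = {k: v for k, v in schema.items() if k != "$defs"}
    return inline_local_refs(body, schema)
```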

…es synthetic-history strip

Reviewer round 26 surfaced two convergent asymmetric-fix bugs.

1. _stream_gemini drops a synthetic server-tool tool_call (web_search / web_fetch tagged _server_tool) and also replays code_execution / image_generation tool_calls as Gemini-native executableCode / codeExecutionResult / inlineData parts. The matching role="tool" follow-up was still falling through to the generic functionResponse branch, producing either an orphan functionResponse (synthetic case) or a duplicate response pointing at a name with no functionDeclarations entry (native-part case). Both forms 400 the next Gemini turn. Track skipped + native-replayed tool_call_ids in _gemini_skip_tool_result_ids and short-circuit the role="tool" branch on a match.
2. The Anthropic-compatible local /v1/messages route only called _drop_empty_assistant_sentinels on the OpenAI-translated history, while the sibling /v1/chat/completions and GGUF passthrough builders chain that with _strip_provider_synthetic_tool_history. An Anthropic caller replaying a prior provider-side tool_use therefore forwarded fake builtin tool history straight into local llama-server. Apply the same strip on the Anthropic route after the anthropic_messages_to_openai conversion.

Regression coverage added:
- test_gemini_native_skips_orphan_function_response_for_dropped_builtin
- test_gemini_native_skips_orphan_function_response_for_native_part_replay

Gemini suite 147/147; touched provider regression set 43/43.

…dget for base64

Two convergent reviewer findings on the native Gemini path.

1. _stream_gemini's synthetic-builtin detector at lines 3519-3524 recognizes args.google.native_part as a server-tool marker, but _native_part was only loaded from tc.extra_content.google.native_part. A direct OpenAI-compatible API caller or imported third-party thread round-trips the payload through function.arguments because tool_calls[].extra_content is not in the OpenAI spec. The round-25 guard then saw a synthetic builtin with no _native_part and dropped the entire assistant turn, so the next native Gemini request lost the prior executableCode / inlineData / codeExecutionResult context. Fall back to args.google.native_part when extra_content path is missing, mirroring what the synthetic detector already accepts.
2. _GEMINI_REMOTE_IMAGE_MAX_TOTAL_BYTES capped DECODED bytes at 20MB. Gemini receives images base64-encoded inside JSON, and base64 inflates payload size by ~4/3. With 20MB decoded the actual JSON body is ~26.7MB plus prompt overhead, well over Gemini's ~20MB request limit. Drop the decoded cap to 14MB so realistic multi-image turns stay safely under 20MB encoded.

Added regression test test_gemini_native_part_falls_back_to_args_google covering an OpenAI-compat-shaped image_generation tool_call whose native_part lives only in function.arguments.

Gemini suite 148/148.
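
The arithmetic behind item 2 can be checked with a few lines of Python. The sketch assumes binary megabytes (1 MB = 1024 * 1024 bytes) and ignores JSON and prompt overhead; `base64_encoded_size` is an illustrative helper, not a function from the PR.

```python
import base64
import math

MB = 1024 * 1024


def base64_encoded_size(decoded_bytes: int) -> int:
    """Length of the padded base64 encoding of decoded_bytes bytes."""
    return 4 * math.ceil(decoded_bytes / 3)


# Cross-check the formula against the standard library on a small input.
assert base64_encoded_size(10) == len(base64.b64encode(b"x" * 10))

old_cap_encoded = base64_encoded_size(20 * MB) / MB  # roughly 26.7 MB
new_cap_encoded = base64_encoded_size(14 * MB) / MB  # roughly 18.7 MB
```

A 14MB decoded cap therefore leaves a little over 1MB of headroom under a 20MB request limit for the prompt and JSON framing.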

Main moved forward 17 commits during PR review (latest: 953c8bf). Real conflicts in five files; resolved by combining both branches' changes.

studio/backend/core/inference/external_provider.py
- Add fast_mode (Anthropic Opus 4.6/4.7 speed flag, #5715) to stream_chat_completion and Anthropic-branch call site, alongside existing Gemini tools/tool_choice forwarding.
- Add _openai_image_generation_tool() helper (action:"edit" for follow-up image edits, #5712) and use it inside the existing _responses_hosted_builtins_allowed gate so the forced-function / tool_choice="none" suppression added in rounds 21+ still applies.
- Keep Anthropic web_fetch gated on _anthropic_hosted_builtins_allowed (round 19+ hosted-builtin gate) while taking main's per-model version selector (web_fetch_20260209 vs _20250910).

studio/backend/routes/inference.py
- Add `openai = provider_type == "openai"` (used by main's reasoning content forwarding for follow-up image edits).
- Keep the round 25/26 Gemini filter chain (_filter_tool_calls drops synthetic server-builtin cards, marks tc_id so the matching role="tool" follow-up gets skipped, extra_content gated to native Gemini host).
- Forward fast_mode alongside tools/tool_choice.

studio/backend/tests/test_openai_image_generation.py
- Combine assertions: both _server_tool: True (PR) and openai_image_generation_call_id (main) are present on the tool_start arguments.

studio/frontend/src/features/chat/shared-composer.tsx
- Add supportsBuiltinWebFetch declaration (separate Fetch pill from #5742) before the PR's isExternalGemini constant so both the Gemini image-tier gating and the standalone Anthropic Fetch pill compile.

studio/frontend/src/features/chat/api/chat-adapter.ts
- Add main's normalizeOpenAIReasoningItem, toOpenAIImageEditReferenceMessage, isAnthropicRefusalMessage helpers alongside PR's collectAssistantToolCalls, collectToolResultMessages, SerializedMessage, collectAssistantTextThoughtSignature.
- toOpenAIMessages (PR) now also early-returns on isAnthropicRefusalMessage so refused turns get pruned from outbound history.
- Add a thin toOpenAIMessage (singular) wrapper for the OpenAI image-edit replay path's flat .map() usage.
- Merge per-turn enable flags: keep PR's imageGenerationEnabledForThisTurn, geminiImageModeForThisTurn, codeExecEnabledForThisTurn !geminiImageMode gate; take main's webFetchEnabledForThisTurn (sourced from independent webFetchToolsEnabled pill state).
- Outbound build chains main's anthropic_refusal survivingMessages prune, then flatMap(toOpenAIMessages) (PR), then PR's selectedImageEditReference reference message prepend; image-edit unavailable toast from main fires before any of that when the pill is off.
- tool_end merge: do main's nextArgs spread first, then PR's Gemini native_part parts concat so both OpenAI image-call ids and Gemini executableCode/codeExecutionResult/inlineData round-trip.
- Cumulative + final yields: orderAssistantContent(pinTextThoughtSignature(...)) composes main's tool-vs-text ordering with PR's per-text thoughtSignature pin.

Tests: gemini provider 148/148; openai_responses_translation + openai_code_execution + openai_image_generation + anthropic_code_execution + anthropic_web_fetch + external_provider_usage_chunk + providers_api: 50 passed, 42 skipped; main's new anthropic_fast_mode + citations + openai_citation_markers + openai_tool_result_fallbacks suites all 43/43.

…urn [] + cast image-edit ref

Three errors in chat-adapter.ts surfaced by the frontend tsc step after merging main into feat/gemini-provider:

1. The Anthropic refusal early-return used main's but toOpenAIMessages returns SerializedMessage[]; flip to .
2. Restore -- the line was lost when removing main's conflict block from the function body.
3. selectedImageEditReference splice was inserting OpenAIChatMessage into a SerializedMessage[] array; the shapes differ on tool_calls.id nullability. Cast the reference message through unknown -- it carries no tool_calls, so the runtime payload is structurally compatible.

Reproduced locally with `tsc -b --pretty false` (now passes). Build also failing in the in-repo `npm run build` step on PR CI; this commit unblocks all 12 failing UI/API workflows.

Compress multi-line explanatory comments in the Gemini translator and the chat adapter without changing any behaviour. All 148 Gemini provider tests still pass; tsc --noEmit clean.

danielhanchen · 2026-05-27T11:36:50Z

End-to-end Playwright verification of this PR against the live Gemini API (generativelanguage.googleapis.com/v1beta) on a fresh PR-head install (feat/gemini-provider @ b19b662e, unsloth 2026.5.8, Studio bound to loopback). Five flows, each driving the real Studio chat UI:

| Feature | Model | Status | Detail |
| --- | --- | --- | --- |
| Text + 4-turn chat (JP → ES → pirate → 5-word summary) | `gemini-2.5-flash` | ✅ | 4 bubbles, per-turn 0.6 – 4.9 s |
| Image generation (Nano Banana) | `gemini-2.5-flash-image` | ✅ | 1.8 MB PNG |
| User-defined function tool call | `gemini-2.5-flash` | ✅ | `get_weather({"city":"Sydney, Australia"})` returned in streamed `tool_calls[0]` |
| Web Search composer pill (hosted `googleSearch`) | `gemini-2.5-flash` | ✅ | 2 `coinmarketcap.com` citation chips |
| Code Execution composer pill (hosted `codeExecution`) | `gemini-2.5-flash` | ✅ | Returned `24133` (sum of first 100 primes) |

Backend log proves every call routed through this PR's _stream_gemini translator (native :streamGenerateContent?alt=sse, no OpenAI-compat shim):

Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=[]                       image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash-image tools=[]                       image=True
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['functionDeclarations'] image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['googleSearch']         image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['codeExecution']        image=False

Recordings

Multi-turn text on gemini-2.5-flash

Nano Banana image generation (gemini-2.5-flash-image)

The decoded PNG round-tripped from the data URL:

Web Search pill → hosted googleSearch

Code Execution pill → hosted codeExecution

Full 41-second walkthrough (all four flows back-to-back): full_workflow.mp4

API keys never appear on-screen in any frame: the Gemini key is injected into Studio via localStorage (unsloth_chat_external_provider_keys) through Playwright's add_init_script before the first page load, and the bootstrap JWT is set the same way; no Settings → Connections click is recorded.

danielhanchen · 2026-05-27T12:10:42Z

Follow-up — verified the Think pill is a per-family effort selector for Gemini (same UX as GPT/Claude), and that Reasoning + Search + Code compose in a single request.

Per-model effort menu (live probe)

button[aria-label^="Reasoning effort"] with aria-haspopup="menu", rendered for every Gemini chat model; image-tier ids correctly hide it.

| Model | Reasoning pill | Default | Menu levels |
| --- | --- | --- | --- |
| `gemini-2.5-flash` | ✅ | Medium | `None` / `Low` / `Medium` / `High` / `Max` |
| `gemini-2.5-flash-lite` | ✅ | Medium | `None` / `Minimal` / `Low` / `Medium` / `High` / `Max` |
| `gemini-2.5-pro` | ✅ | Medium | `Low` / `Medium` / `High` / `Max` (no off: API rejects `thinkingBudget=0` on Pro) |
| `gemini-3-flash-preview` | ✅ | Medium | `Minimal` / `Low` / `Medium` / `High` |
| `gemini-3.1-pro-preview` | ✅ | Medium | `Low` / `Medium` / `High` (no `Minimal`: API rejects on Pro tier) |
| `gemini-2.5-flash-image` | n/a | n/a | hidden (image-tier; correct) |

Menu screenshots (images not reproduced here): 2.5 Flash, 3 Flash Preview, 3.1 Pro Preview.

Backend mapping (provider-capabilities.ts:790-866 → external_provider.py:3846-3928):

Gemini 3.x family → thinkingConfig.thinkingLevel (string: low/medium/high, minimal on Flash only). Per ai.google.dev/gemini-api/docs/thinking.
Gemini 2.5 Flash + 2.5 Flash-Lite → thinkingConfig.thinkingBudget (int: 0 = off, mapped from "none"; 512 floor mapped from "minimal"; tier-specific upper bound mapped from "max").
Gemini 2.5 Pro → also thinkingBudget but Studio hides the off switch because the API rejects 0 ("only works in thinking mode").
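
A rough sketch of the per-family mapping described above. `thinking_config`, `budget_ladder`, and the prefix checks are hypothetical, the 128-token Pro floor is an assumption (the source only says "a small positive budget"), and the tier-specific budgets for the remaining levels are left to the caller because the source does not list them.

```python
from typing import Any

PRO_MIN_BUDGET = 128         # assumption: smallest positive budget for 2.5 Pro
FLASH_LITE_MIN_BUDGET = 512  # floor mapped from "minimal"


def thinking_config(model_id: str, effort: str, budget_ladder: dict[str, int]) -> dict[str, Any]:
    """Map a UI effort level to a Gemini thinkingConfig fragment (sketch)."""
    if "-image" in model_id or "nano-banana" in model_id:
        return {}  # image models skip thinking config entirely
    is_pro = "-pro" in model_id
    if model_id.startswith("gemini-3"):
        # Gemini 3 cannot fully disable thinking: "none" becomes the lowest level.
        level = "minimal" if effort == "none" else effort
        if level == "minimal" and is_pro:
            level = "low"  # Pro tier rejects minimal
        return {"thinkingLevel": level}
    # Gemini 2.5 family: integer thinkingBudget.
    if effort == "none":
        return {"thinkingBudget": PRO_MIN_BUDGET if is_pro else 0}
    if effort == "minimal":
        return {"thinkingBudget": FLASH_LITE_MIN_BUDGET}
    return {"thinkingBudget": budget_ladder[effort]}
```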

Reasoning + Search + Code in a single request — PASS

Single chat on gemini-2.5-flash with Think: High plus the Search and Code composer pills both active. Prompt asked the model to look up the live USD→EUR rate then compute 1234.56 USD in EUR via code execution.

What landed in chat:

Header 3 tool calls — Gemini chained googleSearch (rate lookup) → googleSearch (cross-check) → codeExecution (Python multiply).
Cited the rate inline: "1 USD = 0.8593 EUR (Source: Xe.com, as of 01:11 UTC)" with xe.com and google.com citation chips.
Final answer: 1060.86 EUR for 1234.56 USD (correct: 1234.56 × 0.8593 ≈ 1060.86).
Composer pills all visible as on: Think: High, Search, Code.

Backend log shows both hosted tools entered Google's API together on one streamed request:

Proxying Gemini streamGenerateContent ... gemini-2.5-flash  tools=['googleSearch', 'codeExecution']  image=False

(The reasoning effort isn't logged because thinkingLevel/thinkingBudget lives in generationConfig, not in the tools array; the response did surface a thoughts part, shown in the chat as the 3-tool-call chain of actions.)

TL;DR

Yes — for Gemini models the Think pill is a clickable effort selector like the GPT/Claude pill, with the level menu correctly varying per family (image-tier ids hide it), and reasoning composes cleanly with the hosted Search and Code tools in a single streamed request.

Conflicts came from unslothai#5720 (native Gemini provider). All resolved keeping both branches' functionality:
- provider-capabilities.ts: gemini bucket now uses unslothai#5720's narrow capability shape (temperature/topP/topK/presencePenalty true) plus the 27 extended-sampler fields from this PR (all false on gemini since Google's API doesn't accept them). stop=true added so the new generationConfig.stopSequences forwarding lights up the UI.
- chat-adapter.ts: kept all 27-field forwarding from this PR; used the tighter comments from main.
- routes/inference.py: pass both this PR's sampling kwargs (frequency_penalty/seed/stop/service_tier/parallel_tool_calls) and main's tools/tool_choice through to stream_chat_completion.
- external_provider.py: same. Every dispatcher (anthropic/openai/gemini) now takes both branches' new args. Added stop forwarding to _stream_gemini as generationConfig.stopSequences (capped at 5 per native API docs); updated test_gemini_stop_sequences_capped_to_5 to assert the native shape instead of the OAI-compat shape.

256/256 backend tests pass (test_sampling_params_routing 65 + anthropic/openai/gemini integration suites 191); frontend type-check plus vite build clean.
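
The stop forwarding mentioned above can be sketched as a small normalizer. `gemini_stop_sequences` is a hypothetical helper; only the cap of 5 and the `generationConfig.stopSequences` target come from the commit message.

```python
from typing import Any, Optional, Union

MAX_STOP_SEQUENCES = 5  # cap stated in the commit message


def gemini_stop_sequences(stop: Optional[Union[str, list[str]]]) -> list[str]:
    """Normalize an OpenAI-style stop value into a capped list of strings."""
    if stop is None:
        return []
    if isinstance(stop, str):
        stop = [stop]
    return [s for s in stop if isinstance(s, str) and s][:MAX_STOP_SEQUENCES]


def apply_stop(generation_config: dict[str, Any], stop: Optional[Union[str, list[str]]]) -> dict[str, Any]:
    """Return a copy of generationConfig with stopSequences set when non-empty."""
    sequences = gemini_stop_sequences(stop)
    if not sequences:
        return dict(generation_config)
    return {**generation_config, "stopSequences": sequences}
```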

danielhanchen requested a review from rolandtannous as a code owner May 22, 2026 16:36

chatgpt-codex-connector Bot reviewed May 22, 2026

danielhanchen and others added 10 commits May 23, 2026 14:00

[pre-commit.ci] auto fixes from pre-commit.com hooks

9598c15

[pre-commit.ci] auto fixes from pre-commit.com hooks

780d2e3

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca5777a

[pre-commit.ci] auto fixes from pre-commit.com hooks

e41404d

[pre-commit.ci] auto fixes from pre-commit.com hooks

696728d

danielhanchen force-pushed the feat/gemini-provider branch from 2b05fa7 to 696728d Compare May 23, 2026 14:00

chatgpt-codex-connector Bot reviewed May 23, 2026

danielhanchen and others added 2 commits May 24, 2026 14:12

[pre-commit.ci] auto fixes from pre-commit.com hooks

9732765

pre-commit-ci Bot and others added 2 commits May 24, 2026 15:02

[pre-commit.ci] auto fixes from pre-commit.com hooks

0785ceb

[pre-commit.ci] auto fixes from pre-commit.com hooks

80ac142

danielhanchen added 2 commits May 24, 2026 15:56

Merge remote-tracking branch 'origin/main' into pr-5720

d2f972e

danielhanchen and others added 3 commits May 25, 2026 14:23

[pre-commit.ci] auto fixes from pre-commit.com hooks

38d5daf

chatgpt-codex-connector Bot reviewed May 25, 2026

danielhanchen and others added 2 commits May 25, 2026 16:42

[pre-commit.ci] auto fixes from pre-commit.com hooks

235b7ee

danielhanchen and others added 5 commits May 25, 2026 20:46

[pre-commit.ci] auto fixes from pre-commit.com hooks

31c9711

[pre-commit.ci] auto fixes from pre-commit.com hooks

d0cf4b7

Tighten verbose comments in external_provider.py + chat-adapter.ts

0e73786

Merge branch 'main' into feat/gemini-provider

b19b662

danielhanchen merged commit ab48465 into main May 27, 2026
35 checks passed

danielhanchen deleted the feat/gemini-provider branch May 27, 2026 13:01

danielhanchen mentioned this pull request May 27, 2026

Studio: add Codex SDK as a chat provider with parallel-calls fan-out #5724

Open

4 tasks

        if isinstance(enable_prompt_caching, str) and enable_prompt_caching:
            body["cachedContent"] = enable_prompt_caching

        if not isinstance(tool_calls, list) or not tool_calls:
            sanitized_assistant.append(m)

Conversation

danielhanchen commented May 22, 2026 • edited

Summary

Test plan

References

gemini-code-assist Bot commented May 22, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 22, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 23, 2026

danielhanchen commented May 24, 2026

What was broken before this commit

What's in this commit

E2E verified against live Gemini API on 2026-05-24

chatgpt-codex-connector Bot commented May 24, 2026

chatgpt-codex-connector Bot commented May 24, 2026

chatgpt-codex-connector Bot commented May 24, 2026

danielhanchen commented May 25, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot May 25, 2026

chatgpt-codex-connector Bot commented May 25, 2026

chatgpt-codex-connector Bot commented May 25, 2026

chatgpt-codex-connector Bot commented May 26, 2026

danielhanchen commented May 22, 2026 • edited