Skip to content

Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation#5720

Merged
danielhanchen merged 75 commits into
mainfrom
feat/gemini-provider
May 27, 2026
Merged

Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation#5720
danielhanchen merged 75 commits into
mainfrom
feat/gemini-provider

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

@danielhanchen danielhanchen commented May 22, 2026

Summary

Adds Google Gemini as a first-class chat provider in Studio.

  • New native translator _stream_gemini in studio/backend/core/inference/external_provider.py that converts OpenAI Chat Completions request/response to and from Gemini's contents / parts REST shape on POST /v1beta/models/{model}:streamGenerateContent?alt=sse.
  • Provider registry uses the native base URL https://generativelanguage.googleapis.com/v1beta with x-goog-api-key authentication. Legacy /v1beta/openai Gemini providers saved by older Studio builds are auto-rewritten to the native path (only when the host is generativelanguage.googleapis.com; custom proxy paths ending in /openai are left untouched).
  • Surfaces the current Gemini lineup: 3.5 Flash, 3.5 Pro, 3.1 Pro Preview, 3.1 Flash Lite, 3-pro / 3-flash previews plus the *-latest aliases, and the Nano Banana family (2.5-flash-image, 3.1-flash-image-preview = "Nano Banana 2", 3-pro-image-preview = "Nano Banana Pro").
  • Thinking control split by family per the Gemini docs:
    • Gemini 3.x family uses thinkingConfig.thinkingLevel (MINIMAL / LOW / MEDIUM / HIGH). Pro tier rejects MINIMAL, so "off" coerces to LOW; Flash tier coerces "off" to MINIMAL (Gemini 3 cannot fully disable thinking).
    • Gemini 2.5 keeps the integer thinkingBudget ladder. 2.5 Pro coerces 0 to a small positive budget because the API 400s with "only works in thinking mode". 2.5 Flash-Lite uses 512 as its lowest positive budget per the API floor.
    • Image models skip thinking config entirely.
  • Server tools wired natively:
    • tools: [{googleSearch: {}}] for web search, with grounding citations surfaced through the same tool_start / tool_end + sources envelope OpenAI / Anthropic use.
    • tools: [{codeExecution: {}}] for sandboxed Python; executableCode and codeExecutionResult parts emit as code_execution tool events with the native id and thoughtSignature stowed on google.native_part for follow-up replay.
    • tools: [{functionDeclarations: [...]}] for caller-supplied OpenAI-shape tools + tool_choice translated to toolConfig.functionCallingConfig.mode / allowedFunctionNames.
    • Google Search grounding is allowed on the documented Gemini 3 image models (gemini-3-pro-image-preview, gemini-3.1-flash-image-preview, nano-banana-pro). Older image ids continue to drop text tools (Search as tool is not enabled for this model).
  • Multimodal:
    • Inline image_url parts on user messages map to inlineData (base64) or fileData with MIME guessed from the URL path so remote PNG / WebP / GIF inputs are not relabeled as JPEG.
    • Gemini 3 image-editing replay groundwork: text deltas and inline image tool_end events carry extra_content.google.thought_signature so future multi-turn editing can echo the signature back per the Gemini contract.
    • Nano Banana inline image bytes surface as the existing image_b64 / image_mime tool_end envelope so the chat UI renders them inline with no extra plumbing.
  • functionCall parts translate to OpenAI-shape delta.tool_calls, distinct indices per call so parallel turns reassemble cleanly. finishReason="STOP" is rewritten to tool_calls when any function call was emitted so OAI clients trigger tool execution. The chat-adapter consumes these tool_calls deltas so user-supplied function calls render even when no text is emitted.
  • usageMetadata translation: promptTokenCount + toolUsePromptTokenCount -> input tokens; candidatesTokenCount + thoughtsTokenCount -> output tokens; thoughtsTokenCount surfaces as completion_tokens_details.reasoning_tokens; cachedContentTokenCount -> cached input detail.
  • Prompt caching: when enable_prompt_caching is a string, it forwards as cachedContent so callers that create CachedContent resources out of band can reference them by name. Boolean is silently ignored on Gemini (no implicit cache create).
  • Prompt-level safety blocks (promptFeedback.blockReason) surface as a content_filter error event. If a synthetic web_search tool_start was already emitted, a matching tool_end is paired before the error so the UI does not leave a "searching..." spinner stuck on screen.
  • Stream cleanup uses a single try / finally that always closes both the response and the manual aiter_lines() iterator on normal, prompt-block, and cancellation exits (no more RuntimeWarning: coroutine method 'aclose' of 'Response.aiter_lines' was never awaited).
  • Composer pills mirror the backend image-mode gate: when Gemini image generation is active on a text model, Search and Code pills are disabled so the request, builder, and active chips agree with what the backend actually sends.

Test plan

  • PYTHONPATH=studio/backend python -m pytest studio/backend/tests/test_gemini_provider.py -q (62 tests, all green)
  • Studio backend regression: 1828 passed, 46 skipped (the 2 pre-existing failures - test_help_output, qwen3.5 flash-attn tilelang - are unrelated to this PR)
  • cd studio/frontend && npx tsc -b --pretty false clean
  • Live e2e against https://generativelanguage.googleapis.com/v1beta covering single-turn, multi-turn, googleSearch grounding, codeExecution, image generation, function calling, thinkingLevel (Gemini 3) and thinkingBudget (Gemini 2.5) variants
  • Frontend Sources panel rendering for Gemini groundingMetadata citations: follow-up

References

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68379b6534

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +370 to +378
async for line in self._stream_gemini(
messages,
model,
temperature,
top_p,
max_tokens,
top_k,
enabled_tools,
enable_prompt_caching,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Forward Gemini presence penalty from chat request

presence_penalty is accepted on the external chat request path and Gemini is marked as supporting it in provider-capabilities.ts, but the Gemini dispatch call drops that argument and _stream_gemini never writes generationConfig.presencePenalty. As a result, changing the Presence Penalty control has no effect for Gemini runs, which makes tuning and reproducibility misleading for users who expect that knob to be applied.

Useful? React with 👍 / 👎.

Comment on lines +2718 to +2730
tool_name = msg.get("name") or msg.get("tool_name") or ""
response_payload: Any
if isinstance(content, str):
try:
response_payload = _json.loads(content)
except Exception:
response_payload = {"result": content}
else:
response_payload = content or {}
parts = [
{
"functionResponse": {
"name": tool_name,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Derive functionResponse name when tool message omits name

This branch builds Gemini functionResponse using msg.name/msg.tool_name and falls back to an empty string, but tool-result messages are commonly keyed by tool_call_id without a required name field. In that case we emit functionResponse.name: "", which can break function-calling follow-up turns against Gemini because the tool result is no longer tied to a valid function name.

Useful? React with 👍 / 👎.

Comment on lines +2799 to +2800
if isinstance(enable_prompt_caching, str) and enable_prompt_caching:
body["cachedContent"] = enable_prompt_caching
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don’t advertise Gemini caching without wiring a cache id

Gemini prompt caching is now exposed in the UI, but this backend path only sets cachedContent when enable_prompt_caching is a non-empty string. The normal external request model and chat adapter send enable_prompt_caching as a boolean, so toggling caching for Gemini never changes the outbound request and silently does nothing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd582df889

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

# OpenAI Chat Completions on this base. Requests/responses are
# translated in `_stream_gemini` in external_provider.py.
# API reference: https://ai.google.dev/gemini-api/docs
"base_url": "https://generativelanguage.googleapis.com/v1beta",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Parse Gemini native model catalog before switching base URL

Changing Gemini to the native https://generativelanguage.googleapis.com/v1beta endpoint routes /models to Gemini’s native schema, but our model discovery path still only consumes OpenAI-style data[].id in ExternalProviderClient.list_models(). That leaves Gemini model fetches empty in /api/providers/models (and provider test counts at 0), so users lose dynamic model discovery and rely only on hardcoded defaults. Add native parsing (models[] with name/baseModelId) or mark Gemini as curated-only to avoid silently empty catalogs.

Useful? React with 👍 / 👎.

Comment on lines +2720 to +2723
"functionCall": {
"name": fn_name,
"args": args,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve Gemini function call IDs across tool round-trips

The Gemini translation strips call identifiers when converting OpenAI tool calls into functionCall parts, so follow-up role=tool responses can only be matched by function name. This becomes ambiguous when one assistant turn issues multiple calls to the same function (different args but same name), which can mis-associate tool outputs in subsequent model context. Forward the OpenAI tool-call id into Gemini functionCall.id and mirror tool_call_id into functionResponse.id.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2ed444404

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"delta": {
"tool_calls": [
{
"index": 0,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Emit distinct tool_call indices for each Gemini function call

Do not hardcode tool_calls[0].index to 0 for every emitted Gemini functionCall. When one assistant turn contains multiple function calls, all streamed chunks collapse into the same slot in consumers that reassemble by index (including our own delta.tool_calls assemblers), so only one call survives or arguments get merged incorrectly. This breaks parallel/multi-tool execution for Gemini even when unique ids are present; each call needs a stable, distinct index per turn.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b05fa73ba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +3020 to +3024
fc_name = fc.get("name") or ""
fc_args = fc.get("args") or {}
fc_id = (
fc.get("id")
or f"call_{fc_name}_{time.time_ns()}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Forward thoughtSignature in Gemini function calls

When Gemini thinking models emit tool calls, the functionCall part can include thoughtSignature that must be replayed on the next turn with the corresponding tool result; this translation only reads name/args/id and drops the signature, so the follow-up request cannot reconstruct a valid functionCall history and can fail with Gemini 400 errors on tool round-trips. This impacts multi-turn function-calling flows on Gemini 2.5/3-class models even when IDs are preserved.

Useful? React with 👍 / 👎.

danielhanchen and others added 10 commits May 23, 2026 14:00
…aching, and Nano Banana image generation

Wires Google's native Gemini API into Studio's external-provider stack
so users can pick gemini-2.5-pro / gemini-2.5-flash / gemini-2.5-flash-image
(Nano Banana) alongside the existing OpenAI / Anthropic / OpenRouter
providers. Gemini does not speak OpenAI Chat Completions on its primary
endpoint; the new `_stream_gemini` async generator translates between
the two shapes the same way `_stream_anthropic` handles the Messages API.

Backend:
- New `_stream_gemini` translator in external_provider.py. Converts
  OpenAI messages -> Gemini `contents` + `systemInstruction`; maps
  generationConfig (temperature / topP / topK / maxOutputTokens);
  forwards `tools: [{googleSearch: {}}]` for web_search and
  `{codeExecution: {}}` for code_execution; passes `cachedContent`
  through for prompt caching; sets `responseModalities=[TEXT, IMAGE]`
  for Nano Banana image generation.
- Translates streamed `GenerateContentResponse` SSE frames back into
  OpenAI chat.completion.chunk frames (text deltas, function_call ->
  tool_calls deltas, inlineData -> image_b64 tool_end envelope, usage
  chunk before [DONE]).
- Registry entry switched to native base URL
  `https://generativelanguage.googleapis.com/v1beta` with
  `openai_compatible: False` and the `x-goog-api-key` auth header.
  Model lineup curated to current 2.5 / 2.0 family + Nano Banana.

Frontend:
- Provider-capability matrix: Gemini supports temperature, top_p, top_k,
  presence_penalty (matches generationConfig); min_p / repetition_penalty
  hidden because the API does not accept them.
- `providerSupportsBuiltinWebSearch` / `providerSupportsBuiltinCodeExecution`
  / `providerSupportsBuiltinImageGeneration` extended for Gemini.
- Prompt caching toggle now also lit on Gemini.

Tests:
- 21 new tests in `test_gemini_provider.py` using httpx.MockTransport.
  Cover request body shape conversion, URL/header wiring, web_search
  forwarded as googleSearch, function-call translation both directions,
  prompt caching passthrough, image generation emitting image_b64,
  grounded-search citations -> tool_end, finish_reason mapping, and
  vision data URL -> inlineData translation.
…from tool_call_id

Two follow-up fixes for the Gemini provider:

  * Thread presence_penalty into _stream_gemini and set
    generationConfig.presencePenalty when non-zero. The OpenAI-side
    capability matrix already exposes the slider for Gemini, so the
    value was being collected and silently dropped on the way out.

  * When an OpenAI role=tool message omits 'name' and only carries
    'tool_call_id', recover the function name from the matching
    functionCall on the prior assistant turn. Gemini 400s on an empty
    functionResponse name.
…ents

The Gemini stream parser only handled text/functionCall/inlineData
parts, so when the user toggled the Code pill on a Gemini model the
sandbox output (executableCode + codeExecutionResult parts) was
dropped on the floor while adjacent text reached the UI. Reviewers
flagged this as the headline feature being silently broken.

Translate both parts into the existing code_execution tool envelope
that CodeExecutionToolUI already consumes for OpenAI / Anthropic:

  * executableCode  -> tool_start with kind=code_execution and the
    source code under arguments.code. We mint a tool_call_id and
    stash it so the matching result block can pair to it.
  * codeExecutionResult -> tool_end on that id with the stdout under
    result. Non-OK outcomes (OUTCOME_FAILED / OUTCOME_DEADLINE_EXCEEDED)
    are prefixed onto the text so the failure is visible.
…che claim

Three follow-ups to the Gemini provider PR after the codex pass:

  * list_models() now translates Gemini's native /v1beta/models
    payload ({models[{name, baseModelId, displayName,
    supportedGenerationMethods}]}) into the OpenAI-compatible shape
    Studio expects. Without this the picker stayed empty for Gemini
    and fell back to hardcoded defaults. Embedding-only models are
    filtered out.

  * Forward the OpenAI tool_call id into Gemini's functionCall.id
    and mirror it onto functionResponse.id. Two parallel calls to
    the same function name can now be paired unambiguously on the
    follow-up turn.

  * Drop Gemini from the prompt-caching capability set. The wire
    flow requires a separate cachedContents POST first and the
    boolean Studio emits today is a no-op; the toggle should not
    advertise a feature it cannot apply. Leaves a pointer to the
    docs for the eventual two-step orchestration.
Codex flagged that the Gemini stream parser hardcoded
tool_calls[0].index to 0 on every emitted functionCall. OpenAI
reassemblers key tool_calls by index when joining deltas, so two
parallel function calls in one assistant turn collapsed onto a
single slot and the second call's arguments overwrote the first.

Track the running count via len(emitted_function_call_ids) - 1
and emit it as the per-call index. The dedupe guard above (skip
when fc_id already in the set) means the index is monotonic and
stable for the lifetime of the stream. Regression test asserts
[0, 1] across two parallel calls in one candidate parts list.
@danielhanchen danielhanchen force-pushed the feat/gemini-provider branch from 2b05fa7 to 696728d Compare May 23, 2026 14:00
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 696728d53f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// every 2.x model. Backend translation lives in `_stream_gemini`,
// citations surface on the response's `groundingMetadata`. See
// https://ai.google.dev/gemini-api/docs/grounding.
providerType === "gemini"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate Gemini web search by model capability

providerSupportsBuiltinWebSearch now enables the Search pill for all Gemini models, including gemini-2.5-flash-image that this commit also makes selectable; when that pill is on, the backend forwards tools: [{googleSearch:{}}], which is an unsupported config for the image model and can turn normal image requests into provider 4xx failures. Add model-level gating for Gemini web search (for example excluding -image variants), mirroring the existing Gemini code-execution guard.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 24, 2026 14:12
…ng budget

`gemini-2.0-flash` / `gemini-2.0-flash-exp` were retired by Google in 2026
(`/v1beta/models/gemini-2.0-flash:streamGenerateContent` returns HTTP 404
"no longer available to new users"), and the picker had nothing past the
2.x family. Verified against the live ListModels catalog: drop the retired
ids from `default_models` + allowlist and surface the chat-capable
3.5 / 3.1 / 3 families plus the Nano Banana image trio.

Also plumb `enable_thinking` / `reasoning_effort` into Gemini's
`generationConfig.thinkingConfig`. Without this, Gemini 3.5 Flash,
gemini-pro-latest, and the 3.x previews silently spend the caller's
`max_tokens` budget on hidden "thoughts" before emitting any visible
answer -- the chat shows a truncated stub like "The capital of" and
streams stop. Mapping:
  - enable_thinking=False / reasoning_effort=none -> thinkingBudget=0
    (Flash tier; Pro tier coerces to a small positive budget because
    the API 400s on 0 with "This model only works in thinking mode")
  - minimal/low/medium/high -> 512/2048/8192/24576 budget tokens
  - max/xhigh -> -1 (dynamic)
  - default (neither knob set) -> thinkingConfig omitted, model decides

Frontend `getExternalReasoningCapabilities` now surfaces a
`reasoning_effort` picker for every Gemini chat id (Pro tier hides the
"none" option; image-tier ids stay knob-less). Adds 6 unit tests
covering Flash/Pro effort mapping, the off-toggle coercion on Pro,
default omission, and the nano-banana-pro-preview alias routing
through the image modalities path. 28 -> 34 tests in
`test_gemini_provider.py`, all green; full backend suite still passes
(1459/1460; the unrelated test_help_output flake is pre-existing and
not in any file this PR touches).

Live verification against generativelanguage.googleapis.com on
2026-05-24 with `_stream_gemini` directly:
  text   gemini-3.5-flash           single PASS  multi PASS
  text   gemini-3.1-pro-preview     single PASS  multi PASS
  text   gemini-3.1-flash-lite      single PASS  multi PASS
  text   gemini-3-pro-preview       single PASS  multi PASS
  text   gemini-3-flash-preview     single PASS  multi PASS
  text   gemini-2.5-pro             single PASS  multi PASS
  text   gemini-2.5-flash           single PASS  multi PASS
  text   gemini-2.5-flash-lite      single PASS  multi PASS
  text   gemini-flash-latest        single PASS  multi PASS
  text   gemini-flash-lite-latest   single PASS  multi PASS
  text   gemini-pro-latest          single PASS  multi PASS
  image  gemini-2.5-flash-image     PASS (1082 KB png returned)
  image  gemini-3.1-flash-image-preview  PASS (Nano Banana 2)
  image  gemini-3-pro-image-preview      PASS (Nano Banana Pro)
  tool   web_search                 PASS
  tool   code_execution             PASS
  -> 16/16 e2e through the actual ExternalProviderClient code path.
@danielhanchen
Copy link
Copy Markdown
Member Author

Pushed c6724dbd after end-to-end testing every Gemini model against the live API through the actual _stream_gemini code path.

What was broken before this commit

  1. gemini-2.0-flash and gemini-2.0-flash-exp were in default_models and the allowlist, but Google retired them in 2026. Picking either returns HTTP 404 ("no longer available to new users") and the chat fails silently.
  2. The picker had no Gemini 3.x/3.5 ids. The live ListModels catalog now serves gemini-3.5-flash (GA May 2026 flagship), gemini-3.1-pro-preview, gemini-3.1-flash-lite (GA), gemini-3.1-flash-image-preview (Nano Banana 2), gemini-3-pro-image-preview (Nano Banana Pro), gemini-3-flash-preview, etc. — the allowlist hid all of them.
  3. enable_thinking and reasoning_effort were never forwarded into Gemini's request body. On every thinking-capable id (gemini-3.5-flash, gemini-pro-latest, gemini-flash-latest, all 3.x previews, gemini-2.5-pro) the model spent the caller's max_tokens budget on hidden thoughts before emitting visible text. With Studio's default knob the chat streamed 'The capital of' and stopped.

What's in this commit

Registry (providers.py)

  • Drop gemini-2.0-flash / gemini-2.0-flash-exp.
  • Add gemini-3.5-flash, gemini-3.1-pro-preview, gemini-3.1-flash-lite, gemini-3-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-image-preview (Nano Banana 2), gemini-3-pro-image-preview (Nano Banana Pro).
  • Allowlist regex now covers 3.5 / 3.1 / 3 / 2.5 + the rolling *-latest aliases + nano-banana-pro-preview.

Backend (external_provider.py)

  • _stream_gemini accepts enable_thinking and reasoning_effort, translates to generationConfig.thinkingConfig.thinkingBudget:
    • enable_thinking=False / reasoning_effort='none' → 0 (Flash tier) or 128 (Pro tier — the API 400s on 0 with "only works in thinking mode")
    • minimal/low/medium/high → 512 / 2048 / 8192 / 24576 budget tokens
    • max/xhigh → -1 (dynamic)
    • default (neither knob set) → thinkingConfig omitted, Google decides
  • is_image_model now also matches nano-banana (covers the nano-banana-pro-preview alias Google ships).

Frontend (provider-capabilities.ts)

  • getExternalReasoningCapabilities now resolves per Gemini model: Pro tier exposes low/medium/high/max (no off switch — the API rejects budget=0); Flash tier exposes none/low/medium/high/max; image-tier ids stay knob-less.

Tests (test_gemini_provider.py) — 28 → 34, all green

  • test_thinking_disabled_sets_budget_zero_on_flash
  • test_thinking_disabled_pro_tier_uses_small_budget
  • test_reasoning_effort_levels_map_to_budgets
  • test_reasoning_effort_none_disables_on_flash
  • test_thinking_default_omits_thinking_config
  • test_nano_banana_alias_routes_through_image_modalities

Full backend suite: 1459/1460 (the unrelated test_help_output flake doesn't touch any file in this PR).
Frontend: npx tsc -b --pretty false clean.

E2E verified against live Gemini API on 2026-05-24

Each row drives the actual installed ExternalProviderClient.stream_chat_completion (no mocks), one model per row, single-turn + multi-turn for text, image-bytes inspection for image models:

text   gemini-3.5-flash           single PASS  multi PASS
text   gemini-3.1-pro-preview     single PASS  multi PASS
text   gemini-3.1-flash-lite      single PASS  multi PASS
text   gemini-3-pro-preview       single PASS  multi PASS
text   gemini-3-flash-preview     single PASS  multi PASS
text   gemini-2.5-pro             single PASS  multi PASS
text   gemini-2.5-flash           single PASS  multi PASS
text   gemini-2.5-flash-lite      single PASS  multi PASS
text   gemini-flash-latest        single PASS  multi PASS
text   gemini-flash-lite-latest   single PASS  multi PASS
text   gemini-pro-latest          single PASS  multi PASS
image  gemini-2.5-flash-image          PASS (1082 KB PNG)
image  gemini-3.1-flash-image-preview  PASS (Nano Banana 2, 405 KB JPEG)
image  gemini-3-pro-image-preview      PASS (Nano Banana Pro, 401 KB JPEG)
tool   web_search                      PASS
tool   code_execution                  PASS

16 / 16 passing. OpenAI baseline (gpt-5.5, gpt-5.4, gpt-5.4-mini) also passes the same single-turn + multi-turn flow.

Live /api/providers/models against the patched backend with the regression key now surfaces 17 ids (was 8): the 11 chat-capable + 3 image + gemini-3.1-flash-lite-preview + gemini-3.1-pro-preview-customtools + nano-banana-pro-preview.

Fixes a batch of bugs surfaced by a second-pass review on top of the
3.5/3.1/3 + Nano Banana 2/Pro additions in c6724db.

Backend (external_provider.py):
- Constructor normalises legacy /v1beta/openai base URLs to /v1beta so
  Gemini providers saved before the native switch keep working without
  a manual re-config.
- Skip thinkingConfig, googleSearch, and codeExecution on image-tier
  models (-image / nano-banana). The image responseModalities path is
  mutually exclusive with text-tool wiring and stale UI state would
  otherwise 400 the turn.
- _PRO_THINKING_PREFIXES now includes gemini-3.5-pro and uses anchored
  prefix matching (exact id or "<prefix>-...") so the image-tier
  gemini-3-pro-image-preview cannot accidentally match the pro guard.
- Gemini 3 functionCall thoughtSignature is round-tripped through the
  tool_calls envelope via extra_content.google.thought_signature on
  emit, and replayed as a sibling of functionCall on the next request.
- finishReason swaps STOP -> tool_calls when any functionCall was
  emitted on the same turn so OAI clients trigger tool execution
  (matches the OpenAI Chat Completions contract).
- usageMetadata.thoughtsTokenCount is rolled into output_tokens and
  surfaced on output_tokens_details.reasoning_tokens so total_tokens
  reflects the full billable spend instead of dropping the hidden
  reasoning slice.

Registry (providers.py):
- Drop gemini-3-pro-preview from default_models. Google shut it down
  on 2026-03-09 and auto-redirects to gemini-3.1-pro-preview; we
  surface the canonical id only.
- Add model_id_deny_exact = ("gemini-3-pro-preview",) so the live
  ListModels fetch does not re-surface the redirect alias.

Route schema (models/inference.py):
- enable_prompt_caching widened to Optional[Union[bool, str]] so the
  /v1/chat/completions caller can pass a Gemini cachedContent resource
  name (e.g. cachedContents/abc123). Without this widening _stream_gemini
  s string cachedContent passthrough was unreachable from the public
  route (bool_parsing 422). stream_chat_completion signature mirrors.

Frontend (provider-capabilities.ts, chat-page.tsx, chat-adapter.ts):
- providerSupportsBuiltinImageGeneration now also recognises
  nano-banana ids (nano-banana-pro-preview was hidden from the image
  pill before).
- providerSupportsBuiltinWebSearch takes the model id so Gemini image
  models hide the Search pill (mirrors the backend skip).
- providerSupportsBuiltinCodeExecution uses the same isGeminiImageModel
  guard for nano-banana ids.
- GEMINI_THINKING_PRO_PREFIXES gains gemini-3.5-pro; gemini-3-pro
  tightened to gemini-3-pro-preview to avoid the image-id overlap.
- Updated 3 callers of providerSupportsBuiltinWebSearch to thread the
  selected model id through.

Tests (test_gemini_provider.py): 34 -> 42, all green
- test_image_models_skip_thinking_config
- test_image_models_drop_text_only_tools
- test_gemini_35_pro_recognized_as_pro_thinking
- test_legacy_openai_base_url_normalized
- test_finish_reason_swaps_to_tool_calls_when_function_call_emitted
- test_thought_signature_round_trips_into_gemini_function_call
- test_thought_signature_emitted_in_tool_call_delta
- test_usage_chunk_includes_thoughts_tokens

Verification:
- Backend pytest 1518/1519 passing (one unrelated Qwen3.5 flash-attn
  test fails on main as well; nothing in this PR touches that path).
- Frontend npx tsc -b clean.
- Live e2e 16/16 against generativelanguage.googleapis.com through the
  patched _stream_gemini code path (all 11 chat models single + multi
  turn, all 3 image models returned image bytes, web_search and
  code_execution tools both emit the expected envelope).
- Live /api/providers/models against the patched backend surfaces 16
  ids (gemini-3-pro-preview correctly filtered via deny_exact).
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

pre-commit-ci Bot and others added 2 commits May 24, 2026 15:02
Round-2 reviewer.py flagged a phantom web_search card on image
turns (12/12 reviewers), route-layer stripping of tool_calls /
tool_call_id / name, an over-narrow image-mode tool guard, and
silent safety blocks. This patch fixes all four.

Backend (external_provider.py):
- web_search_active is now derived from the outbound tools_array
  (whether googleSearch was actually forwarded), not the raw
  enabled_tools intent. Image-mode turns dropped the tool above so
  the inbound stream no longer emits a phantom "search complete"
  tool_start / tool_end on those turns.
- text_tools_allowed now uses is_image_model (covers both `-image`
  / `nano-banana` picker models AND text models that requested
  `image_generation` via enabled_tools). Verified against the live
  Gemini API which rejects both googleSearch and codeExecution
  alongside responseModalities=["TEXT","IMAGE"] with explicit 400s
  ("Search as tool is not enabled for this model", "Code execution
  is not enabled for this model").
- promptFeedback.blockReason is surfaced as a 400 content-filter
  error chunk instead of returning an empty successful assistant
  response. The streaming loop closes the response before exiting.

Route (routes/inference.py):
- _build_external_messages now propagates tool_calls (assistant),
  tool_call_id, and name (tool result) through every code path
  (string content, multimodal content, non-vision fallback). Without
  this Gemini 3 function-call round trips lost their thoughtSignature
  + tool_call_id at the route boundary, and functionResponse.name
  arrived empty on the second turn.
- Assistant messages with content=None and tool_calls populated are
  preserved as a synthetic empty-string content turn so the
  Gemini translator can rebuild the functionCall part.

Tests (test_gemini_provider.py): 42 -> 45, all green
- test_image_models_suppress_phantom_web_search_card
- test_image_generation_tool_drops_text_tools
- test_prompt_feedback_block_reason_surfaces_as_error

Verification:
- Backend pytest 1736 / 1736 (the two pre-existing unrelated fails
  on main, test_help_output and Qwen3.5 flash-attn pin, are skipped).
- Frontend npx tsc -b clean.
- Live e2e 16/16 against generativelanguage.googleapis.com:
  11 chat models single + multi turn, 3 image models returning
  image bytes, web_search and code_execution both PASS.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Round 3 review follow-ups:

Backend (studio/backend/core/inference/external_provider.py):
- Close response AND aiter_lines iterator in a finally so normal,
  prompt-block, and cancellation exits all clean up (eliminates the
  RuntimeWarning about aclose never being awaited).
- Pair the synthetic web_search tool_start with a tool_end on the
  promptFeedback.blockReason path so the UI does not leave a stuck
  "searching..." spinner after the error toast.
- Preserve native id and thoughtSignature on executableCode and
  codeExecutionResult tool events under google.native_part, and pair
  the tool_end on the code-exec id so multi-turn code-execution
  replays do not lose Gemini-required history.
- Carry part-level thoughtSignature on text deltas via
  delta.extra_content.google.thought_signature and on inline image
  tool_end via google.thought_signature so Gemini 3 image editing
  and tool turns round-trip the signature on the next request.
- Guess remote image_url MIME from the URL path so PNG / WebP / GIF
  inputs are not silently relabeled as JPEG.
- Roll usageMetadata.toolUsePromptTokenCount into translated input
  tokens and surface thoughtsTokenCount as
  completion_tokens_details.reasoning_tokens in _build_usage_chunk.
- Only normalize the Google-hosted /v1beta/openai legacy base URL;
  custom proxies whose paths happen to end in /openai are left
  untouched.
- Forward ChatCompletionRequest.tools and tool_choice through
  stream_chat_completion into _stream_gemini, translating to
  tools[].functionDeclarations and toolConfig.functionCallingConfig.

Frontend:
- chat-adapter: when Gemini image-generation is enabled for the turn,
  also disable Search and Code so the request, builder, and active
  pills agree with what the backend actually sends (the backend
  already strips text tools when image_generation is in enabled_tools).
- chat-adapter: consume OpenAI-shape delta.tool_calls chunks so
  Gemini function-call deltas without text surface as tool-call parts.
- shared-composer: disable Search and Code pills while Gemini image
  mode is active so the UI matches the request.

Tests (studio/backend/tests/test_gemini_provider.py): adds coverage
for proxy base-url gating, remote image MIME inference,
toolUsePromptTokenCount, reasoning_tokens propagation, prompt-block
web_search tool_end pairing, native code-exec id/thoughtSignature
metadata, inline image thoughtSignature, text-chunk extra_content,
OpenAI tools/tool_choice translation, and image-model tool drop.
danielhanchen and others added 3 commits May 25, 2026 14:23
After a Gemini chat that ran code_execution / image_generation, switching
the same thread to a local GGUF model used to forward the synthetic
provider-side tool_calls (tagged with `args._server_tool` or carrying a
Gemini `args.google.native_part` payload) and the message-level
`extra_content` to llama-server. The receiving backend has no tool
declaration for those names and no use for Gemini thoughtSignature
metadata; in the worst case it can produce an orphan tool_call_id and a
confused continuation.

Add `_strip_provider_synthetic_tool_history()` and wire it through the
two local message builders:
  - `_openai_messages_for_passthrough`  (OAI-compat passthrough)
  - `_openai_messages_for_gguf_chat`    (standard GGUF chat path)

Real user-function `tool_calls` and their matching `role="tool"` replies
survive unchanged; only synthetic provider-side cards and Gemini-only
`extra_content` are stripped. If the synthetic call was the assistant
turn's only payload, the now-empty turn is dropped too so llama-server
does not reject the request.

Adds 2 regression tests:
  - test_strip_provider_synthetic_tool_history_drops_synthetic_only
  - test_strip_provider_synthetic_tool_history_drops_empty_assistant

142 existing backend tests still pass.
For external Gemini image-tier models (gemini-2.5-flash-image,
gemini-3.x-image-preview, etc.), the backend unconditionally strips
code_execution and strips web_search on older image ids. Search is
still allowed on Gemini 3.x Pro/Flash image models, which
supportsBuiltinWebSearch already encodes per model.

Before this commit the composer pill gates were:
  searchDisabled = !modelLoaded || !(supportsTools || supportsBuiltinWebSearch)
  codeDisabled   = !modelLoaded || !(supportsTools || supportsBuiltinCodeExecution) || imageModeDisablesCode

`supportsTools` here is a local-runtime fallback that becomes true when
any tool-capable local model has been loaded in the session. With a
local tool-capable runtime active, switching the chat to an external
Gemini image-tier model used to leave Search/Code clickable, even
though the backend will silently drop the tool on the wire.

Detect "external provider is Gemini AND the model is image-tier" (via
supportsBuiltinImageGeneration) and gate the two pills strictly on the
provider's own builtin support in that case. Non-Gemini paths and
non-image Gemini models keep the supportsTools fallback unchanged.
@danielhanchen
Copy link
Copy Markdown
Member Author

Round 22 review pass landed. Twelve parallel codex reviewers, five APPROVE and seven REQUEST_CHANGES, no security findings. Convergent P1s plus the stale-rebase issue are now addressed:

  1. Anthropic / OpenRouter / Kimi did not honour tool_choice={"type":"function", ...} as a hosted-tool opt-out, only tool_choice="none". The Gemini path already did. Mirror the gate symmetrically so a forced-function pin suppresses Anthropic web_search/web_fetch/code_execution, OpenRouter plugins:[{id:"web"}] and its synthetic SSE web_search card, and Kimi _stream_kimi_web_search. Backend tests for all four cases added (feature/gemini-provider commit fe1c5c1).

  2. Synthetic provider-side tool history (Gemini code_execution / image_generation cards tagged with args._server_tool or carrying args.google.native_part) used to be forwarded verbatim into llama-server when a thread was switched from Gemini to a local GGUF backend. Add _strip_provider_synthetic_tool_history() and call it from _openai_messages_for_passthrough plus _openai_messages_for_gguf_chat. Real user-function tool_calls + their matching role="tool" replies survive; only the synthetic Gemini-only cards are dropped (commit 1890d85).

  3. The Search and Code composer pills could be re-enabled by the local supportsTools fallback even on Gemini image-tier models, which the backend will silently strip. Detect "external provider is Gemini AND model is image-tier" and gate strictly on the provider builtin support in that case (commit 9d91db0).

  4. The PR head was stale and would have accidentally reverted three current-main fixes when merged (studio/backend/core/export/export.py MLX save_method="merged_16bit", unsloth/chat_templates.py placeholder validation, and the matching tests/python/test_construct_chat_template_validation.py). Merged origin/main into the branch so those land cleanly with this PR (commit 68fd4d6).

The Gemini 3 Pro thinkingLevel: "medium" finding (Review 12) was checked against both Google AI docs and Vertex AI docs and medium is in fact a supported level for Gemini 3 Pro (low, medium, high). No change made there.

All 142 existing Gemini-provider backend tests still pass; six new regression tests added to lock the new behaviour in.

Round 22 added the gate for Gemini / Anthropic / OpenRouter / Kimi but
missed the OpenAI Responses translator. When a caller pinned a user
function via `tool_choice={"type":"function","function":{"name":...}}`
plus `enabled_tools=["web_search","code_execution","image_generation"]`,
the Responses body still attached `{"type":"web_search"}`,
`{"type":"shell"}`, and `{"type":"image_generation"}` server tools. The
function pin should suppress those for the same privacy + billing reason
the other provider paths now do.

Compute `_responses_tool_choice_forced_function` next to
`_responses_tool_choice_none` and gate each hosted-tool append on
`_responses_hosted_builtins_allowed = not none and not forced_function`.
The fix has to be applied in TWO places: the initial body builder and
`_build_body()` (called by the container-expiry retry path). User
function declarations still flow through so the pin has something to
target, and the Responses-shape `{type:"function", name:"..."}`
`tool_choice` is forwarded unchanged.

Adds regression test `test_openai_responses_forced_function_tool_choice_drops_hosted_tools`.
All 166 existing backend tests across Gemini + Responses + image-gen +
code-exec suites still pass.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de046986ba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +5449 to +5450
if not isinstance(tool_calls, list) or not tool_calls:
sanitized_assistant.append(m)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip Gemini extra_content from assistant turns without tools

_strip_provider_synthetic_tool_history returns assistant messages unchanged when tool_calls is missing/empty, so Gemini-only extra_content (e.g., text thought signatures) is still forwarded on provider-switch paths. That leaks a non-standard message key into /v1/chat/completions passthrough/local GGUF requests, where strict OpenAI-compatible backends may reject the request or receive irrelevant provider metadata. This regression appears when a thread with Gemini assistant text turns is continued on a non-Gemini backend.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 25, 2026 16:42
…+ custom-Gemini model list

Three convergent P1s from round 24 review:

1. SSRF: the shared SSRF validator in `tools._validate_and_resolve_host`
   used a denylist (is_private / loopback / link_local / multicast /
   reserved / unspecified). Python classifies shared address space
   (100.64.0.0/10 carrier-grade NAT, plus 240.0.0.0/4, benchmarking
   ranges, etc.) with `is_private=False` AND `is_global=False`. The new
   Gemini server-side image fetcher therefore accepts URLs whose
   hostname resolves to 100.64.0.1 in cloud/VPC deployments. Add
   `not ip.is_global` as the primary gate -- a single source of truth
   that covers every current and future non-global range.

2. _strip_provider_synthetic_tool_history previously only stripped
   message-level `extra_content` when the assistant turn had tool_calls.
   A plain text Gemini reply carrying
   `extra_content.google.thought_signature` flowed through to
   llama-server when the thread was switched to a local GGUF backend.
   Always strip message-level `extra_content` on assistant turns.

3. routes/providers.list_provider_models applied Gemini's native
   `model_id_allowlist` regex to every Gemini provider, including
   custom OAI-compatible bases (LiteLLM, deployment gateways). IDs like
   `google/gemini-2.5-flash` and team-prefixed deployment aliases got
   filtered out even though the chat-dispatch path now routes them via
   the OpenAI-compatible client. Skip registry-level model-id filters
   when the configured Gemini base_url host is not the canonical
   `generativelanguage.googleapis.com`, mirroring the chat-dispatch
   gate.

Three regression tests added:
  - test_validate_and_resolve_host_blocks_shared_address_space
  - test_strip_provider_synthetic_tool_history_drops_text_only_extra_content
  - test_gemini_custom_oai_compat_base_skips_native_allowlist
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

1 similar comment
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

danielhanchen and others added 5 commits May 25, 2026 20:46
…nto Gemini schema

Two convergent reviewer findings on the native Gemini path:

1. _stream_gemini's tool_calls replay loop falls through to a generic
   functionCall emission whenever it sees an assistant tool_call. Marked
   server-side builtin cards (web_search / web_fetch tagged with
   _server_tool or args.google.native_part) hit that fallthrough with no
   replayable native_part, which produces an outbound functionCall whose
   name is not a declared user function. The Gemini turn 400s on the
   undeclared name. Guard the loop to drop those entries instead, while
   keeping the existing code_execution / image_generation native-part
   replay branch intact.

2. _sanitize_gemini_schema uses a strict allowlist that drops local
   $ref / $defs references. Pydantic-generated tool schemas hoist nested
   object shapes into $defs and reference them via {"$ref": "#/$defs/X"},
   so a property like address: {"$ref": "#/$defs/Address"} collapsed to
   {} on the wire and the model lost the nested fields, types, and
   required keys. Resolve local #/... pointers against the schema root
   and inline the referenced subtree, with local siblings overriding
   the reference (normal JSON Schema composition) and a seen-ref guard
   for self-referential schemas.

Added regression coverage:
- test_gemini_native_skips_synthetic_server_builtin_replay
- test_function_declarations_inline_local_refs_into_gemini_schema
- test_function_declarations_inline_local_refs_in_anyof_and_items
- test_function_declarations_self_referential_schema_terminates

All 145 Gemini provider tests pass; touched provider regression set
(OpenAI Responses, code execution, image generation, Anthropic code
execution, Anthropic web_fetch) also 43/43 green.
…es synthetic-history strip

Reviewer round 26 surfaced two convergent asymmetric-fix bugs.

1. _stream_gemini drops a synthetic server-tool tool_call (web_search /
   web_fetch tagged _server_tool) and also replays code_execution /
   image_generation tool_calls as Gemini-native executableCode /
   codeExecutionResult / inlineData parts. The matching role="tool"
   follow-up was still falling through to the generic functionResponse
   branch, producing either an orphan functionResponse (synthetic case)
   or a duplicate response pointing at a name with no
   functionDeclarations entry (native-part case). Both forms 400 the
   next Gemini turn. Track skipped + native-replayed tool_call_ids in
   _gemini_skip_tool_result_ids and short-circuit the role="tool"
   branch on a match.

2. The Anthropic-compatible local /v1/messages route only called
   _drop_empty_assistant_sentinels on the OpenAI-translated history,
   while the sibling /v1/chat/completions and GGUF passthrough builders
   chain that with _strip_provider_synthetic_tool_history. An Anthropic
   caller replaying a prior provider-side tool_use therefore forwarded
   fake builtin tool history straight into local llama-server. Apply
   the same strip on the Anthropic route after the
   anthropic_messages_to_openai conversion.

Regression coverage added:
- test_gemini_native_skips_orphan_function_response_for_dropped_builtin
- test_gemini_native_skips_orphan_function_response_for_native_part_replay

Gemini suite 147/147; touched provider regression set 43/43.
…dget for base64

Two convergent reviewer findings on the native Gemini path.

1. _stream_gemini's synthetic-builtin detector at lines 3519-3524
   recognizes args.google.native_part as a server-tool marker, but
   _native_part was only loaded from tc.extra_content.google.native_part.
   A direct OpenAI-compatible API caller or imported third-party thread
   round-trips the payload through function.arguments because
   tool_calls[].extra_content is not in the OpenAI spec. The round-25
   guard then saw a synthetic builtin with no _native_part and dropped
   the entire assistant turn, so the next native Gemini request lost
   the prior executableCode / inlineData / codeExecutionResult context.
   Fall back to args.google.native_part when extra_content path is
   missing, mirroring what the synthetic detector already accepts.

2. _GEMINI_REMOTE_IMAGE_MAX_TOTAL_BYTES capped DECODED bytes at 20MB.
   Gemini receives images base64-encoded inside JSON, and base64
   inflates payload size by ~4/3. With 20MB decoded the actual JSON
   body is ~26.7MB plus prompt overhead, well over Gemini's ~20MB
   request limit. Drop the decoded cap to 14MB so realistic multi-
   image turns stay safely under 20MB encoded.

Added regression test test_gemini_native_part_falls_back_to_args_google
covering an OpenAI-compat-shaped image_generation tool_call whose
native_part lives only in function.arguments.

Gemini suite 148/148.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Main moved forward 17 commits during PR review (latest: 953c8bf). Real
conflicts in five files; resolved by combining both branches' changes.

studio/backend/core/inference/external_provider.py
- Add fast_mode (Anthropic Opus 4.6/4.7 speed flag, #5715) to
  stream_chat_completion and Anthropic-branch call site, alongside
  existing Gemini tools/tool_choice forwarding.
- Add _openai_image_generation_tool() helper (action:"edit" for follow-
  up image edits, #5712) and use it inside the existing
  _responses_hosted_builtins_allowed gate so the forced-function /
  tool_choice="none" suppression added in rounds 21+ still applies.
- Keep Anthropic web_fetch gated on _anthropic_hosted_builtins_allowed
  (round 19+ hosted-builtin gate) while taking main's per-model
  version selector (web_fetch_20260209 vs _20250910).

studio/backend/routes/inference.py
- Add `openai = provider_type == "openai"` (used by main's reasoning
  content forwarding for follow-up image edits).
- Keep the round 25/26 Gemini filter chain (_filter_tool_calls drops
  synthetic server-builtin cards, marks tc_id so the matching
  role="tool" follow-up gets skipped, extra_content gated to native
  Gemini host).
- Forward fast_mode alongside tools/tool_choice.

studio/backend/tests/test_openai_image_generation.py
- Combine assertions: both _server_tool: True (PR) and
  openai_image_generation_call_id (main) are present on the tool_start
  arguments.

studio/frontend/src/features/chat/shared-composer.tsx
- Add supportsBuiltinWebFetch declaration (separate Fetch pill from
  #5742) before the PR's isExternalGemini constant so both the Gemini
  image-tier gating and the standalone Anthropic Fetch pill compile.

studio/frontend/src/features/chat/api/chat-adapter.ts
- Add main's normalizeOpenAIReasoningItem, toOpenAIImageEditReferenceMessage,
  isAnthropicRefusalMessage helpers alongside PR's collectAssistantToolCalls,
  collectToolResultMessages, SerializedMessage, collectAssistantTextThoughtSignature.
- toOpenAIMessages (PR) now also early-returns on isAnthropicRefusalMessage
  so refused turns get pruned from outbound history.
- Add a thin toOpenAIMessage (singular) wrapper for the OpenAI image-
  edit replay path's flat .map() usage.
- Merge per-turn enable flags: keep PR's imageGenerationEnabledForThisTurn,
  geminiImageModeForThisTurn, codeExecEnabledForThisTurn !geminiImageMode
  gate; take main's webFetchEnabledForThisTurn (sourced from independent
  webFetchToolsEnabled pill state).
- Outbound build chains main's anthropic_refusal survivingMessages prune,
  then flatMap(toOpenAIMessages) (PR), then PR's selectedImageEditReference
  reference message prepend; image-edit unavailable toast from main fires
  before any of that when the pill is off.
- tool_end merge: do main's nextArgs spread first, then PR's Gemini
  native_part parts concat so both OpenAI image-call ids and Gemini
  executableCode/codeExecutionResult/inlineData round-trip.
- Cumulative + final yields: orderAssistantContent(pinTextThoughtSignature(...))
  composes main's tool-vs-text ordering with PR's per-text thoughtSignature pin.

Tests: gemini provider 148/148; openai_responses_translation + openai_code_execution
+ openai_image_generation + anthropic_code_execution + anthropic_web_fetch +
external_provider_usage_chunk + providers_api: 50 passed, 42 skipped; main's
new anthropic_fast_mode + citations + openai_citation_markers + openai_tool_result_fallbacks
suites all 43/43.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…urn [] + cast image-edit ref

Three errors in chat-adapter.ts surfaced by the frontend tsc step after merging
main into feat/gemini-provider:

1. The Anthropic refusal early-return used main's  but
   toOpenAIMessages returns SerializedMessage[]; flip to .
2. Restore  -- the line
   was lost when removing main's conflict block from the function body.
3. selectedImageEditReference splice was inserting OpenAIChatMessage
   into a SerializedMessage[] array; the shapes differ on tool_calls.id
   nullability. Cast the reference message through unknown -- it carries
   no tool_calls, so the runtime payload is structurally compatible.

Reproduced locally with `tsc -b --pretty false` (now passes). Build
also failing in the in-repo `npm run build` step on PR CI; this commit
unblocks all 12 failing UI/API workflows.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Compress multi-line explanatory comments in the Gemini translator
and the chat adapter without changing any behaviour. All 148 Gemini
provider tests still pass; tsc --noEmit clean.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen
Copy link
Copy Markdown
Member Author

End-to-end Playwright verification of this PR against the live Gemini API (generativelanguage.googleapis.com/v1beta) on a fresh PR-head install (feat/gemini-provider @ b19b662e, unsloth 2026.5.8, Studio bound to loopback). Five flows, each driving the real Studio chat UI:

Feature Model Status Detail
Text + 4-turn chat (JP → ES → pirate → 5-word summary) gemini-2.5-flash 4 bubbles, per-turn 0.6 – 4.9 s
Image generation (Nano Banana) gemini-2.5-flash-image 1.8 MB PNG
User-defined function tool call gemini-2.5-flash get_weather({"city":"Sydney, Australia"}) returned in streamed tool_calls[0]
Web Search composer pill (hosted googleSearch) gemini-2.5-flash 2 coinmarketcap.com citation chips
Code Execution composer pill (hosted codeExecution) gemini-2.5-flash Returned 24133 (sum of first 100 primes)

Backend log proves every call routed through this PR's _stream_gemini translator (native :streamGenerateContent?alt=sse, no OpenAI-compat shim):

Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=[]                       image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash-image tools=[]                       image=True
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['functionDeclarations'] image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['googleSearch']         image=False
Proxying Gemini streamGenerateContent ... gemini-2.5-flash       tools=['codeExecution']        image=False

Recordings

Multi-turn text on gemini-2.5-flash

text

Nano Banana image generation (gemini-2.5-flash-image)

image

The decoded PNG round-tripped from the data URL:

red panda

Web Search pill → hosted googleSearch

search

Code Execution pill → hosted codeExecution

code

Full 41-second walkthrough (all four flows back-to-back): full_workflow.mp4

API keys never appear on-screen in any frame: the Gemini key is injected into Studio via localStorage (unsloth_chat_external_provider_keys) through Playwright's add_init_script before the first page load, and the bootstrap JWT is set the same way; no Settings → Connections click is recorded.

@danielhanchen
Copy link
Copy Markdown
Member Author

Follow-up — verified the Think pill is a per-family effort selector for Gemini (same UX as GPT/Claude), and that Reasoning + Search + Code compose in a single request.

Per-model effort menu (live probe)

button[aria-label^="Reasoning effort"] with aria-haspopup="menu", rendered for every Gemini chat model; image-tier ids correctly hide it.

Model Reasoning pill Default Menu levels
gemini-2.5-flash Medium None / Low / Medium / High / Max
gemini-2.5-flash-lite Medium None / Minimal / Low / Medium / High / Max
gemini-2.5-pro Medium Low / Medium / High / Max  (no off — API rejects thinkingBudget=0 on Pro)
gemini-3-flash-preview Medium Minimal / Low / Medium / High
gemini-3.1-pro-preview Medium Low / Medium / High  (no Minimal — API rejects on Pro tier)
gemini-2.5-flash-image hidden (image-tier; correct)

Menu screenshots:

2.5 Flash 3 Flash Preview 3.1 Pro Preview

Backend mapping (provider-capabilities.ts:790-866external_provider.py:3846-3928):

  • Gemini 3.x familythinkingConfig.thinkingLevel (string: low/medium/high, minimal on Flash only). Per ai.google.dev/gemini-api/docs/thinking.
  • Gemini 2.5 Flash + 2.5 Flash-LitethinkingConfig.thinkingBudget (int: 0 = off, mapped from "none"; 512 floor mapped from "minimal"; tier-specific upper bound mapped from "max").
  • Gemini 2.5 Pro → also thinkingBudget but Studio hides the off switch because the API rejects 0 ("only works in thinking mode").

Reasoning + Search + Code in a single request — PASS

Single chat on gemini-2.5-flash with Think: High plus the Search and Code composer pills both active. Prompt asked the model to look up the live USD→EUR rate then compute 1234.56 USD in EUR via code execution.

combined

What landed in chat:

  • Header 3 tool calls — Gemini chained googleSearch (rate lookup) → googleSearch (cross-check) → codeExecution (Python multiply).
  • Cited the rate inline: "1 USD = 0.8593 EUR (Source: Xe.com, as of 01:11 UTC)" with xe.com and google.com citation chips.
  • Final answer: 1060.86 EUR for 1234.56 USD (correct: 1234.56 × 0.8593 ≈ 1060.85).
  • Composer pills all visible as on: Think: High, Search, Code.

Backend log shows the three tools entered Google's API together on one streamed request:

Proxying Gemini streamGenerateContent ... gemini-2.5-flash  tools=['googleSearch', 'codeExecution']  image=False

(The reasoning effort isn't logged because thinkingLevel/thinkingBudget is in generationConfig, not in the tools array; the response did surface a thoughts part as 3 tool calls chain-of-action.)

TL;DR

Yes — for Gemini models the Think pill is a clickable effort selector like the GPT/Claude pill, with the level menu correctly varying per family (image-tier ids hide it), and reasoning composes cleanly with the hosted Search and Code tools in a single streamed request.

@danielhanchen danielhanchen merged commit ab48465 into main May 27, 2026
35 checks passed
@danielhanchen danielhanchen deleted the feat/gemini-provider branch May 27, 2026 13:01
rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 27, 2026
Conflicts came from unslothai#5720 (native Gemini provider). All resolved
keeping both branches' functionality:

- provider-capabilities.ts: gemini bucket now uses unslothai#5720's narrow
  capability shape (temperature/topP/topK/presencePenalty true) plus
  the 27 extended-sampler fields from this PR (all false on gemini
  since Google's API doesn't accept them). stop=true added so the new
  generationConfig.stopSequences forwarding lights up the UI.
- chat-adapter.ts: kept all 27-field forwarding from this PR; used
  the tighter comments from main.
- routes/inference.py: pass both this PR's sampling kwargs
  (frequency_penalty/seed/stop/service_tier/parallel_tool_calls) and
  main's tools/tool_choice through to stream_chat_completion.
- external_provider.py: same. Every dispatcher (anthropic/openai/
  gemini) now takes both branches' new args. Added stop forwarding to
  _stream_gemini as generationConfig.stopSequences (capped at 5 per
  native API docs); updated test_gemini_stop_sequences_capped_to_5
  to assert the native shape instead of the OAI-compat shape.

256/256 backend tests pass (test_sampling_params_routing 65 +
anthropic/openai/gemini integration suites 191); frontend type-check
plus vite build clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants