Studio: add Gemini provider with web_search, code_execution, prompt caching, and Nano Banana image generation#5720
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 68379b6534
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| async for line in self._stream_gemini( | ||
| messages, | ||
| model, | ||
| temperature, | ||
| top_p, | ||
| max_tokens, | ||
| top_k, | ||
| enabled_tools, | ||
| enable_prompt_caching, |
There was a problem hiding this comment.
Forward Gemini presence penalty from chat request
presence_penalty is accepted on the external chat request path and Gemini is marked as supporting it in provider-capabilities.ts, but the Gemini dispatch call drops that argument and _stream_gemini never writes generationConfig.presencePenalty. As a result, changing the Presence Penalty control has no effect for Gemini runs, which makes tuning and reproducibility misleading for users who expect that knob to be applied.
Useful? React with 👍 / 👎.
| tool_name = msg.get("name") or msg.get("tool_name") or "" | ||
| response_payload: Any | ||
| if isinstance(content, str): | ||
| try: | ||
| response_payload = _json.loads(content) | ||
| except Exception: | ||
| response_payload = {"result": content} | ||
| else: | ||
| response_payload = content or {} | ||
| parts = [ | ||
| { | ||
| "functionResponse": { | ||
| "name": tool_name, |
There was a problem hiding this comment.
Derive functionResponse name when tool message omits name
This branch builds Gemini functionResponse using msg.name/msg.tool_name and falls back to an empty string, but tool-result messages are commonly keyed by tool_call_id without a required name field. In that case we emit functionResponse.name: "", which can break function-calling follow-up turns against Gemini because the tool result is no longer tied to a valid function name.
Useful? React with 👍 / 👎.
| if isinstance(enable_prompt_caching, str) and enable_prompt_caching: | ||
| body["cachedContent"] = enable_prompt_caching |
There was a problem hiding this comment.
Don’t advertise Gemini caching without wiring a cache id
Gemini prompt caching is now exposed in the UI, but this backend path only sets cachedContent when enable_prompt_caching is a non-empty string. The normal external request model and chat adapter send enable_prompt_caching as a boolean, so toggling caching for Gemini never changes the outbound request and silently does nothing.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd582df889
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # OpenAI Chat Completions on this base. Requests/responses are | ||
| # translated in `_stream_gemini` in external_provider.py. | ||
| # API reference: https://ai.google.dev/gemini-api/docs | ||
| "base_url": "https://generativelanguage.googleapis.com/v1beta", |
There was a problem hiding this comment.
Parse Gemini native model catalog before switching base URL
Changing Gemini to the native https://generativelanguage.googleapis.com/v1beta endpoint routes /models to Gemini’s native schema, but our model discovery path still only consumes OpenAI-style data[].id in ExternalProviderClient.list_models(). That leaves Gemini model fetches empty in /api/providers/models (and provider test counts at 0), so users lose dynamic model discovery and rely only on hardcoded defaults. Add native parsing (models[] with name/baseModelId) or mark Gemini as curated-only to avoid silently empty catalogs.
Useful? React with 👍 / 👎.
| "functionCall": { | ||
| "name": fn_name, | ||
| "args": args, | ||
| } |
There was a problem hiding this comment.
Preserve Gemini function call IDs across tool round-trips
The Gemini translation strips call identifiers when converting OpenAI tool calls into functionCall parts, so follow-up role=tool responses can only be matched by function name. This becomes ambiguous when one assistant turn issues multiple calls to the same function (different args but same name), which can mis-associate tool outputs in subsequent model context. Forward the OpenAI tool-call id into Gemini functionCall.id and mirror tool_call_id into functionResponse.id.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a2ed444404
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "delta": { | ||
| "tool_calls": [ | ||
| { | ||
| "index": 0, |
There was a problem hiding this comment.
Emit distinct tool_call indices for each Gemini function call
Do not hardcode tool_calls[0].index to 0 for every emitted Gemini functionCall. When one assistant turn contains multiple function calls, all streamed chunks collapse into the same slot in consumers that reassemble by index (including our own delta.tool_calls assemblers), so only one call survives or arguments get merged incorrectly. This breaks parallel/multi-tool execution for Gemini even when unique ids are present; each call needs a stable, distinct index per turn.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b05fa73ba
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| fc_name = fc.get("name") or "" | ||
| fc_args = fc.get("args") or {} | ||
| fc_id = ( | ||
| fc.get("id") | ||
| or f"call_{fc_name}_{time.time_ns()}" |
There was a problem hiding this comment.
Forward thoughtSignature in Gemini function calls
When Gemini thinking models emit tool calls, the functionCall part can include thoughtSignature that must be replayed on the next turn with the corresponding tool result; this translation only reads name/args/id and drops the signature, so the follow-up request cannot reconstruct a valid functionCall history and can fail with Gemini 400 errors on tool round-trips. This impacts multi-turn function-calling flows on Gemini 2.5/3-class models even when IDs are preserved.
Useful? React with 👍 / 👎.
…aching, and Nano Banana image generation
Wires Google's native Gemini API into Studio's external-provider stack
so users can pick gemini-2.5-pro / gemini-2.5-flash / gemini-2.5-flash-image
(Nano Banana) alongside the existing OpenAI / Anthropic / OpenRouter
providers. Gemini does not speak OpenAI Chat Completions on its primary
endpoint; the new `_stream_gemini` async generator translates between
the two shapes the same way `_stream_anthropic` handles the Messages API.
Backend:
- New `_stream_gemini` translator in external_provider.py. Converts
OpenAI messages -> Gemini `contents` + `systemInstruction`; maps
generationConfig (temperature / topP / topK / maxOutputTokens);
forwards `tools: [{googleSearch: {}}]` for web_search and
`{codeExecution: {}}` for code_execution; passes `cachedContent`
through for prompt caching; sets `responseModalities=[TEXT, IMAGE]`
for Nano Banana image generation.
- Translates streamed `GenerateContentResponse` SSE frames back into
OpenAI chat.completion.chunk frames (text deltas, function_call ->
tool_calls deltas, inlineData -> image_b64 tool_end envelope, usage
chunk before [DONE]).
- Registry entry switched to native base URL
`https://generativelanguage.googleapis.com/v1beta` with
`openai_compatible: False` and the `x-goog-api-key` auth header.
Model lineup curated to current 2.5 / 2.0 family + Nano Banana.
Frontend:
- Provider-capability matrix: Gemini supports temperature, top_p, top_k,
presence_penalty (matches generationConfig); min_p / repetition_penalty
hidden because the API does not accept them.
- `providerSupportsBuiltinWebSearch` / `providerSupportsBuiltinCodeExecution`
/ `providerSupportsBuiltinImageGeneration` extended for Gemini.
- Prompt caching toggle now also lit on Gemini.
Tests:
- 21 new tests in `test_gemini_provider.py` using httpx.MockTransport.
Cover request body shape conversion, URL/header wiring, web_search
forwarded as googleSearch, function-call translation both directions,
prompt caching passthrough, image generation emitting image_b64,
grounded-search citations -> tool_end, finish_reason mapping, and
vision data URL -> inlineData translation.
for more information, see https://pre-commit.ci
…from tool_call_id
Two follow-up fixes for the Gemini provider:
* Thread presence_penalty into _stream_gemini and set
generationConfig.presencePenalty when non-zero. The OpenAI-side
capability matrix already exposes the slider for Gemini, so the
value was being collected and silently dropped on the way out.
* When an OpenAI role=tool message omits 'name' and only carries
'tool_call_id', recover the function name from the matching
functionCall on the prior assistant turn. Gemini 400s on an empty
functionResponse name.
for more information, see https://pre-commit.ci
…ents
The Gemini stream parser only handled text/functionCall/inlineData
parts, so when the user toggled the Code pill on a Gemini model the
sandbox output (executableCode + codeExecutionResult parts) was
dropped on the floor while adjacent text reached the UI. Reviewers
flagged this as the headline feature being silently broken.
Translate both parts into the existing code_execution tool envelope
that CodeExecutionToolUI already consumes for OpenAI / Anthropic:
* executableCode -> tool_start with kind=code_execution and the
source code under arguments.code. We mint a tool_call_id and
stash it so the matching result block can pair to it.
* codeExecutionResult -> tool_end on that id with the stdout under
result. Non-OK outcomes (OUTCOME_FAILED / OUTCOME_DEADLINE_EXCEEDED)
are prefixed onto the text so the failure is visible.
for more information, see https://pre-commit.ci
…che claim
Three follow-ups to the Gemini provider PR after the codex pass:
* list_models() now translates Gemini's native /v1beta/models
payload ({models[{name, baseModelId, displayName,
supportedGenerationMethods}]}) into the OpenAI-compatible shape
Studio expects. Without this the picker stayed empty for Gemini
and fell back to hardcoded defaults. Embedding-only models are
filtered out.
* Forward the OpenAI tool_call id into Gemini's functionCall.id
and mirror it onto functionResponse.id. Two parallel calls to
the same function name can now be paired unambiguously on the
follow-up turn.
* Drop Gemini from the prompt-caching capability set. The wire
flow requires a separate cachedContents POST first and the
boolean Studio emits today is a no-op; the toggle should not
advertise a feature it cannot apply. Leaves a pointer to the
docs for the eventual two-step orchestration.
for more information, see https://pre-commit.ci
Codex flagged that the Gemini stream parser hardcoded tool_calls[0].index to 0 on every emitted functionCall. OpenAI reassemblers key tool_calls by index when joining deltas, so two parallel function calls in one assistant turn collapsed onto a single slot and the second call's arguments overwrote the first. Track the running count via len(emitted_function_call_ids) - 1 and emit it as the per-call index. The dedupe guard above (skip when fc_id already in the set) means the index is monotonic and stable for the lifetime of the stream. Regression test asserts [0, 1] across two parallel calls in one candidate parts list.
for more information, see https://pre-commit.ci
2b05fa7 to
696728d
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 696728d53f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // every 2.x model. Backend translation lives in `_stream_gemini`, | ||
| // citations surface on the response's `groundingMetadata`. See | ||
| // https://ai.google.dev/gemini-api/docs/grounding. | ||
| providerType === "gemini" |
There was a problem hiding this comment.
Gate Gemini web search by model capability
providerSupportsBuiltinWebSearch now enables the Search pill for all Gemini models, including gemini-2.5-flash-image that this commit also makes selectable; when that pill is on, the backend forwards tools: [{googleSearch:{}}], which is an unsupported config for the image model and can turn normal image requests into provider 4xx failures. Add model-level gating for Gemini web search (for example excluding -image variants), mirroring the existing Gemini code-execution guard.
Useful? React with 👍 / 👎.
…ng budget
`gemini-2.0-flash` / `gemini-2.0-flash-exp` were retired by Google in 2026
(`/v1beta/models/gemini-2.0-flash:streamGenerateContent` returns HTTP 404
"no longer available to new users"), and the picker had nothing past the
2.x family. Verified against the live ListModels catalog: drop the retired
ids from `default_models` + allowlist and surface the chat-capable
3.5 / 3.1 / 3 families plus the Nano Banana image trio.
Also plumb `enable_thinking` / `reasoning_effort` into Gemini's
`generationConfig.thinkingConfig`. Without this, Gemini 3.5 Flash,
gemini-pro-latest, and the 3.x previews silently spend the caller's
`max_tokens` budget on hidden "thoughts" before emitting any visible
answer -- the chat shows a truncated stub like "The capital of" and
streams stop. Mapping:
- enable_thinking=False / reasoning_effort=none -> thinkingBudget=0
(Flash tier; Pro tier coerces to a small positive budget because
the API 400s on 0 with "This model only works in thinking mode")
- minimal/low/medium/high -> 512/2048/8192/24576 budget tokens
- max/xhigh -> -1 (dynamic)
- default (neither knob set) -> thinkingConfig omitted, model decides
Frontend `getExternalReasoningCapabilities` now surfaces a
`reasoning_effort` picker for every Gemini chat id (Pro tier hides the
"none" option; image-tier ids stay knob-less). Adds 6 unit tests
covering Flash/Pro effort mapping, the off-toggle coercion on Pro,
default omission, and the nano-banana-pro-preview alias routing
through the image modalities path. 28 -> 34 tests in
`test_gemini_provider.py`, all green; full backend suite still passes
(1459/1460; the unrelated test_help_output flake is pre-existing and
not in any file this PR touches).
Live verification against generativelanguage.googleapis.com on
2026-05-24 with `_stream_gemini` directly:
text gemini-3.5-flash single PASS multi PASS
text gemini-3.1-pro-preview single PASS multi PASS
text gemini-3.1-flash-lite single PASS multi PASS
text gemini-3-pro-preview single PASS multi PASS
text gemini-3-flash-preview single PASS multi PASS
text gemini-2.5-pro single PASS multi PASS
text gemini-2.5-flash single PASS multi PASS
text gemini-2.5-flash-lite single PASS multi PASS
text gemini-flash-latest single PASS multi PASS
text gemini-flash-lite-latest single PASS multi PASS
text gemini-pro-latest single PASS multi PASS
image gemini-2.5-flash-image PASS (1082 KB png returned)
image gemini-3.1-flash-image-preview PASS (Nano Banana 2)
image gemini-3-pro-image-preview PASS (Nano Banana Pro)
tool web_search PASS
tool code_execution PASS
-> 16/16 e2e through the actual ExternalProviderClient code path.
for more information, see https://pre-commit.ci
|
Pushed What was broken before this commit
What's in this commitRegistry (
Backend (
Frontend (
Tests (
Full backend suite: 1459/1460 (the unrelated E2E verified against live Gemini API on 2026-05-24Each row drives the actual installed 16 / 16 passing. OpenAI baseline ( Live |
Fixes a batch of bugs surfaced by a second-pass review on top of the 3.5/3.1/3 + Nano Banana 2/Pro additions in c6724db. Backend (external_provider.py): - Constructor normalises legacy /v1beta/openai base URLs to /v1beta so Gemini providers saved before the native switch keep working without a manual re-config. - Skip thinkingConfig, googleSearch, and codeExecution on image-tier models (-image / nano-banana). The image responseModalities path is mutually exclusive with text-tool wiring and stale UI state would otherwise 400 the turn. - _PRO_THINKING_PREFIXES now includes gemini-3.5-pro and uses anchored prefix matching (exact id or "<prefix>-...") so the image-tier gemini-3-pro-image-preview cannot accidentally match the pro guard. - Gemini 3 functionCall thoughtSignature is round-tripped through the tool_calls envelope via extra_content.google.thought_signature on emit, and replayed as a sibling of functionCall on the next request. - finishReason swaps STOP -> tool_calls when any functionCall was emitted on the same turn so OAI clients trigger tool execution (matches the OpenAI Chat Completions contract). - usageMetadata.thoughtsTokenCount is rolled into output_tokens and surfaced on output_tokens_details.reasoning_tokens so total_tokens reflects the full billable spend instead of dropping the hidden reasoning slice. Registry (providers.py): - Drop gemini-3-pro-preview from default_models. Google shut it down on 2026-03-09 and auto-redirects to gemini-3.1-pro-preview; we surface the canonical id only. - Add model_id_deny_exact = ("gemini-3-pro-preview",) so the live ListModels fetch does not re-surface the redirect alias. Route schema (models/inference.py): - enable_prompt_caching widened to Optional[Union[bool, str]] so the /v1/chat/completions caller can pass a Gemini cachedContent resource name (e.g. cachedContents/abc123). Without this widening _stream_gemini s string cachedContent passthrough was unreachable from the public route (bool_parsing 422). stream_chat_completion signature mirrors. Frontend (provider-capabilities.ts, chat-page.tsx, chat-adapter.ts): - providerSupportsBuiltinImageGeneration now also recognises nano-banana ids (nano-banana-pro-preview was hidden from the image pill before). - providerSupportsBuiltinWebSearch takes the model id so Gemini image models hide the Search pill (mirrors the backend skip). - providerSupportsBuiltinCodeExecution uses the same isGeminiImageModel guard for nano-banana ids. - GEMINI_THINKING_PRO_PREFIXES gains gemini-3.5-pro; gemini-3-pro tightened to gemini-3-pro-preview to avoid the image-id overlap. - Updated 3 callers of providerSupportsBuiltinWebSearch to thread the selected model id through. Tests (test_gemini_provider.py): 34 -> 42, all green - test_image_models_skip_thinking_config - test_image_models_drop_text_only_tools - test_gemini_35_pro_recognized_as_pro_thinking - test_legacy_openai_base_url_normalized - test_finish_reason_swaps_to_tool_calls_when_function_call_emitted - test_thought_signature_round_trips_into_gemini_function_call - test_thought_signature_emitted_in_tool_call_delta - test_usage_chunk_includes_thoughts_tokens Verification: - Backend pytest 1518/1519 passing (one unrelated Qwen3.5 flash-attn test fails on main as well; nothing in this PR touches that path). - Frontend npx tsc -b clean. - Live e2e 16/16 against generativelanguage.googleapis.com through the patched _stream_gemini code path (all 11 chat models single + multi turn, all 3 image models returned image bytes, web_search and code_execution tools both emit the expected envelope). - Live /api/providers/models against the patched backend surfaces 16 ids (gemini-3-pro-preview correctly filtered via deny_exact).
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
for more information, see https://pre-commit.ci
Round-2 reviewer.py flagged a phantom web_search card on image
turns (12/12 reviewers), route-layer stripping of tool_calls /
tool_call_id / name, an over-narrow image-mode tool guard, and
silent safety blocks. This patch fixes all four.
Backend (external_provider.py):
- web_search_active is now derived from the outbound tools_array
(whether googleSearch was actually forwarded), not the raw
enabled_tools intent. Image-mode turns dropped the tool above so
the inbound stream no longer emits a phantom "search complete"
tool_start / tool_end on those turns.
- text_tools_allowed now uses is_image_model (covers both `-image`
/ `nano-banana` picker models AND text models that requested
`image_generation` via enabled_tools). Verified against the live
Gemini API which rejects both googleSearch and codeExecution
alongside responseModalities=["TEXT","IMAGE"] with explicit 400s
("Search as tool is not enabled for this model", "Code execution
is not enabled for this model").
- promptFeedback.blockReason is surfaced as a 400 content-filter
error chunk instead of returning an empty successful assistant
response. The streaming loop closes the response before exiting.
Route (routes/inference.py):
- _build_external_messages now propagates tool_calls (assistant),
tool_call_id, and name (tool result) through every code path
(string content, multimodal content, non-vision fallback). Without
this Gemini 3 function-call round trips lost their thoughtSignature
+ tool_call_id at the route boundary, and functionResponse.name
arrived empty on the second turn.
- Assistant messages with content=None and tool_calls populated are
preserved as a synthetic empty-string content turn so the
Gemini translator can rebuild the functionCall part.
Tests (test_gemini_provider.py): 42 -> 45, all green
- test_image_models_suppress_phantom_web_search_card
- test_image_generation_tool_drops_text_tools
- test_prompt_feedback_block_reason_surfaces_as_error
Verification:
- Backend pytest 1736 / 1736 (the two pre-existing unrelated fails
on main, test_help_output and Qwen3.5 flash-attn pin, are skipped).
- Frontend npx tsc -b clean.
- Live e2e 16/16 against generativelanguage.googleapis.com:
11 chat models single + multi turn, 3 image models returning
image bytes, web_search and code_execution both PASS.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
for more information, see https://pre-commit.ci
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Round 3 review follow-ups: Backend (studio/backend/core/inference/external_provider.py): - Close response AND aiter_lines iterator in a finally so normal, prompt-block, and cancellation exits all clean up (eliminates the RuntimeWarning about aclose never being awaited). - Pair the synthetic web_search tool_start with a tool_end on the promptFeedback.blockReason path so the UI does not leave a stuck "searching..." spinner after the error toast. - Preserve native id and thoughtSignature on executableCode and codeExecutionResult tool events under google.native_part, and pair the tool_end on the code-exec id so multi-turn code-execution replays do not lose Gemini-required history. - Carry part-level thoughtSignature on text deltas via delta.extra_content.google.thought_signature and on inline image tool_end via google.thought_signature so Gemini 3 image editing and tool turns round-trip the signature on the next request. - Guess remote image_url MIME from the URL path so PNG / WebP / GIF inputs are not silently relabeled as JPEG. - Roll usageMetadata.toolUsePromptTokenCount into translated input tokens and surface thoughtsTokenCount as completion_tokens_details.reasoning_tokens in _build_usage_chunk. - Only normalize the Google-hosted /v1beta/openai legacy base URL; custom proxies whose paths happen to end in /openai are left untouched. - Forward ChatCompletionRequest.tools and tool_choice through stream_chat_completion into _stream_gemini, translating to tools[].functionDeclarations and toolConfig.functionCallingConfig. Frontend: - chat-adapter: when Gemini image-generation is enabled for the turn, also disable Search and Code so the request, builder, and active pills agree with what the backend actually sends (the backend already strips text tools when image_generation is in enabled_tools). - chat-adapter: consume OpenAI-shape delta.tool_calls chunks so Gemini function-call deltas without text surface as tool-call parts. - shared-composer: disable Search and Code pills while Gemini image mode is active so the UI matches the request. Tests (studio/backend/tests/test_gemini_provider.py): adds coverage for proxy base-url gating, remote image MIME inference, toolUsePromptTokenCount, reasoning_tokens propagation, prompt-block web_search tool_end pairing, native code-exec id/thoughtSignature metadata, inline image thoughtSignature, text-chunk extra_content, OpenAI tools/tool_choice translation, and image-model tool drop.
After a Gemini chat that ran code_execution / image_generation, switching the same thread to a local GGUF model used to forward the synthetic provider-side tool_calls (tagged with `args._server_tool` or carrying a Gemini `args.google.native_part` payload) and the message-level `extra_content` to llama-server. The receiving backend has no tool declaration for those names and no use for Gemini thoughtSignature metadata; in the worst case it can produce an orphan tool_call_id and a confused continuation. Add `_strip_provider_synthetic_tool_history()` and wire it through the two local message builders: - `_openai_messages_for_passthrough` (OAI-compat passthrough) - `_openai_messages_for_gguf_chat` (standard GGUF chat path) Real user-function `tool_calls` and their matching `role="tool"` replies survive unchanged; only synthetic provider-side cards and Gemini-only `extra_content` are stripped. If the synthetic call was the assistant turn's only payload, the now-empty turn is dropped too so llama-server does not reject the request. Adds 2 regression tests: - test_strip_provider_synthetic_tool_history_drops_synthetic_only - test_strip_provider_synthetic_tool_history_drops_empty_assistant 142 existing backend tests still pass.
for more information, see https://pre-commit.ci
For external Gemini image-tier models (gemini-2.5-flash-image, gemini-3.x-image-preview, etc.), the backend unconditionally strips code_execution and strips web_search on older image ids. Search is still allowed on Gemini 3.x Pro/Flash image models, which supportsBuiltinWebSearch already encodes per model. Before this commit the composer pill gates were: searchDisabled = !modelLoaded || !(supportsTools || supportsBuiltinWebSearch) codeDisabled = !modelLoaded || !(supportsTools || supportsBuiltinCodeExecution) || imageModeDisablesCode `supportsTools` here is a local-runtime fallback that becomes true when any tool-capable local model has been loaded in the session. With a local tool-capable runtime active, switching the chat to an external Gemini image-tier model used to leave Search/Code clickable, even though the backend will silently drop the tool on the wire. Detect "external provider is Gemini AND the model is image-tier" (via supportsBuiltinImageGeneration) and gate the two pills strictly on the provider's own builtin support in that case. Non-Gemini paths and non-image Gemini models keep the supportsTools fallback unchanged.
|
Round 22 review pass landed. Twelve parallel codex reviewers, five APPROVE and seven REQUEST_CHANGES, no security findings. Convergent P1s plus the stale-rebase issue are now addressed:
The Gemini 3 Pro All 142 existing Gemini-provider backend tests still pass; six new regression tests added to lock the new behaviour in. |
Round 22 added the gate for Gemini / Anthropic / OpenRouter / Kimi but
missed the OpenAI Responses translator. When a caller pinned a user
function via `tool_choice={"type":"function","function":{"name":...}}`
plus `enabled_tools=["web_search","code_execution","image_generation"]`,
the Responses body still attached `{"type":"web_search"}`,
`{"type":"shell"}`, and `{"type":"image_generation"}` server tools. The
function pin should suppress those for the same privacy + billing reason
the other provider paths now do.
Compute `_responses_tool_choice_forced_function` next to
`_responses_tool_choice_none` and gate each hosted-tool append on
`_responses_hosted_builtins_allowed = not none and not forced_function`.
The fix has to be applied in TWO places: the initial body builder and
`_build_body()` (called by the container-expiry retry path). User
function declarations still flow through so the pin has something to
target, and the Responses-shape `{type:"function", name:"..."}`
`tool_choice` is forwarded unchanged.
Adds regression test `test_openai_responses_forced_function_tool_choice_drops_hosted_tools`.
All 166 existing backend tests across Gemini + Responses + image-gen +
code-exec suites still pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de046986ba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if not isinstance(tool_calls, list) or not tool_calls: | ||
| sanitized_assistant.append(m) |
There was a problem hiding this comment.
Strip Gemini extra_content from assistant turns without tools
_strip_provider_synthetic_tool_history returns assistant messages unchanged when tool_calls is missing/empty, so Gemini-only extra_content (e.g., text thought signatures) is still forwarded on provider-switch paths. That leaks a non-standard message key into /v1/chat/completions passthrough/local GGUF requests, where strict OpenAI-compatible backends may reject the request or receive irrelevant provider metadata. This regression appears when a thread with Gemini assistant text turns is continued on a non-Gemini backend.
Useful? React with 👍 / 👎.
…+ custom-Gemini model list Three convergent P1s from round 24 review: 1. SSRF: the shared SSRF validator in `tools._validate_and_resolve_host` used a denylist (is_private / loopback / link_local / multicast / reserved / unspecified). Python classifies shared address space (100.64.0.0/10 carrier-grade NAT, plus 240.0.0.0/4, benchmarking ranges, etc.) with `is_private=False` AND `is_global=False`. The new Gemini server-side image fetcher therefore accepts URLs whose hostname resolves to 100.64.0.1 in cloud/VPC deployments. Add `not ip.is_global` as the primary gate -- a single source of truth that covers every current and future non-global range. 2. _strip_provider_synthetic_tool_history previously only stripped message-level `extra_content` when the assistant turn had tool_calls. A plain text Gemini reply carrying `extra_content.google.thought_signature` flowed through to llama-server when the thread was switched to a local GGUF backend. Always strip message-level `extra_content` on assistant turns. 3. routes/providers.list_provider_models applied Gemini's native `model_id_allowlist` regex to every Gemini provider, including custom OAI-compatible bases (LiteLLM, deployment gateways). IDs like `google/gemini-2.5-flash` and team-prefixed deployment aliases got filtered out even though the chat-dispatch path now routes them via the OpenAI-compatible client. Skip registry-level model-id filters when the configured Gemini base_url host is not the canonical `generativelanguage.googleapis.com`, mirroring the chat-dispatch gate. Three regression tests added: - test_validate_and_resolve_host_blocks_shared_address_space - test_strip_provider_synthetic_tool_history_drops_text_only_extra_content - test_gemini_custom_oai_compat_base_skips_native_allowlist
for more information, see https://pre-commit.ci
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
1 similar comment
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
…nto Gemini schema
Two convergent reviewer findings on the native Gemini path:
1. _stream_gemini's tool_calls replay loop falls through to a generic
functionCall emission whenever it sees an assistant tool_call. Marked
server-side builtin cards (web_search / web_fetch tagged with
_server_tool or args.google.native_part) hit that fallthrough with no
replayable native_part, which produces an outbound functionCall whose
name is not a declared user function. The Gemini turn 400s on the
undeclared name. Guard the loop to drop those entries instead, while
keeping the existing code_execution / image_generation native-part
replay branch intact.
2. _sanitize_gemini_schema uses a strict allowlist that drops local
$ref / $defs references. Pydantic-generated tool schemas hoist nested
object shapes into $defs and reference them via {"$ref": "#/$defs/X"},
so a property like address: {"$ref": "#/$defs/Address"} collapsed to
{} on the wire and the model lost the nested fields, types, and
required keys. Resolve local #/... pointers against the schema root
and inline the referenced subtree, with local siblings overriding
the reference (normal JSON Schema composition) and a seen-ref guard
for self-referential schemas.
Added regression coverage:
- test_gemini_native_skips_synthetic_server_builtin_replay
- test_function_declarations_inline_local_refs_into_gemini_schema
- test_function_declarations_inline_local_refs_in_anyof_and_items
- test_function_declarations_self_referential_schema_terminates
All 145 Gemini provider tests pass; touched provider regression set
(OpenAI Responses, code execution, image generation, Anthropic code
execution, Anthropic web_fetch) also 43/43 green.
for more information, see https://pre-commit.ci
…es synthetic-history strip Reviewer round 26 surfaced two convergent asymmetric-fix bugs. 1. _stream_gemini drops a synthetic server-tool tool_call (web_search / web_fetch tagged _server_tool) and also replays code_execution / image_generation tool_calls as Gemini-native executableCode / codeExecutionResult / inlineData parts. The matching role="tool" follow-up was still falling through to the generic functionResponse branch, producing either an orphan functionResponse (synthetic case) or a duplicate response pointing at a name with no functionDeclarations entry (native-part case). Both forms 400 the next Gemini turn. Track skipped + native-replayed tool_call_ids in _gemini_skip_tool_result_ids and short-circuit the role="tool" branch on a match. 2. The Anthropic-compatible local /v1/messages route only called _drop_empty_assistant_sentinels on the OpenAI-translated history, while the sibling /v1/chat/completions and GGUF passthrough builders chain that with _strip_provider_synthetic_tool_history. An Anthropic caller replaying a prior provider-side tool_use therefore forwarded fake builtin tool history straight into local llama-server. Apply the same strip on the Anthropic route after the anthropic_messages_to_openai conversion. Regression coverage added: - test_gemini_native_skips_orphan_function_response_for_dropped_builtin - test_gemini_native_skips_orphan_function_response_for_native_part_replay Gemini suite 147/147; touched provider regression set 43/43.
for more information, see https://pre-commit.ci
…dget for base64 Two convergent reviewer findings on the native Gemini path. 1. _stream_gemini's synthetic-builtin detector at lines 3519-3524 recognizes args.google.native_part as a server-tool marker, but _native_part was only loaded from tc.extra_content.google.native_part. A direct OpenAI-compatible API caller or imported third-party thread round-trips the payload through function.arguments because tool_calls[].extra_content is not in the OpenAI spec. The round-25 guard then saw a synthetic builtin with no _native_part and dropped the entire assistant turn, so the next native Gemini request lost the prior executableCode / inlineData / codeExecutionResult context. Fall back to args.google.native_part when extra_content path is missing, mirroring what the synthetic detector already accepts. 2. _GEMINI_REMOTE_IMAGE_MAX_TOTAL_BYTES capped DECODED bytes at 20MB. Gemini receives images base64-encoded inside JSON, and base64 inflates payload size by ~4/3. With 20MB decoded the actual JSON body is ~26.7MB plus prompt overhead, well over Gemini's ~20MB request limit. Drop the decoded cap to 14MB so realistic multi- image turns stay safely under 20MB encoded. Added regression test test_gemini_native_part_falls_back_to_args_google covering an OpenAI-compat-shaped image_generation tool_call whose native_part lives only in function.arguments. Gemini suite 148/148.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Main moved forward 17 commits during PR review (latest: 953c8bf). Real conflicts in five files; resolved by combining both branches' changes. studio/backend/core/inference/external_provider.py - Add fast_mode (Anthropic Opus 4.6/4.7 speed flag, #5715) to stream_chat_completion and Anthropic-branch call site, alongside existing Gemini tools/tool_choice forwarding. - Add _openai_image_generation_tool() helper (action:"edit" for follow- up image edits, #5712) and use it inside the existing _responses_hosted_builtins_allowed gate so the forced-function / tool_choice="none" suppression added in rounds 21+ still applies. - Keep Anthropic web_fetch gated on _anthropic_hosted_builtins_allowed (round 19+ hosted-builtin gate) while taking main's per-model version selector (web_fetch_20260209 vs _20250910). studio/backend/routes/inference.py - Add `openai = provider_type == "openai"` (used by main's reasoning content forwarding for follow-up image edits). - Keep the round 25/26 Gemini filter chain (_filter_tool_calls drops synthetic server-builtin cards, marks tc_id so the matching role="tool" follow-up gets skipped, extra_content gated to native Gemini host). - Forward fast_mode alongside tools/tool_choice. studio/backend/tests/test_openai_image_generation.py - Combine assertions: both _server_tool: True (PR) and openai_image_generation_call_id (main) are present on the tool_start arguments. studio/frontend/src/features/chat/shared-composer.tsx - Add supportsBuiltinWebFetch declaration (separate Fetch pill from #5742) before the PR's isExternalGemini constant so both the Gemini image-tier gating and the standalone Anthropic Fetch pill compile. studio/frontend/src/features/chat/api/chat-adapter.ts - Add main's normalizeOpenAIReasoningItem, toOpenAIImageEditReferenceMessage, isAnthropicRefusalMessage helpers alongside PR's collectAssistantToolCalls, collectToolResultMessages, SerializedMessage, collectAssistantTextThoughtSignature. - toOpenAIMessages (PR) now also early-returns on isAnthropicRefusalMessage so refused turns get pruned from outbound history. - Add a thin toOpenAIMessage (singular) wrapper for the OpenAI image- edit replay path's flat .map() usage. - Merge per-turn enable flags: keep PR's imageGenerationEnabledForThisTurn, geminiImageModeForThisTurn, codeExecEnabledForThisTurn !geminiImageMode gate; take main's webFetchEnabledForThisTurn (sourced from independent webFetchToolsEnabled pill state). - Outbound build chains main's anthropic_refusal survivingMessages prune, then flatMap(toOpenAIMessages) (PR), then PR's selectedImageEditReference reference message prepend; image-edit unavailable toast from main fires before any of that when the pill is off. - tool_end merge: do main's nextArgs spread first, then PR's Gemini native_part parts concat so both OpenAI image-call ids and Gemini executableCode/codeExecutionResult/inlineData round-trip. - Cumulative + final yields: orderAssistantContent(pinTextThoughtSignature(...)) composes main's tool-vs-text ordering with PR's per-text thoughtSignature pin. Tests: gemini provider 148/148; openai_responses_translation + openai_code_execution + openai_image_generation + anthropic_code_execution + anthropic_web_fetch + external_provider_usage_chunk + providers_api: 50 passed, 42 skipped; main's new anthropic_fast_mode + citations + openai_citation_markers + openai_tool_result_fallbacks suites all 43/43.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
…urn [] + cast image-edit ref Three errors in chat-adapter.ts surfaced by the frontend tsc step after merging main into feat/gemini-provider: 1. The Anthropic refusal early-return used main's but toOpenAIMessages returns SerializedMessage[]; flip to . 2. Restore -- the line was lost when removing main's conflict block from the function body. 3. selectedImageEditReference splice was inserting OpenAIChatMessage into a SerializedMessage[] array; the shapes differ on tool_calls.id nullability. Cast the reference message through unknown -- it carries no tool_calls, so the runtime payload is structurally compatible. Reproduced locally with `tsc -b --pretty false` (now passes). Build also failing in the in-repo `npm run build` step on PR CI; this commit unblocks all 12 failing UI/API workflows.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Compress multi-line explanatory comments in the Gemini translator and the chat adapter without changing any behaviour. All 148 Gemini provider tests still pass; tsc --noEmit clean.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
End-to-end Playwright verification of this PR against the live Gemini API (
Backend log proves every call routed through this PR's RecordingsMulti-turn text on Nano Banana image generation ( The decoded PNG round-tripped from the data URL: Web Search pill → hosted Code Execution pill → hosted Full 41-second walkthrough (all four flows back-to-back): API keys never appear on-screen in any frame: the Gemini key is injected into Studio via |
|
Follow-up — verified the Think pill is a per-family effort selector for Gemini (same UX as GPT/Claude), and that Reasoning + Search + Code compose in a single request. Per-model effort menu (live probe)
Menu screenshots:
Backend mapping (
Reasoning + Search + Code in a single request — PASSSingle chat on What landed in chat:
Backend log shows the three tools entered Google's API together on one streamed request: (The reasoning effort isn't logged because TL;DRYes — for Gemini models the Think pill is a clickable effort selector like the GPT/Claude pill, with the level menu correctly varying per family (image-tier ids hide it), and reasoning composes cleanly with the hosted Search and Code tools in a single streamed request. |
Conflicts came from unslothai#5720 (native Gemini provider). All resolved keeping both branches' functionality: - provider-capabilities.ts: gemini bucket now uses unslothai#5720's narrow capability shape (temperature/topP/topK/presencePenalty true) plus the 27 extended-sampler fields from this PR (all false on gemini since Google's API doesn't accept them). stop=true added so the new generationConfig.stopSequences forwarding lights up the UI. - chat-adapter.ts: kept all 27-field forwarding from this PR; used the tighter comments from main. - routes/inference.py: pass both this PR's sampling kwargs (frequency_penalty/seed/stop/service_tier/parallel_tool_calls) and main's tools/tool_choice through to stream_chat_completion. - external_provider.py: same. Every dispatcher (anthropic/openai/ gemini) now takes both branches' new args. Added stop forwarding to _stream_gemini as generationConfig.stopSequences (capped at 5 per native API docs); updated test_gemini_stop_sequences_capped_to_5 to assert the native shape instead of the OAI-compat shape. 256/256 backend tests pass (test_sampling_params_routing 65 + anthropic/openai/gemini integration suites 191); frontend type-check plus vite build clean.









Summary
Adds Google Gemini as a first-class chat provider in Studio.
_stream_geminiinstudio/backend/core/inference/external_provider.pythat converts OpenAI Chat Completions request/response to and from Gemini'scontents/partsREST shape onPOST /v1beta/models/{model}:streamGenerateContent?alt=sse.https://generativelanguage.googleapis.com/v1betawithx-goog-api-keyauthentication. Legacy/v1beta/openaiGemini providers saved by older Studio builds are auto-rewritten to the native path (only when the host isgenerativelanguage.googleapis.com; custom proxy paths ending in/openaiare left untouched).*-latestaliases, and the Nano Banana family (2.5-flash-image, 3.1-flash-image-preview = "Nano Banana 2", 3-pro-image-preview = "Nano Banana Pro").thinkingConfig.thinkingLevel(MINIMAL / LOW / MEDIUM / HIGH). Pro tier rejects MINIMAL, so "off" coerces to LOW; Flash tier coerces "off" to MINIMAL (Gemini 3 cannot fully disable thinking).thinkingBudgetladder. 2.5 Pro coerces 0 to a small positive budget because the API 400s with "only works in thinking mode". 2.5 Flash-Lite uses 512 as its lowest positive budget per the API floor.tools: [{googleSearch: {}}]for web search, with grounding citations surfaced through the same tool_start / tool_end + sources envelope OpenAI / Anthropic use.tools: [{codeExecution: {}}]for sandboxed Python;executableCodeandcodeExecutionResultparts emit as code_execution tool events with the native id andthoughtSignaturestowed ongoogle.native_partfor follow-up replay.tools: [{functionDeclarations: [...]}]for caller-supplied OpenAI-shapetools+tool_choicetranslated totoolConfig.functionCallingConfig.mode/allowedFunctionNames.Search as tool is not enabled for this model).image_urlparts on user messages map toinlineData(base64) orfileDatawith MIME guessed from the URL path so remote PNG / WebP / GIF inputs are not relabeled as JPEG.extra_content.google.thought_signatureso future multi-turn editing can echo the signature back per the Gemini contract.image_b64/image_mimetool_end envelope so the chat UI renders them inline with no extra plumbing.functionCallparts translate to OpenAI-shapedelta.tool_calls, distinct indices per call so parallel turns reassemble cleanly.finishReason="STOP"is rewritten totool_callswhen any function call was emitted so OAI clients trigger tool execution. The chat-adapter consumes these tool_calls deltas so user-supplied function calls render even when no text is emitted.usageMetadatatranslation:promptTokenCount+toolUsePromptTokenCount-> input tokens;candidatesTokenCount+thoughtsTokenCount-> output tokens;thoughtsTokenCountsurfaces ascompletion_tokens_details.reasoning_tokens;cachedContentTokenCount-> cached input detail.enable_prompt_cachingis a string, it forwards ascachedContentso callers that create CachedContent resources out of band can reference them by name. Boolean is silently ignored on Gemini (no implicit cache create).promptFeedback.blockReason) surface as a content_filter error event. If a synthetic web_search tool_start was already emitted, a matching tool_end is paired before the error so the UI does not leave a "searching..." spinner stuck on screen.try / finallythat always closes both the response and the manualaiter_lines()iterator on normal, prompt-block, and cancellation exits (no moreRuntimeWarning: coroutine method 'aclose' of 'Response.aiter_lines' was never awaited).Test plan
PYTHONPATH=studio/backend python -m pytest studio/backend/tests/test_gemini_provider.py -q(62 tests, all green)cd studio/frontend && npx tsc -b --pretty falsecleanhttps://generativelanguage.googleapis.com/v1betacovering single-turn, multi-turn, googleSearch grounding, codeExecution, image generation, function calling, thinkingLevel (Gemini 3) and thinkingBudget (Gemini 2.5) variantsgroundingMetadatacitations: follow-upReferences