Studio: re-introduce multi-format tool calling with parser bug fixes by danielhanchen · Pull Request #5811 · unslothai/unsloth

danielhanchen · 2026-05-27T13:16:06Z

What this PR does

Resubmits the work from #5615 (reverted in #5619) with the four parser / route bug fixes developed on #5620 folded in. Same 4-file scope as the original #5615 (tool_call_parser.py, routes/inference.py, two test files). The healing-parity package in #5620 -- the GGUF canonical heal key in llama_cpp.py and the safetensors_agentic re-prompt loop -- is deliberately left to #5620 so this redo stays scoped to the same files #5615 touched.

Why a redo and not just merging #5620

#5620 is open as DRAFT, holding while real-model end-to-end validation finishes. Splitting the parser bug fixes out into this PR lets the four documented #5615 regressions land sooner without waiting on the healing-parity validation that #5620 is gated on. Once both pieces are in, the studio tool-call behaviour matches what llama-server already normalises for GGUF.

Bug fixes folded in from #5620

The four parser bugs that prompted the #5615 revert are all fixed here:

#	Where	Bug	Fix
1	`tool_call_parser.py` Mistral closed-pair patterns	`\[TOOL_CALLS\]...\{.*?\}` is non-greedy on `}`, truncating nested JSON (`[TOOL_CALLS]search{"filters":{"date":"2024"},"q":"foo"}` leaked `,"q":"foo"}`)	`_strip_mistral_closed_calls` + `_balanced_brace_end` / `_balanced_bracket_end` helpers that ignore braces inside JSON strings
2	`routes/inference.py` `_TOOL_XML_RE`	`<\|python_tag\|>[^\n<]*` stops at literal `<` (`<\|python_tag\|>python.call(code="if x < 10: pass")` sliced to `< 10: pass")`)	`<\|python_tag\|>(?:[^<]\|<(?!\|))*` -- consumes any character that is not a Llama-3 `<\|sentinel\|>` start, so literal `<`, newlines, and embedded JSON stay inside the strip
3	`tool_call_parser.py` Llama-3 sentinel loop	Fixed-order `for sentinel in (...)` silently dropped calls when the stream had `<\|eot_id\|><\|begin_of_text\|>{json}` because `begin_of_text` was tested before `eot_id` consumed its prefix	`while True / matched` loop -- order of sentinels in the stream no longer matters
4	`tool_call_parser.py` Llama-3 KV decoder	`bytes(s, "utf-8").decode("unicode_escape")` mangles non-ASCII (`"café日本"` -> `'cafÃ©æ\x97¥æ\x9c¬'`)	`json.loads('"' + value + '"')` -- handles `\n` / `\t` / `\uXXXX` escapes correctly while preserving literal UTF-8 bytes

_TOOL_XML_RE keeps the orphan-handling clauses that #5735 added for the speculative-buffer leak shapes (closed pair OR orphan-open to EOF, bare orphan close, tail-only </parameter>). The new _strip_tool_xml(text) helper composes _TOOL_XML_RE with _strip_mistral_closed_calls at the 8 route-layer call sites so the Mistral nested-JSON shape gets balanced-brace handling everywhere.

Out of scope -- still belongs to #5620

These five concerns are NOT included here and stay reserved for #5620:

GGUF canonical heal key in llama_cpp.py ({"query": raw_args} -> per-tool dict lookup) -- bug 5 from the studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5615 post-mortem.
safetensors_agentic.py re-prompt loop (_MAX_REPROMPTS = 3) -- nudges the model when it emits intent without calling a tool.
TestLoopRePrompt (6 tests) -- exercises the re-prompt loop.
TestLoopCanonicalHealKey (3 tests) -- exercises the canonical heal key under the loop.
TestGGUFSafetensorsHealingParity (5 tests) -- pins shared constants between GGUF and safetensors paths.

Capability gating

_detect_safetensors_features allows templates whose tool-call format is any of the seven supported markers. The gate still suppresses supports_tools for templates that advertise tools but use a shape the parser cannot honour, so the UI never enables a pill the loop will not return.

Tests

pytest studio/backend/tests/test_safetensors_tool_loop.py studio/backend/tests/test_safetensors_capability_advertise.py -q -> 104 passed.

That number is exactly #5620's 118 passing - 14 dropped tests (TestLoopRePrompt 6 + TestLoopCanonicalHealKey 3 + TestGGUFSafetensorsHealingParity 5).

Coverage matrix:

TestParser / TestParserMultiFormat -- parser shape coverage for all seven formats (Qwen/Hermes, Llama-3 python_tag and bare JSON, Mistral pre-v11 / v11+ / Ministral, Gemma 4). These tests are what catch regressions of bugs 1, 3, 4 -- bugs 1 and 3 surface as wrong parse output on representative inputs, bug 4 surfaces as garbled non-ASCII argument values.
TestRoutesPythonTagStrip (8 tests) -- pins bug 2's regex on single-line, less-than-in-code, multi-line, multi-line-with-less-than, EOM sentinel stop, EOT sentinel stop, JSON-form multi-line, and "EOM then trailing python_tag" cases.
TestLoopBasic / TestLoopBehaviour / TestLoopControl / TestStatusFormatting / TestProseMentioningToolCall / TestChatTemplateHelper / TestGuardrails / TestGptOssNameDetection -- existing safetensors loop coverage.
test_safetensors_capability_advertise.py -- capability gate keeps tools enabled for Llama-3 / Mistral / Gemma 4 / Llama-3.2 bare-JSON templates while still suppressing tools for unknown emission formats.

Draft status

Open as draft to mirror #5620's caution. Flip to ready after the CI matrix is green and someone validates with real models (studio-backend-ci, studio-api-smoke, studio-inference-smoke, the mac / windows variants, mlx-ci, consolidated-tests-ci, lint-ci). This is the same validation gate that #5620 is currently holding on.

Refs: #5615 (reverted) , #5619 (the revert) , #5620 (open draft, source of the bug fixes folded in here).

Resubmits the work from #5615 (reverted in #5619) with the parser / route bug fixes that were subsequently developed on #5620 folded in. The healing-parity package in #5620 -- the GGUF canonical heal key in `llama_cpp.py` and the safetensors_agentic re-prompt loop -- is deliberately left to #5620 so this redo stays scoped to the same 4 files that #5615 originally touched. Adds multi-format tool-call parsing for the safetensors / MLX agentic loop so Llama-3, Llama-3.2 bare JSON, Mistral pre-v11 / v11+ / Ministral, and Gemma 4 tool emissions are normalised to OpenAI shape instead of leaking as prose, plus a route-layer strip that removes the same shapes from streamed and non-streamed completions. Formats: Qwen / Hermes <tool_call>{json}</tool_call> Qwen3.5 / Hermes <function=name><parameter=k>v</parameter></function> Llama-3 built-in <|python_tag|>NAME.call(k="v", ...) Llama-3 custom <|python_tag|>{"name":..., "parameters":...} Llama-3.2 bare {"name":..., "parameters":...} (no marker) Mistral pre-v11 [TOOL_CALLS] [{"name":..., "arguments":...}, ...] Mistral v11+ [TOOL_CALLS]name{json} (may chain) Ministral / Large 3 [TOOL_CALLS]name[ARGS]{json} Gemma 4 <|tool_call>call:NAME{k:<|"|>v<|"|>}<tool_call|> The four parser bugs that motivated the revert are fixed here: 1. Mistral nested-JSON truncation. The closed-pair Mistral regex `\[TOOL_CALLS\]...\{.*?\}` was non-greedy on `}`, so `[TOOL_CALLS]search{"filters":{"date":"2024"},"q":"foo"}` was stripped only up to the inner `}`, leaking `,"q":"foo"}` to the user. Replaced with `_strip_mistral_closed_calls` + the balanced-brace / balanced-bracket helpers that ignore braces inside JSON strings. 2. `<|python_tag|>` stop-on-`<`. The route-layer strip clause `<\|python_tag\|>[^\n<]*` stopped at any literal `<`, so `<|python_tag|>python.call(code="if x < 10: pass")` was sliced to `< 10: pass")`. Replaced with `<\|python_tag\|>(?:[^<]|<(?!\|))*` so the strip consumes any character that is not a Llama-3 `<|sentinel|>` start -- literal `<`, newlines, and embedded JSON all stay inside. 3. Llama-3 sentinel single-pass loop. The fixed-order `for sentinel in (...)` loop in the bare-JSON parser silently dropped calls when the stream contained `<|eot_id|><|begin_of_text|>{json}` because `begin_of_text` was tested before `eot_id` consumed its prefix. Replaced with a `while True / matched` loop so the order of sentinels in the stream no longer matters. 4. UTF-8 corruption in Llama-3 KV decoder. `bytes(s, "utf-8").decode("unicode_escape")` mangles non-ASCII bytes (`"café日本"` -> `'cafÃ©æ\x97¥æ\x9c¬'`). Replaced with `json.loads('"' + value + '"')` which handles `\n` / `\t` / `\uXXXX` escapes correctly while preserving literal UTF-8 bytes (emoji, CJK, etc.). `_TOOL_XML_RE` keeps the orphan-handling clauses that #5735 added for the speculative buffer leak shapes (closed pair OR orphan-open to EOF, bare orphan close, tail-only `</parameter>`) so the route layer continues to strip in-flight tool markup as well as the multi-format closed pairs. The new `_strip_tool_xml(text)` helper composes `_TOOL_XML_RE` with `_strip_mistral_closed_calls` so the Mistral nested-JSON shape gets balanced-brace handling at every call site (8 sites updated). Capability gating in `_detect_safetensors_features` now allows templates whose tool-call format is any of the seven supported markers; the gate still suppresses `supports_tools` for templates that advertise tools but use a shape the parser cannot honour, so the UI never enables a pill the loop will not return. Tests in scope: - tests/test_safetensors_tool_loop.py: full multi-format parser coverage (Qwen/Hermes, Llama-3 python_tag and bare JSON, Mistral all variants, Gemma 4), plus `TestRoutesPythonTagStrip` (8 tests) pinning the multi-line / less-than-in-code / sentinel-stop behaviour of bug 2's regex. - tests/test_safetensors_capability_advertise.py: capability gate keeps tools enabled for Llama-3 / Mistral / Gemma 4 / Llama-3.2 bare-JSON templates while still suppressing tools for unknown emission formats. Tests deliberately out of scope (they belong to #5620 because they exercise `llama_cpp.py` / `safetensors_agentic.py`): - TestLoopRePrompt (6) -- safetensors_agentic re-prompt loop. - TestLoopCanonicalHealKey (3) -- canonical heal key under loop. - TestGGUFSafetensorsHealingParity (5) -- GGUF / safetensors parity assertions on shared constants and `_MAX_REPROMPTS`. `pytest studio/backend/tests/test_safetensors_tool_loop.py studio/backend/tests/test_safetensors_capability_advertise.py -q` -> 104 passed.

for more information, see https://pre-commit.ci

gemini-code-assist

Code Review

This pull request expands the backend-neutral tool-call parser to support a wide range of model emission formats, including Llama-3, Llama-3.2 bare JSON, Mistral, and Gemma 4. It also updates the safetensors feature detection and streaming response stripping to handle these new formats, backed by comprehensive test coverage. A critical infinite loop vulnerability was identified in the Gemma 4 parser (_gemma_parse_value) when handling malformed list inputs, which could lead to a Denial of Service. A fix was suggested to ensure the index always progresses.

gemini-code-assist · 2026-05-27T13:18:12Z

+    end = i
+    while (
+        end < len(text)
+        and text[end] not in ",}]"
+        and not text.startswith(_GEMMA_STR_BEGIN, end)
+    ):
+        end += 1
+    raw = text[i:end].strip()


Infinite Loop Vulnerability in Gemma 4 Parser

There is a critical infinite loop vulnerability in _gemma_parse_value when parsing malformed or unexpected input inside list/array brackets.

If the input contains an unexpected closing character (like } or ]) at the start of a primitive value (e.g., [1} or [1, ]), the while loop condition text[end] not in ",}]" is immediately False. As a result, end remains equal to i, and _gemma_parse_value returns ("", i) without progressing the index.

When called inside the list parsing loop (while k < len(body):), this causes k to never increment, leading to an infinite loop that hangs the backend process at 100% CPU usage.

To prevent this, we must guarantee that _gemma_parse_value always consumes at least one character and progresses the index when i < len(text).

Suggested change

end = i

while (

end < len(text)

and text[end] not in ",}]"

and not text.startswith(_GEMMA_STR_BEGIN, end)

):

end += 1

raw = text[i:end].strip()

end = i

while (

end < len(text)

and text[end] not in ",}]"

and not text.startswith(_GEMMA_STR_BEGIN, end)

):

end += 1

if end == i and i < len(text):

end = i + 1

raw = text[i:end].strip()

Three label corrections surfaced while cross-checking the parser against official HF chat templates. Comment-only, no behaviour change; 104/104 tests still pass. 1. ``<function=...>`` XML was labelled "Qwen3.5 xml". The canonical emitter is Qwen3-Coder and it always wraps the <function=...> block inside an outer <tool_call>...</tool_call>, never bare. Updated labels in the module docstring and the three inline comments. 2. Mistral grouping was wrong on two counts: Ministral-8B-2410 uses Tekken V3 and emits the ``[TOOL_CALLS] [...]`` array form, not ``[ARGS]``. Mistral-Large-2411 ships tokenizer.model.v7 (no ``[ARGS]`` token) and also emits the array form. ``[ARGS]`` only enters with Tekken V13 (Devstral, Magistral-Small-2509). Regrouped the doc lines accordingly. 3. Gemma 4 ``<|tool_call>`` is forward-looking. Neither gemma-3-12b-it nor gemma-3n-E4B-it emit this shape in their chat templates today. Noted that the capability gate correctly suppresses the tools pill on real Gemma 3 templates, so the parser stays as ready-when-Google-ships scaffolding.

danielhanchen · 2026-05-27T14:16:49Z

Cross-checked this PR against the canonical tool-call parsers in llama.cpp (common/chat-parser.cpp + the legacy pre-PEG branch at 34df42f7be), vLLM (vllm/entrypoints/openai/tool_parsers/), SGLang (python/sglang/srt/function_call/), and the official chat templates of the families it claims to support on Hugging Face. Then ran an adversarial simulation suite head-to-head against pr-5615 to confirm the 4 documented regressions are gone.

Bug fixes confirmed independently

Bug	Confirmed by
1. Mistral nested-JSON truncation (`_strip_mistral_closed_calls` + `_balanced_brace_end`)	llama.cpp PEG balanced-scan parity, SGLang `mistral_detector` parity, simulation regression (truncates on pr-5615, full args preserved on this branch)
2. `<\|python_tag\|>` strip stop-on-`<` (`(?:[^<]\|<(?!\|))*`)	llama.cpp upstream handling, vLLM never had this bug because it doesn't pre-strip, simulation regression (route-layer leaked `< 10: pass\")` on pr-5615, fully stripped here)
3. Llama-3 sentinel single-pass loop (`while True / matched`)	llama.cpp, SGLang `llama32_detector` (PR is strictly stronger here, SGLang only checks `text.startswith(\"{\")`), simulation (both orderings + triple-sentinel pile work on this branch; pr-5615 drops them)
4. UTF-8 KV decoder (`json.loads('\"' + s + '\"')`)	llama.cpp uses proper JSON decoding, SGLang same, all 4 upstream sources confirm `unicode_escape` was the wrong tool. Simulation shows `café日本🎉` preserved here, mangled to `cafÃ©æ\x97¥æ\x9c¬ð\x9f\x8e\x89` on pr-5615

Custom regression suite (56 assertions across bug-fix + adversarial + capability + streaming cases): all pass on this branch, 5/8 fail on pr-5615 exactly where they should.

Parity per family

Family	llama.cpp	vLLM	SGLang	HF template
Qwen / Hermes `<tool_call>`	byte-perfect	parity	parity	parity (Qwen2.5, Qwen3, Hermes-3, Hermes-4)
Qwen3-Coder XML `<function=...>`	byte-perfect	parity	parity	parity (always nested in outer `<tool_call>`)
Llama-3 `<\|python_tag\|>` (builtin + custom)	byte-perfect	parity	this PR stronger (handles built-in `NAME.call(...)` form vLLM and SGLang miss)	parity (Llama-3.1, Llama-3.3)
Llama-3.2 bare JSON	byte-perfect	this PR stricter (false-positive-safe via strict guard)	this PR stronger (sentinel loop)	parity (Llama-3.2)
Mistral V3 array / V11+ / V13 `[ARGS]`	byte-perfect	parity	parity	parity
Gemma 4 `<\|tool_call>...<tool_call\|>`	byte-perfect on marker	this PR preserves native types (vLLM force-casts to string)	parity	forward-looking, no shipping Gemma 3 model emits this shape yet, but capability gate correctly suppresses the tools pill on real Gemma 3

Docstring corrections pushed in `518d0a5570`

Three label fixes surfaced by the HF-template cross-check (comment-only, no code change, 104/104 still pass):

<function=name> XML was labelled Qwen3.5 xml. Canonical emitter is Qwen3-Coder and it always wraps in <tool_call>...</tool_call> (never bare). Relabelled.
Mistral grouping: Ministral-8B-Instruct-2410 was incorrectly grouped with [TOOL_CALLS]name[ARGS]{json}. It ships Tekken V3 and emits the array form. Same for Mistral-Large-Instruct-2411 (ships tokenizer.model.v7, no [ARGS] token). [ARGS] only enters with Tekken V13 (Devstral, Magistral-Small-2509). Regrouped.
Gemma 4 docstring now notes the parser is forward-looking scaffolding; no shipping Gemma 3 model uses this template today.

Architectural notes for reviewers

_parse_function_xml returns string-typed args. vLLM and SGLang's Qwen-coder parsers schema-coerce to ints / bools when the tool schema is known. Conscious trade-off because Studio's parser doesn't carry schema at parse time. Flag if any downstream consumer needs typed args.
Streaming partials are handled at the route layer via _MAX_BUFFER_CHARS=32 + the TOOL_XML_SIGNALS set (longest signal is 14 chars, well under cap). vLLM and SGLang use per-parser partial JSON state. Architectural fit for Studio's buffer-then-parse route is fine.
vLLM serialises args with ensure_ascii=False; this PR uses default (ensure_ascii=True). Doesn't affect parsing, only the string form of arguments going to the OpenAI client. Worth syncing in a follow-up if any client cares.

Follow-up parsers flagged by multiple upstream sources (deliberately out of scope for this PR)

Format	llama.cpp	vLLM	SGLang	Notes
Pythonic / list-of-calls (Llama-3.2 / Llama-4 `[fn(k=\"v\")]`)	yes	yes	yes	Flagged 3x. Highest-value follow-up.
DeepSeek-V3 / V3.1 `<｜tool▁calls▁begin｜>...<｜tool▁calls▁end｜>`	yes	yes	yes	Flagged 3x. Next priority.
Granite `<\|tool_call\|>[...]`	yes	yes	partial	Lower priority.
Phi-4-mini `functools[...]`, GLM-4 bare-name + JSON, Kimi K2, MiniMax-M2, Cohere Command-R	partial	yes	partial	Lower priority.

These are all gaps the original #5615 also did not cover, so leaving them out keeps this PR scoped to the same surface area #5615 already shipped.

Test gate

pytest studio/backend/tests/test_safetensors_tool_loop.py studio/backend/tests/test_safetensors_capability_advertise.py -q is green at 104 passed. That number is exactly #5620's 118 passing minus the 14 dropped parity tests (TestLoopRePrompt 6 + TestLoopCanonicalHealKey 3 + TestGGUFSafetensorsHealingParity 5), all of which still belong to #5620.

Ready to flip to ready-for-review once the Studio CI matrix is green and someone validates with real models, same gate #5620 holds on.

A fuzz pass turned up that ``_parse_llama3_bare_json`` accepted ``parameters`` as a string, contradicting the docstring's "parameters or arguments is a dict" guard. Prose like ``{"name":"foo","parameters":"a sentence"}`` would wrongly fire the parser, which the agentic loop would then heal into a real ``foo(query="a sentence")`` call. Tightened guard: - ``parameters`` must be a dict (Llama-3 spec). - ``arguments`` may be a dict, or a JSON-encoded string that decodes to a dict (OpenAI shape, e.g. ``"arguments":"{\"q\":\"x\"}"``). Plain non-JSON strings, JSON-strings of lists / scalars / null no longer pass. Added 4 regression tests under TestParserMultiFormat: - test_llama3_2_bare_json_string_parameters_does_not_fire - test_llama3_2_bare_json_string_arguments_not_json_does_not_fire - test_llama3_2_bare_json_string_arguments_json_dict_fires - test_llama3_2_bare_json_string_arguments_json_non_dict_does_not_fire Existing tests stay green (104 -> 108 passing) and a 50-case cross-version fuzz suite passes on Python 3.10 / 3.11 / 3.12 / 3.13.

danielhanchen · 2026-05-27T14:28:36Z

Cross-version fuzz pass on Python 3.10 / 3.11 / 3.12 / 3.13 in clean uv venvs. Pure-stdlib parser (json, re, typing), no platform branches, so the result is the same shape on Linux / macOS / Windows. The PR is server-side only (no browser code touched).

Fuzz suite (50 assertions across 4 venvs)

Loads tool_call_parser.py by file path so it runs with zero backend deps. Categories:

T1 bug-fix regressions (the 4 documented bugs)
T2 false-positive guards on bare JSON
T3 Unicode (NFD vs NFC, RTL, ZWJ emoji, surrogates, BOM, CRLF)
T4 catastrophic-backtracking probes (< runs, deep nesting, repeated triggers, unclosed tails -- all bounded under 1s)
T5 truncation at every byte position for 9 fixture shapes
T6 thread-safety (1000 parses across 16 threads, distinct expected names)
T7 strip idempotency (strip(strip(x)) == strip(x))
T8 1MB prose + 1MB-with-tail-call stress, under 2s
T9 multi-call chaining for every family
T10 mixed-family pollution (e.g. [TOOL_CALLS] literal inside Qwen arguments)
1000-trial random fuzz + 500-trial marker-injection fuzz, no crashes, no hangs

Result on every venv:

venv_3.10: 50 passed in 0.19s
venv_3.11: 50 passed in 0.15s
venv_3.12: 50 passed in 0.16s
venv_3.13: 50 passed in 0.16s

One real finding, fixed in `615b8608`

_parse_llama3_bare_json accepted parameters as a string (e.g. {\"name\":\"foo\",\"parameters\":\"a sentence\"}), contradicting the docstring's strict guard (parameters or arguments is a dict). Prose JSON would wrongly trigger; the agentic loop's heal step would then turn it into foo(query=\"a sentence\").

Tightened guard:

parameters must be a dict (Llama-3 spec).
arguments may be a dict, or a JSON-encoded string that decodes to a dict (OpenAI shape, e.g. \"arguments\":\"{\\\"q\\\":\\\"x\\\"}\"). Plain non-JSON strings, and JSON-strings of lists / scalars / null no longer pass.

Added four regression tests under TestParserMultiFormat:

test_llama3_2_bare_json_string_parameters_does_not_fire
test_llama3_2_bare_json_string_arguments_not_json_does_not_fire
test_llama3_2_bare_json_string_arguments_json_dict_fires
test_llama3_2_bare_json_string_arguments_json_non_dict_does_not_fire

PR suite is 108 passing (was 104, +4 new).

Why merging this PR does not break old paths

Pure stdlib (json, re, typing); no hardware path, no UI / browser code, no platform branch.
Pre-existing Qwen / Hermes parsers are unchanged, just dispatched first so they still claim those inputs.
The route-layer _TOOL_XML_RE kept every clause Studio: strip orphan tool_call XML leaking into visible content #5735 added for orphan-tool-call handling; the new shapes are added as new alternatives, not by changing the old ones.
Module-level state is regex / string / tuple constants only; safe to share across threads (T6 confirms).
The capability gate still suppresses supports_tools for templates whose format the parser cannot honour, so a model running an unrecognised template never gets a tools pill it cannot fulfil.

Reports + fuzz tests landed at:

temp/sim/fuzz_parser.py (50 assertions, no backend deps)
temp/sim/venv_3.10..3.13/ (clean uv venvs)
async_task_outputs/verify_5811_{llamacpp,vllm,sglang,hf_templates,simulation}.md

A fuzz pass on PR unslothai#5811 turned up that ``_parse_llama3_bare_json`` accepted ``parameters`` as a string, contradicting the docstring's "parameters or arguments is a dict" guard. Prose JSON like ``{"name":"foo","parameters":"a sentence"}`` would wrongly fire the parser, which the agentic loop would then heal into a real ``foo(query="a sentence")`` call. Same code lives on this branch, so the same fix applies here. Tightened guard: - ``parameters`` must be a dict (Llama-3 spec). - ``arguments`` may be a dict, or a JSON-encoded string that decodes to a dict (OpenAI shape, e.g. ``"arguments":"{\"q\":\"x\"}"``). Plain non-JSON strings or JSON-strings of lists / scalars / null no longer pass. Mirrors the fix landed in PR unslothai#5811 commit 615b860. Adds the same 4 regression tests under TestParserMultiFormat. Existing test suite stays green: 127 -> 131 passing.

danielhanchen and others added 2 commits May 27, 2026 13:15

[pre-commit.ci] auto fixes from pre-commit.com hooks

4d69577

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed May 27, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Studio: re-introduce multi-format tool calling with parser bug fixes#5811

Studio: re-introduce multi-format tool calling with parser bug fixes#5811
danielhanchen wants to merge 4 commits into
mainfrom
studio-tools-multi-format-v3

danielhanchen commented May 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 27, 2026

Uh oh!

danielhanchen commented May 27, 2026

Uh oh!

danielhanchen commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

danielhanchen commented May 27, 2026

What this PR does

Why a redo and not just merging #5620

Bug fixes folded in from #5620

Out of scope -- still belongs to #5620

Capability gating

Tests

Draft status

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 27, 2026

Choose a reason for hiding this comment

Infinite Loop Vulnerability in Gemma 4 Parser

Uh oh!

danielhanchen commented May 27, 2026

Bug fixes confirmed independently

Parity per family

Docstring corrections pushed in 518d0a5570

Architectural notes for reviewers

Follow-up parsers flagged by multiple upstream sources (deliberately out of scope for this PR)

Test gate

Uh oh!

danielhanchen commented May 27, 2026

Fuzz suite (50 assertions across 4 venvs)

One real finding, fixed in 615b8608

Why merging this PR does not break old paths

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Docstring corrections pushed in `518d0a5570`

One real finding, fixed in `615b8608`