studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX by danielhanchen · Pull Request #5615 · unslothai/unsloth

danielhanchen · 2026-05-19T14:01:18Z

See PR description below — full summary written inline.

The shared tool_call_parser used by safetensors and MLX now recognises the canonical emission shapes for the most popular families so the agentic loop sees the same call shape llama-server normalises for GGUF. Patched against llama.cpp's per-family parsers (common/chat- parser.cpp, legacy pre-PEG branch at 34df42f7be), vLLM's tool_parsers/, and SGLang's function_call/ modules. Formats covered: Qwen / Hermes <tool_call>{json}</tool_call> Qwen3.5 / Hermes <function=name><parameter=k>v</parameter></function> Llama-3 built-in <|python_tag|>NAME.call(k="v", ...) Llama-3 custom <|python_tag|>{"name":..., "parameters":...} Llama-3.2 bare {"name":..., "parameters":...} (no marker) Mistral pre-v11 [TOOL_CALLS] [{"name":..., "arguments":...}, ...] Mistral v11+ [TOOL_CALLS]name{json} (may chain) Ministral / Large 3 [TOOL_CALLS]name[ARGS]{json} Gemma 4 <|tool_call>call:NAME{k:<|"|>v<|"|>}<tool_call|> All parsers normalise to OpenAI shape ``{id, type:"function", function:{name, arguments(json_string)}}``. Truncated emissions (unclosed brackets, missing close tags) are tolerated -- balanced-brace walkers fall back to per-object healing so a mid-stream cut does not lose the call. Llama-3.2 bare-JSON parser is strict: it only fires when stripped content starts with ``{`` and the parsed object has ``name`` (str) plus a dict in ``parameters`` or ``arguments``. Plain assistant prose, tool-message echoes, and JSON missing those keys all leave it dormant. routes/inference._detect_safetensors_features now allows templates whose tool-call format is any of the seven supported markers; the gate still suppresses ``supports_tools`` for templates that advertise tools but use a shape the parser cannot honour, so the UI never enables a pill the loop will not return. Streaming buffer wakes up on five markers (was two) so the safetensors / MLX state machine drains tool calls instead of leaking them as prose: TOOL_XML_SIGNALS = ( "<tool_call>", "<function=", "<|python_tag|>", "[TOOL_CALLS]", "<|tool_call>", ) The route-layer markup-strip regex ``_TOOL_XML_RE`` is extended to match every closed-pair shape, including Mistral v11+ ``name{json}`` and Llama-3 ``<|python_tag|>...\n`` so leaked markup is removed from SSE / non-streaming completions across all five families. Tests: 37 new unit tests covering each emission shape (parser + streaming buffer + strip_tool_markup + agentic loop), 11 bare-JSON edge cases guarding against false positives, and 4 new capability advertise tests pinning the gate to recognise Llama-3 / Mistral / Gemma 4 / Llama-3.2 bare-JSON templates as supports_tools=True while still suppressing tools for unknown emission formats. The previous suppression tests (Llama-3 template suppresses tools, Mistral template suppresses tools) are inverted to assert the new gate keeps tools enabled for those families -- the loop now supports them end to end. Cross-OS validation (ubuntu / macos-14 / windows) lives on the staging fork: danielhanchen/unsloth-staging-2 #126, which exercises the multi-format parser against 9 representative fixtures plus the existing macos-14 MLX Qwen3.5-0.8B cartesian probe.

for more information, see https://pre-commit.ci

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5c707fbe7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-19T14:03:04Z

+TOOL_XML_SIGNALS = (
+    "<tool_call>",
+    "<function=",
+    "<|python_tag|>",
+    "[TOOL_CALLS]",
+    "<|tool_call>",
+)


Detect bare Llama JSON before streaming it

When a Llama-3.2 custom-tools template emits the newly supported bare {"name":..., "parameters":...} shape, there is no marker from this tuple in the output. In the safetensors loop, BUFFERING therefore treats the first { as ordinary text and switches to STREAMING, and the end-of-stream safety path only calls parse_tool_calls_from_text when has_tool_signal(content_accum) is true, so the bare JSON parser added below is never reached; the raw tool call is shown as the assistant answer and no tool executes. Please add a streaming/final detection path for the bare JSON shape or run the final parser even without one of these signals.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request significantly expands the tool-call parsing capabilities to support a wide range of model emission formats, including Llama-3, Llama-3.2, Mistral (pre-v11 and v11+), and Gemma 4. It updates the inference routes to correctly detect these capabilities in chat templates and provides regex-based stripping for the new markers. Feedback focuses on improving the robustness of regex patterns for Mistral and Llama-3 to handle nested structures and multi-line content, as well as correcting a potential character corruption issue when decoding escape sequences in Llama-3 tool calls.

gemini-code-assist · 2026-05-19T14:05:43Z

+    re.compile(r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?", re.DOTALL),
+    # Mistral v11+ ``[TOOL_CALLS]name{json}`` (may chain), close at ``}``.
+    re.compile(r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}", re.DOTALL),


The regex patterns for Mistral tool calls in _TOOL_CLOSED_PATS use non-greedy matching (.*?) ending at the first ] or }. This will cause incorrect partial matches and stripping if the tool arguments contain nested arrays or objects (e.g., [TOOL_CALLS]name{"a": {"b": 1}} would match up to the first }). This leads to leaked markup and corrupted text in the UI during streaming and final cleanup.

gemini-code-assist · 2026-05-19T14:05:43Z

+            r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?",
+            r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}",


These Mistral stripping patterns suffer from the same issue as those in tool_call_parser.py: the non-greedy .*? will stop at the first nested bracket or brace within the JSON arguments, causing partial stripping and leaking the remainder of the tool call to the user.

gemini-code-assist · 2026-05-19T14:05:43Z

+        args: dict[str, Any] = {}
+        for kv in _LLAMA3_KV_RE.finditer(body):
+            k = kv.group(1)
+            if kv.group(2) is not None:


Decoding escape sequences by encoding to UTF-8 and then using unicode_escape will corrupt non-ASCII characters. For example, a string like "München\n" will be incorrectly decoded as "MÃ¼nchen\n". Since the regex already ensures the content is a valid quoted string, json.loads is a safer and more robust way to decode standard escape sequences while preserving Unicode characters.

Suggested change

if kv.group(2) is not None:

args[k] = json.loads('"' + kv.group(2) + '"')

gemini-code-assist · 2026-05-19T14:05:43Z

+            r"<\|tool_call>.*?<tool_call\|>",
+            r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?",
+            r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}",
+            r"<\|python_tag\|>[^\n<]*",


The Llama-3 stripping regex r"<\|python_tag\|>[^\n<]*" is too restrictive and will fail to strip multi-line JSON tool calls (common in Llama-3.1/3.2 custom tools). It stops at the first newline, leaving the rest of the JSON visible to the user. A more robust approach for this tag (which lacks a formal closing tag) is to match until the next potential tag start or the end of the string.

Suggested change

r"<\|python_tag\|>[^\n<]*",

r"<\|python_tag\|>.*?(?=<|$)",

…sors + MLX (#5615)" (#5619) Reverts PR #5615 to give the safetensors + MLX healing parity work more time to bake before re-merging. The reverted feature branch `studio-tools-multi-format` remains untouched, and the follow-up PR will layer the healing-parity commits on top.

danielhanchen · 2026-05-21T13:03:57Z

Re-reviewed after the fact (this PR was already reverted in #5619, with the replacement under #5620). Confirming the revert was the right call — pr-5615 shipped four concrete parser bugs that are end-user-visible:

Mistral nested-JSON truncation. \[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\} (tool_call_parser.py:55-56) is non-greedy on }. For [TOOL_CALLS]search{"filters":{"date":"2024"},"q":"foo"} it stripped only up to the inner }, leaking ,"q":"foo"} to the user. Fixed in studio: tool calling + healing parity for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5620 via _strip_mistral_closed_calls + _balanced_brace_end.
<|python_tag|> stop-on-<. <\|python_tag\|>[^\n<]* (routes/inference.py:452) stops at any <. A Llama-3 call like <|python_tag|>python.call(code="if x < 10: pass") got stripped to < 10: pass"). Fixed in studio: tool calling + healing parity for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5620 to [^\n]*.
Llama-3 sentinel single-pass. The fixed-order for sentinel in (...) loop (tool_call_parser.py:402-415) silently dropped calls when the stream contained <|eot_id|><|begin_of_text|>{json} because begin_of_text was tested before eot_id consumed its prefix. Fixed by the while True / matched loop in studio: tool calling + healing parity for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5620.
UTF-8 corruption in Llama-3 KV decoder. bytes(s, "utf-8").decode("unicode_escape") (tool_call_parser.py:419-422) mangles non-ASCII bytes — "café日本" → 'cafÃ©æ\x97¥æ\x9c¬'. Any tool call with emoji/CJK silently corrupted arguments. Fixed via json.loads('"' + value + '"') in studio: tool calling + healing parity for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5620.

Plus a fifth that motivated the parity package in #5620:
5. GGUF heal key always query. llama_cpp.py:4852 healed bare-string args on python/terminal calls to {"query": …} instead of {"code": …} / {"command": …}, so the python sandbox / terminal never executed the content. Safetensors had _CANONICAL_HEAL_ARG here in #5615; GGUF gets parity in #5620 at llama_cpp.py:4861-4870.

For the record: pytest studio/backend/tests/test_safetensors_tool_loop.py studio/backend/tests/test_safetensors_capability_advertise.py -q is green on both pr-5615 (96 passed) and pr-5620 (110 passed) — pr-5615's own suite simply did not cover any of the five regressions above, which is why they shipped. #5620 adds 14 dedicated tests (TestLoopRePrompt, TestLoopCanonicalHealKey) that lock the fixes down.

No cleanup needed on this PR — #5619 reverted the merge cleanly.

Comments were narrating what the code already says. Cut historical "earlier revisions used X, then Y" narratives down to one-line WHY notes where the footgun still matters (canonical heal-key parity, balanced-brace vs non-greedy regex, ``(?:[^<]|<(?!\|))*`` over ``[^\n<]*``/``[^\n]*``). Drop section-header banners. No behaviour change. Re-ran: pytest studio/backend/tests/test_safetensors_tool_loop.py \ studio/backend/tests/test_safetensors_capability_advertise.py -q -> 118 passed. Regression replay (parser + _coerce_arguments on the 5 #5615 inputs) -> 21/21.

…LX (unslothai#5615) Adds tool calling for Llama-3, Mistral (pre-v11 + v11+ + [ARGS]), and Gemma 4 to the safetensors / transformers and MLX backends. Parser patched against llama.cpp / vLLM / SGLang per-family parsers and normalises to OpenAI shape. 96 targeted unit tests + cross-OS staging CI (ubuntu / macos-14 / windows) green on the multi-format probe.

…sors + MLX (unslothai#5615)" (unslothai#5619) Reverts PR unslothai#5615 to give the safetensors + MLX healing parity work more time to bake before re-merging. The reverted feature branch `studio-tools-multi-format` remains untouched, and the follow-up PR will layer the healing-parity commits on top.

danielhanchen requested a review from rolandtannous as a code owner May 19, 2026 14:01

[pre-commit.ci] auto fixes from pre-commit.com hooks

62c9e8f

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed May 19, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 19, 2026

View reviewed changes

danielhanchen merged commit af35ed8 into main May 19, 2026
1 of 31 checks passed

danielhanchen deleted the studio-tools-multi-format branch May 19, 2026 14:14

danielhanchen mentioned this pull request May 19, 2026

Revert "studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX (#5615)" #5619

Merged

danielhanchen mentioned this pull request May 21, 2026

studio: tool calling + healing parity for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5620

Draft

danielhanchen mentioned this pull request May 27, 2026

Studio: re-introduce multi-format tool calling with parser bug fixes #5811

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX#5615

studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX#5615
danielhanchen merged 2 commits into
mainfrom
studio-tools-multi-format

danielhanchen commented May 19, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

gemini-code-assist Bot May 19, 2026

Uh oh!

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		r"\[TOOL_CALLS\]\s\[.?\](?:\s*</s>)?",
		r"\[TOOL_CALLS\]\s[\w\.\-]+\s(?:\[ARGS\])?\s\{.?\}",

	if kv.group(2) is not None:
	args[k] = json.loads('"' + kv.group(2) + '"')

Uh oh!

Conversation

danielhanchen commented May 19, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danielhanchen commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant