studio: tool calling for Llama-3, Mistral, Gemma 4 on safetensors + MLX #5615
The shared tool_call_parser used by safetensors and MLX now recognises
the canonical emission shapes for the most popular families so the
agentic loop sees the same call shape llama-server normalises for
GGUF. Patched against llama.cpp's per-family parsers (common/chat-
parser.cpp, legacy pre-PEG branch at 34df42f7be), vLLM's
tool_parsers/, and SGLang's function_call/ modules.
Formats covered:
  Qwen / Hermes         <tool_call>{json}</tool_call>
  Qwen3.5 / Hermes      <function=name><parameter=k>v</parameter></function>
  Llama-3 built-in      <|python_tag|>NAME.call(k="v", ...)
  Llama-3 custom        <|python_tag|>{"name":..., "parameters":...}
  Llama-3.2 bare        {"name":..., "parameters":...} (no marker)
  Mistral pre-v11       [TOOL_CALLS] [{"name":..., "arguments":...}, ...]
  Mistral v11+          [TOOL_CALLS]name{json} (may chain)
  Ministral / Large 3   [TOOL_CALLS]name[ARGS]{json}
  Gemma 4               <|tool_call>call:NAME{k:<|"|>v<|"|>}<tool_call|>
All parsers normalise to OpenAI shape
``{id, type:"function", function:{name, arguments(json_string)}}``.
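As a rough illustration of that target shape, the sketch below wraps an already-parsed name and argument dict. The helper name ``to_openai_tool_call`` and the ``call_`` id scheme are assumptions made for the example, not the parser's actual code.

```python
import json
import uuid
from typing import Any


def to_openai_tool_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Wrap a parsed call in the OpenAI tool-call shape (hypothetical helper)."""
    return {
        # Assumption: any unique string is acceptable as the call id.
        "id": f"call_{uuid.uuid4().hex[:24]}",
        "type": "function",
        "function": {
            "name": name,
            # The arguments travel as a JSON string, not as a dict.
            "arguments": json.dumps(arguments, ensure_ascii=False),
        },
    }


call = to_openai_tool_call("get_weather", {"city": "München"})
```

Keeping ``arguments`` as a JSON string means every family-specific parser can hand the loop the same payload regardless of how the model originally spelled the call.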
Truncated emissions (unclosed brackets, missing close tags) are
tolerated -- balanced-brace walkers fall back to per-object healing
so a mid-stream cut does not lose the call.
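A minimal sketch of the balanced-brace idea, assuming a string-aware walk that closes an open string and any open braces when the text runs out. The function names are hypothetical, and the sketch tracks braces only, so a cut inside an array or right after a key still yields ``None``.

```python
import json
from typing import Any, Optional


def _loads_or_none(candidate: str) -> Optional[dict[str, Any]]:
    """Parse candidate as JSON and return it only if it is an object."""
    try:
        value = json.loads(candidate)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def extract_json_object(text: str, start: int) -> Optional[dict[str, Any]]:
    """Walk from the '{' at text[start] to its matching '}', healing a cut tail."""
    if text[start : start + 1] != "{":
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string only an unescaped quote matters.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return _loads_or_none(text[start : i + 1])
    # Ran off the end: best-effort heal of the truncated object.
    healed = text[start:]
    if escaped:
        healed = healed[:-1]  # drop a dangling backslash
    if in_string:
        healed += '"'
    healed += "}" * depth
    return _loads_or_none(healed)
```

For instance, ``extract_json_object('{"name": "f", "arguments": {"city": "Par', 0)`` recovers ``{"name": "f", "arguments": {"city": "Par"}}`` in this sketch.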
Llama-3.2 bare-JSON parser is strict: it only fires when stripped
content starts with ``{`` and the parsed object has ``name`` (str)
plus a dict in ``parameters`` or ``arguments``. Plain assistant
prose, tool-message echoes, and JSON missing those keys all leave
it dormant.
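The strictness rule can be sketched as below. ``parse_bare_json_call`` is a hypothetical name, and this version requires complete JSON, whereas the real parser may also heal truncated objects.

```python
import json
from typing import Any, Optional


def parse_bare_json_call(content: str) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (name, args) only for a bare tool-call object, else None."""
    stripped = content.strip()
    if not stripped.startswith("{"):
        return None  # plain prose or anything with leading text
    try:
        obj = json.loads(stripped)
    except json.JSONDecodeError:
        return None  # not a single well-formed JSON document
    if not isinstance(obj, dict):
        return None
    name = obj.get("name")
    if not isinstance(name, str):
        return None
    # Accept either key, but only when it holds a dict.
    for key in ("parameters", "arguments"):
        args = obj.get(key)
        if isinstance(args, dict):
            return name, args
    return None
```

Under these assumptions ``'{"name": "f", "parameters": {"x": 1}}'`` fires, while ``'Sure, here is {"name": "f"}'`` and ``'{"name": "f"}'`` both stay dormant.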
routes/inference._detect_safetensors_features now allows templates
whose tool-call format is any of the seven supported markers; the
gate still suppresses ``supports_tools`` for templates that
advertise tools but use a shape the parser cannot honour, so the UI
never enables a pill the loop will not return.
Streaming buffer wakes up on five markers (was two) so the
safetensors / MLX state machine drains tool calls instead of leaking
them as prose:
    TOOL_XML_SIGNALS = (
        "<tool_call>", "<function=",
        "<|python_tag|>", "[TOOL_CALLS]", "<|tool_call>",
    )
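A minimal sketch of how a buffer might consult these markers. ``has_tool_signal`` is assumed here to be a plain substring check, and ``could_become_signal`` is a hypothetical helper that holds back text ending in a partial marker.

```python
TOOL_XML_SIGNALS = (
    "<tool_call>", "<function=",
    "<|python_tag|>", "[TOOL_CALLS]", "<|tool_call>",
)


def has_tool_signal(text: str) -> bool:
    """True once any complete marker appears in the accumulated text."""
    return any(sig in text for sig in TOOL_XML_SIGNALS)


def could_become_signal(tail: str) -> bool:
    """True if the text ends with a proper prefix of a marker (keep buffering)."""
    return any(
        tail.endswith(sig[:n])
        for sig in TOOL_XML_SIGNALS
        for n in range(1, len(sig))
    )
```

In this sketch the marker-free Llama-3.2 bare-JSON shape never trips ``has_tool_signal``, so a caller relying only on these markers would need a separate check for it.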
The route-layer markup-strip regex ``_TOOL_XML_RE`` is extended to
match every closed-pair shape, including Mistral v11+ ``name{json}``
and Llama-3 ``<|python_tag|>...\n`` so leaked markup is removed from
SSE / non-streaming completions across all five families.
Tests: 37 new unit tests covering each emission shape (parser +
streaming buffer + strip_tool_markup + agentic loop), 11 bare-JSON
edge cases guarding against false positives, and 4 new capability
advertise tests pinning the gate to recognise Llama-3 / Mistral /
Gemma 4 / Llama-3.2 bare-JSON templates as supports_tools=True
while still suppressing tools for unknown emission formats.
The previous suppression tests (Llama-3 template suppresses tools,
Mistral template suppresses tools) are inverted to assert the new
gate keeps tools enabled for those families -- the loop now
supports them end to end.
Cross-OS validation (ubuntu / macos-14 / windows) lives on the
staging fork: danielhanchen/unsloth-staging-2 #126, which exercises
the multi-format parser against 9 representative fixtures plus the
existing macos-14 MLX Qwen3.5-0.8B cartesian probe.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5c707fbe7
    TOOL_XML_SIGNALS = (
        "<tool_call>",
        "<function=",
        "<|python_tag|>",
        "[TOOL_CALLS]",
        "<|tool_call>",
    )
Detect bare Llama JSON before streaming it
When a Llama-3.2 custom-tools template emits the newly supported bare {"name":..., "parameters":...} shape, there is no marker from this tuple in the output. In the safetensors loop, BUFFERING therefore treats the first { as ordinary text and switches to STREAMING, and the end-of-stream safety path only calls parse_tool_calls_from_text when has_tool_signal(content_accum) is true, so the bare JSON parser added below is never reached; the raw tool call is shown as the assistant answer and no tool executes. Please add a streaming/final detection path for the bare JSON shape or run the final parser even without one of these signals.
Code Review
This pull request significantly expands the tool-call parsing capabilities to support a wide range of model emission formats, including Llama-3, Llama-3.2, Mistral (pre-v11 and v11+), and Gemma 4. It updates the inference routes to correctly detect these capabilities in chat templates and provides regex-based stripping for the new markers. Feedback focuses on improving the robustness of regex patterns for Mistral and Llama-3 to handle nested structures and multi-line content, as well as correcting a potential character corruption issue when decoding escape sequences in Llama-3 tool calls.
    re.compile(r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?", re.DOTALL),
    # Mistral v11+ ``[TOOL_CALLS]name{json}`` (may chain), close at ``}``.
    re.compile(r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}", re.DOTALL),
The regex patterns for Mistral tool calls in _TOOL_CLOSED_PATS use non-greedy matching (.*?) ending at the first ] or }. This will cause incorrect partial matches and stripping if the tool arguments contain nested arrays or objects (e.g., [TOOL_CALLS]name{"a": {"b": 1}} would match up to the first }). This leads to leaked markup and corrupted text in the UI during streaming and final cleanup.
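The failure can be reproduced with the v11+ pattern quoted above. The alternative shown, ``strip_v11_call``, is only a sketch: it assumes the standard library's ``json.JSONDecoder.raw_decode`` is used to find the true end of the argument object, and it leaves truncated JSON untouched rather than guessing.

```python
import json
import re

# Pattern as quoted in the diff: non-greedy, so it closes at the first "}".
V11_PAT = re.compile(
    r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}", re.DOTALL
)
# Hypothetical head-only pattern: stops just before the opening brace.
V11_HEAD = re.compile(r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*(?=\{)")

text = '[TOOL_CALLS]get{"a": {"b": 1}} done'
leaked = V11_PAT.sub("", text)  # '} done' -- the outer brace leaks


def strip_v11_call(s: str) -> str:
    """Remove one [TOOL_CALLS]name{json} span using a real JSON decode."""
    head = V11_HEAD.search(s)
    if head is None:
        return s
    try:
        # raw_decode returns the index just past the decoded object.
        _, end = json.JSONDecoder().raw_decode(s, head.end())
    except json.JSONDecodeError:
        return s  # truncated or malformed: left for a healing path
    return s[: head.start()] + s[end:]


clean = strip_v11_call(text)  # ' done'
```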
    r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?",
    r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}",
    args: dict[str, Any] = {}
    for kv in _LLAMA3_KV_RE.finditer(body):
        k = kv.group(1)
        if kv.group(2) is not None:
Decoding escape sequences by encoding to UTF-8 and then decoding with unicode_escape corrupts non-ASCII characters, because unicode_escape reads the bytes as Latin-1. For example, "München\n" is decoded as "MÃ¼nchen\n". Since the regex already ensures the content is a valid quoted string, json.loads is a safer and more robust way to decode standard escape sequences while preserving Unicode characters.
    if kv.group(2) is not None:
        args[k] = json.loads('"' + kv.group(2) + '"')
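The difference is easy to reproduce in isolation. The snippet below assumes the captured group holds the text between the quotes with its escapes still unprocessed, which is what the review comment describes. One caveat: ``json.loads`` rejects escapes that are valid in Python but not in JSON, such as ``\'``.

```python
import json

# Body as captured between the quotes: "München" followed by backslash + n.
raw = "München\\n"

# UTF-8 bytes re-read as Latin-1 by unicode_escape: mojibake.
via_unicode_escape = raw.encode("utf-8").decode("unicode_escape")

# JSON string decoding handles the escape and keeps the umlaut intact.
via_json = json.loads('"' + raw + '"')

print(repr(via_unicode_escape))  # 'MÃ¼nchen\n'
print(repr(via_json))            # 'München\n'
```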
    r"<\|tool_call>.*?<tool_call\|>",
    r"\[TOOL_CALLS\]\s*\[.*?\](?:\s*</s>)?",
    r"\[TOOL_CALLS\]\s*[\w\.\-]+\s*(?:\[ARGS\])?\s*\{.*?\}",
    r"<\|python_tag\|>[^\n<]*",
The Llama-3 stripping regex r"<\|python_tag\|>[^\n<]*" is too restrictive and will fail to strip multi-line JSON tool calls (common in Llama-3.1/3.2 custom tools). It stops at the first newline, leaving the rest of the JSON visible to the user. A more robust approach for this tag (which lacks a formal closing tag) is to match until the next potential tag start or the end of the string.
-    r"<\|python_tag\|>[^\n<]*",
+    r"<\|python_tag\|>.*?(?=<|$)",
Re-reviewed after the fact (this PR was already reverted in #5619, with the replacement under #5620). Confirming the revert was the right call —
Plus a fifth that motivated the parity package in #5620:
For the record: no cleanup needed on this PR; #5619 reverted the merge cleanly.
Comments were narrating what the code already says. Cut historical
"earlier revisions used X, then Y" narratives down to one-line WHY
notes where the footgun still matters (canonical heal-key parity,
balanced-brace vs non-greedy regex, ``(?:[^<]|<(?!\|))*`` over
``[^\n<]*``/``[^\n]*``). Drop section-header banners.
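To make the last of those footguns concrete, the sketch below compares the line-bounded body with the tag-bounded one. Only the body fragments come from the note above; prefixing them with ``<\|python_tag\|>`` and the sample text are assumptions for the example.

```python
import re

# Body stops at the first newline or "<".
LINE_BOUND = re.compile(r"<\|python_tag\|>[^\n<]*")
# Body runs until the next "<|" special-token opener, newlines included.
TAG_BOUND = re.compile(r"<\|python_tag\|>(?:[^<]|<(?!\|))*")

text = (
    '<|python_tag|>{\n'
    '  "name": "f",\n'
    '  "parameters": {"q": "a<b"}\n'
    '}<|eom_id|>'
)

line_result = LINE_BOUND.sub("", text)  # only the first line is removed
tag_result = TAG_BOUND.sub("", text)    # whole call removed, next token kept
```

Here ``tag_result`` is ``"<|eom_id|>"``, while ``line_result`` still contains the remaining lines of the JSON body. A lone ``<`` inside an argument value does not end the tag-bounded match, because only ``<|`` is treated as a boundary.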
No behaviour change. Re-ran:
    pytest studio/backend/tests/test_safetensors_tool_loop.py \
        studio/backend/tests/test_safetensors_capability_advertise.py -q
-> 118 passed.
Regression replay (parser + _coerce_arguments on the 5 #5615 inputs)
-> 21/21.
…LX (unslothai#5615) Adds tool calling for Llama-3, Mistral (pre-v11 + v11+ + [ARGS]), and Gemma 4 to the safetensors / transformers and MLX backends. Parser patched against llama.cpp / vLLM / SGLang per-family parsers and normalises to OpenAI shape. 96 targeted unit tests + cross-OS staging CI (ubuntu / macos-14 / windows) green on the multi-format probe.
…sors + MLX (unslothai#5615)" (unslothai#5619) Reverts PR unslothai#5615 to give the safetensors + MLX healing parity work more time to bake before re-merging. The reverted feature branch `studio-tools-multi-format` remains untouched, and the follow-up PR will layer the healing-parity commits on top.
See PR description below — full summary written inline.