fix: pass tools to chat template in MLLM path #139
kargarisaac wants to merge 1 commit into waybarrios:main
Conversation
The MLLM code path in SimpleEngine.chat() and stream_chat() did not pass tool definitions to MLXMultimodalLM.chat()/stream_chat(), while the LLM path did. Additionally, MLXMultimodalLM.chat()/stream_chat() did not extract tools from **kwargs to pass to get_chat_template(). This meant any model loaded with --mllm (e.g. Qwen3.5, which requires --mllm due to vision_tower weights) silently dropped tool definitions from the prompt, making --enable-auto-tool-choice --tool-call-parser ineffective.

Fix: pass template_tools in the MLLM branch of SimpleEngine, and extract/forward tools in MLXMultimodalLM's chat template calls.

Tested with Qwen3.5-4B-8bit and Qwen3.5-9B-MLX-8bit; both now produce correct OpenAI-compatible tool_calls with the qwen3_coder parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
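To make the engine-side change concrete, here is a minimal sketch of the SimpleEngine branch described above, using the names mentioned in this thread (template_tools, mllm_kwargs); the actual structure in vllm_mlx/engine/simple.py may differ.

```python
# Sketch only: illustrates the fix as described in this PR, not the exact code.
class SimpleEngine:
    def __init__(self, model, is_mllm: bool):
        self.model = model
        self.is_mllm = is_mllm

    def chat(self, messages, template_tools=None, **kwargs):
        if self.is_mllm:
            mllm_kwargs = dict(kwargs)         # shallow copy: don't mutate caller's kwargs
            if template_tools:                 # conditional injection: never pass tools=None
                mllm_kwargs["tools"] = template_tools
            return self.model.chat(messages, **mllm_kwargs)
        # LLM branch: already forwarded tools before this fix
        return self.model.chat(messages, tools=template_tools, **kwargs)
```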
Thump604
left a comment
Solid fix. I can see exactly what was broken and why this solves it.
The bug: MLLM path in SimpleEngine.chat()/stream_chat() never passed tools to the model, even though the LLM branch did. MLXMultimodalLM wasn't extracting tools from kwargs to pass to get_chat_template(). Result: silently dropped tool definitions, breaking auto-tool-choice for Qwen3.5 VLM and any other MLLM that supports function calling.
The fix:
- SimpleEngine now builds mllm_kwargs/mllm_stream_kwargs dict and adds tools before calling model.chat()/stream_chat()
- MLXMultimodalLM extracts tools from kwargs and passes them through template_kwargs to get_chat_template()
- Same pattern as the LLM path — consistent (sketched just below)
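A minimal sketch of that model-side extraction, assuming the helper named in this thread (get_chat_template); self.processor and the downstream _generate call are hypothetical stand-ins, and the exact signatures in vllm_mlx/models/mllm.py may differ.

```python
# Sketch only: pop tools out of **kwargs and forward them to the chat
# template, mirroring what the LLM path already did.
def chat(self, messages, **kwargs):
    tools = kwargs.pop("tools", None)
    template_kwargs = {}
    if tools:                                  # avoid forwarding tools=None
        template_kwargs["tools"] = tools
    prompt = get_chat_template(self.processor, messages, **template_kwargs)
    return self._generate(prompt, **kwargs)    # hypothetical downstream call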
Code quality: Clean implementation. Shallow copy of kwargs avoids mutation. Conditional injection (if template_tools/if tools) avoids passing None. Removed noisy debug logging (the before/after message previews) without losing observability.
Test coverage: Verified on two model sizes (4B, 9B), checked prompt_tokens delta, confirmed tool_calls in output and finish_reason. Parser translation checked.
One consideration: this pattern (passing tools through the kwargs chain) now exists in four places: SimpleEngine's chat()/stream_chat() and MLXMultimodalLM's chat()/stream_chat(). If the tools extraction logic needs to change later (e.g., to filter or validate tools), it'll need updates in two files. Not a blocker, just noting for future refactors (a possible shape is sketched below).
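Purely illustrative, not part of this PR: a shared helper like the following could centralize the copy-and-inject step so both files stay in sync.

```python
# Hypothetical helper for a future refactor; name and location are made up.
def inject_tools(kwargs: dict, tools=None) -> dict:
    out = dict(kwargs)        # shallow copy, as this PR does
    if tools:                 # skip None/empty so tools=None never propagates
        out["tools"] = tools
    return out
```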
This unblocks tool calling for MLLM models. Ship it.
@waybarrios, @kargarisaac: status note plus coordination. This PR addresses the gap where MLLM-loaded models (e.g. Qwen3.5, which requires --mllm) silently dropped tool definitions. The PR currently shows CONFLICTING merge status and likely needs a rebase on current main, since the SimpleEngine and MLXMultimodalLM code paths have had other changes since this branch was created. Coordination note: PR #116 (swaylenhayes, "Enable tool calling for MLLM/VLM chat paths") targets the same problem with a broader scope. Last activity Mar 31.
Hey @kargarisaac — good catch on the root cause! The MLLM path silently dropping tools was a real issue. This is now fixed in main. Same fix as PR #116 (which also addressed this); both are superseded by the production backport. This PR has merge conflicts with current main.
Closing as superseded by the current MLLM tool-calling path in main. Jan's read is the right one here: the underlying tools-propagation bug is already fixed via the later mainline work, so I don't think reviving this conflicting branch is the right use of review bandwidth.
Summary
- Any model loaded with --mllm (e.g., Qwen3.5, which requires it due to vision_tower weights) silently dropped tool definitions from the prompt, making --enable-auto-tool-choice --tool-call-parser ineffective for MLLM models
- SimpleEngine.chat()/stream_chat() MLLM branches did not pass template_tools to the model (only the LLM branch did), and MLXMultimodalLM.chat()/stream_chat() did not extract tools from **kwargs to forward to get_chat_template()
- Fix: pass template_tools in the MLLM branch of SimpleEngine, and extract/forward tools in MLXMultimodalLM's chat template calls
Files Changed
- vllm_mlx/engine/simple.py — Pass template_tools to MLLM model in chat() and stream_chat()
- vllm_mlx/models/mllm.py — Extract tools from **kwargs and pass to get_chat_template() in chat() and stream_chat()
Test plan
- Qwen3.5-4B-8bit — single tool call works, multi-tool parallel calls work
- Qwen3.5-9B-MLX-8bit — single and multi-tool calls work
- prompt_tokens increases when tools are provided (284 with tools vs 29 without)
- finish_reason: "tool_calls" and proper OpenAI wire format output
- qwen3_coder parser correctly translates <function=name><parameter=key>value</parameter></function> to OpenAI tool_calls JSON
Reproduction
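An illustrative reproduction sketch (not from the PR): it assumes a server already running with the flags named in this thread (--mllm --enable-auto-tool-choice --tool-call-parser qwen3_coder) and serving an OpenAI-compatible API on localhost:8000; the base URL, port, served model name, and the weather tool schema are assumptions for the example.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Standard OpenAI tool schema; any tool definition works the same way.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen3.5-4B-8bit",  # served model id may differ on your setup
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# Before this fix, MLLM-loaded models never saw the tool definitions, so the
# reply was plain text; after it, the parser should yield structured calls:
print(resp.choices[0].finish_reason)            # expected: "tool_calls"
print(resp.choices[0].message.tool_calls)
```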