
fix: pass tools to chat template in MLLM path#139

Closed
kargarisaac wants to merge 1 commit into waybarrios:main from kargarisaac:fix/mllm-tool-calling

Conversation


@kargarisaac kargarisaac commented Mar 4, 2026

Summary

  • Bug: Models loaded with --mllm (e.g., Qwen3.5, which requires it due to its vision_tower weights) silently dropped tool definitions from the prompt, making --enable-auto-tool-choice --tool-call-parser ineffective for MLLM models
  • Root cause: SimpleEngine.chat()/stream_chat() MLLM branches did not pass template_tools to the model (only the LLM branch did), and MLXMultimodalLM.chat()/stream_chat() did not extract tools from **kwargs to forward to get_chat_template()
  • Fix: Pass template_tools in the MLLM branch of SimpleEngine, and extract/forward tools in MLXMultimodalLM's chat template calls
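The shape of the fix can be sketched as follows. This is an illustration, not the actual vllm-mlx code: the `mllm_chat` wrapper and the model stub are hypothetical, though `template_tools` and the conditional-injection pattern come from the PR description.

```python
# Hypothetical sketch of the fix shape: forward tools to the MLLM model
# only when they are present, mirroring what the LLM branch already did.
# `mllm_chat` is illustrative; it is not the real SimpleEngine method.

def mllm_chat(model, messages, template_tools=None, **kwargs):
    mllm_kwargs = dict(kwargs)  # shallow copy: don't mutate caller state
    if template_tools:          # conditional injection avoids passing None
        mllm_kwargs["tools"] = template_tools
    return model.chat(messages, **mllm_kwargs)
```

On the model side, the matching change is the mirror image: pop `tools` out of `**kwargs` and hand it to `get_chat_template()` so the tool schemas are rendered into the prompt.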

Files Changed

  • vllm_mlx/engine/simple.py — Pass template_tools to MLLM model in chat() and stream_chat()
  • vllm_mlx/models/mllm.py — Extract tools from **kwargs and pass to get_chat_template() in chat() and stream_chat()

Test plan

  • Tested with Qwen3.5-4B-8bit — single tool call works, multi-tool parallel calls work
  • Tested with Qwen3.5-9B-MLX-8bit — single and multi-tool calls work
  • Verified prompt_tokens increases when tools are provided (284 with tools vs 29 without)
  • Verified finish_reason: "tool_calls" and proper OpenAI wire format output
  • Confirmed qwen3_coder parser correctly translates <function=name><parameter=key>value</parameter></function> to OpenAI tool_calls JSON
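The parser translation in the last bullet can be illustrated with a minimal regex-based sketch. The real qwen3_coder parser handles streaming, multiple parameters, and edge cases; this only shows the shape of the mapping, and `get_weather`/`Paris` are made-up values.

```python
import json
import re

# Minimal illustration of mapping the <function=...><parameter=...>
# wire format to OpenAI-style tool_calls. Not the real qwen3_coder parser.
FUNC_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)

def parse_tool_calls(text):
    calls = []
    for i, (name, body) in enumerate(FUNC_RE.findall(text)):
        args = dict(PARAM_RE.findall(body))  # parameter name -> raw value
        calls.append({
            "id": f"call_{i}",
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)},
        })
    return calls
```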

Reproduction

# Before fix: tools silently dropped, model responds with text
vllm-mlx serve mlx-community/Qwen3.5-4B-8bit --mllm \
  --enable-auto-tool-choice --tool-call-parser qwen3_coder

# prompt_tokens: 29, tool_calls: null

# After fix: tools properly injected, model makes tool calls
# Same command, same model — now works correctly
# prompt_tokens: 284, tool_calls: [{...}], finish_reason: "tool_calls"
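A request of the kind used in this reproduction can be sketched as an OpenAI-compatible chat-completions payload. The `get_weather` tool definition below is illustrative, not taken from the PR; the point is that before the fix the `"tools"` field was silently dropped on the MLLM path, and after the fix its schema is rendered into the prompt (hence the prompt_tokens jump noted above).

```python
import json

# OpenAI-compatible chat-completions payload with one tool definition.
# The tool schema here is a made-up example for illustration.
payload = {
    "model": "mlx-community/Qwen3.5-4B-8bit",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}
body = json.dumps(payload)
```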

The MLLM code path in SimpleEngine.chat() and stream_chat() did not
pass tool definitions to MLXMultimodalLM.chat()/stream_chat(), while
the LLM path did. Additionally, MLXMultimodalLM.chat()/stream_chat()
did not extract tools from **kwargs to pass to get_chat_template().

This meant any model loaded with --mllm (e.g. Qwen3.5 which requires
--mllm due to vision_tower weights) silently dropped tool definitions
from the prompt, making --enable-auto-tool-choice --tool-call-parser
ineffective.

Fix: pass template_tools in the MLLM branch of SimpleEngine, and
extract/forward tools in MLXMultimodalLM's chat template calls.

Tested with Qwen3.5-4B-8bit and Qwen3.5-9B-MLX-8bit - both now
produce correct OpenAI-compatible tool_calls with the qwen3_coder
parser.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Collaborator

@Thump604 Thump604 left a comment


Solid fix. I can see exactly what was broken and why this solves it.

The bug: MLLM path in SimpleEngine.chat()/stream_chat() never passed tools to the model, even though the LLM branch did. MLXMultimodalLM wasn't extracting tools from kwargs to pass to get_chat_template(). Result: silently dropped tool definitions, breaking auto-tool-choice for Qwen3.5 VLM and any other MLLM that supports function calling.

The fix:

  • SimpleEngine now builds mllm_kwargs/mllm_stream_kwargs dict and adds tools before calling model.chat()/stream_chat()
  • MLXMultimodalLM extracts tools from kwargs and passes them through template_kwargs to get_chat_template()
  • Same pattern as the LLM path — consistent

Code quality: Clean implementation. Shallow copy of kwargs avoids mutation. Conditional injection (if template_tools/if tools) avoids passing None. Removed noisy debug logging (the before/after message previews) without losing observability.

Test coverage: Verified on two model sizes (4B, 9B), checked prompt_tokens delta, confirmed tool_calls in output and finish_reason. Parser translation checked.

One consideration: This pattern (passing tools through kwargs chain) now exists in three places — SimpleEngine chat/stream_chat and MLXMultimodalLM chat/stream_chat. If the tools extraction logic needs to change later (e.g., filter/validate tools), it'll need updates in two files. Not a blocker, just noting for future refactors.
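The duplication flagged above could eventually be collapsed into one shared helper. A hypothetical sketch (`with_tools` is not an existing function in either file):

```python
# Hypothetical helper that SimpleEngine and MLXMultimodalLM could share
# if the injection logic ever grows (e.g., filtering or validating tools).
def with_tools(kwargs, tools):
    """Return a shallow copy of kwargs with tools injected only when given."""
    out = dict(kwargs)
    if tools:
        out["tools"] = tools
    return out
```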

This unblocks tool calling for MLLM models. Ship it.

Collaborator

Thump604 commented Apr 8, 2026

@waybarrios, @kargarisaac: status note plus coordination.

This PR addresses the gap where MLLM-loaded models (e.g. Qwen3.5 which requires --mllm due to vision_tower weights) silently dropped tool definitions from the chat template, making --enable-auto-tool-choice --tool-call-parser ineffective for MLLM models. The fix shape is correct: pass template_tools in the MLLM branch of SimpleEngine, extract tools from **kwargs in MLXMultimodalLM.chat()/stream_chat(), forward to get_chat_template().

PR currently shows CONFLICTING merge status. Likely needs a rebase on current main since the SimpleEngine and MLXMultimodalLM code paths have had other changes since this branch was created.

Coordination note: PR #116 (swaylenhayes, "Enable tool calling for MLLM/VLM chat paths") targets the same problem with a broader scope (passes both tools AND tool_choice through the MLLM chat-template paths in both SimpleEngine and BatchedEngine). #116 is also CONFLICTING and from Feb 25 (~6 weeks old). The two PRs are not currently cross-linked. Either could land first, with the other becoming a follow-up or being closed.

Last activity Mar 31.

@janhilgard
Collaborator

Hey @kargarisaac — good catch on the root cause! The MLLM path silently dropping tools was a real issue.

This is now fixed in main (landed via PR #278):

  • SimpleEngine: convert_tools_for_template(tools) passed in both LLM and MLLM branches of chat()/stream_chat()
  • MLXMultimodalLM: tools extracted from kwargs and forwarded to get_chat_template() in chat(), stream_chat(), and batch paths

This is the same fix as PR #116 (which also addressed this); both are superseded by the production backport.

This PR has merge conflicts with current main. The underlying issue is fully resolved.

@Thump604
Collaborator

Closing as superseded by the current MLLM tool-calling path in main. Jan's read is the right current one here: the underlying tools-propagation bug is already fixed via the later mainline work, so I don't think reviving this conflicting branch is the right use of review bandwidth.

@Thump604 Thump604 closed this Apr 11, 2026