Skip to content

Forward chat template kwargs in batched chat#7

Merged
krystophny merged 1 commit intomainfrom
fix/chat-template-kwargs-forwarding
Mar 24, 2026
Merged

Forward chat template kwargs in batched chat#7
krystophny merged 1 commit intomainfrom
fix/chat-template-kwargs-forwarding

Conversation

@krystophny
Copy link
Copy Markdown
Collaborator

Summary

  • accept chat_template_kwargs in the OpenAI chat completion request model
  • forward those kwargs through the server into the batched engine chat-template path
  • cover the request-model, engine, and endpoint forwarding behavior with regressions

Why

Tabura relies on chat_template_kwargs.enable_thinking=false for Qwen 3.5 local turns. In continuous batching mode the request field was dropped before it reached the tokenizer, so Qwen kept emitting thinking text even when the client explicitly disabled it.

Verification

Test fails on main

$ cd /tmp/vllm-mlx-main.dV8sAj
$ PYTHONPATH=/tmp/vllm-mlx-main.dV8sAj "$HOME/Library/Application Support/tabura/llm/venv/bin/python" -m pytest /tmp/vllm_mlx_chat_template_regression.py -q
F                                                                        [100%]
=================================== FAILURES ===================================
...
E       KeyError: 'chat_template_kwargs'
...
1 failed, 3 warnings in 2.46s

Test passes after fix

$ cd /tmp/vllm-mlx-fork
$ PYTHONPATH=/tmp/vllm-mlx-fork "$HOME/Library/Application Support/tabura/llm/venv/bin/python" -m pytest /tmp/vllm_mlx_chat_template_regression.py -q
.                                                                        [100%]
1 passed, 3 warnings in 2.43s

$ PYTHONPATH=/tmp/vllm-mlx-fork "$HOME/Library/Application Support/tabura/llm/venv/bin/python" -m pytest tests/test_chat_template_kwargs.py -q
...                                                                      [100%]
3 passed in 2.10s

@krystophny krystophny merged commit e2d1bbd into main Mar 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant