[Bugfix] Fix step3p5 parser when using mtp #33690
chaunceyjiang merged 4 commits into vllm-project:main
Signed-off-by: mariohong <mariohong128@gmail.com>
Signed-off-by: mariohong <mariohong128@gmail.com>
Code Review
This pull request aims to fix the step3.5 tool parser for scenarios involving multiple tool calls by introducing a fallback_call_id. This new variable is intended to ensure that fallback logic for closing tags applies to the correct tool call, especially in streaming mode. While the overall direction is correct, the implementation for determining fallback_call_id is overly complex and misses a key transition case, which could lead to parsing failures. I've provided a critical review comment with a suggested simplification that makes the logic more robust and corrects the identified issue.
@chaunceyjiang We have internally verified the correctness of this parser modification. Can we merge it now?
Signed-off-by: mariohong <mariohong128@gmail.com>
Signed-off-by: mariohong <mariohong128@gmail.com>
Audited recent tool parser bug-fix PRs and found that several landed without corresponding test coverage. Added unit tests for each fix to prevent regressions.

- Mistral: fast detokenization text detection (PR vllm-project#37209)
- Qwen3Coder: malformed XML crash, anyOf double-encoding, speculative decode streaming (PRs vllm-project#36774, vllm-project#36032, vllm-project#35615)
- DeepSeekV32: delimiter preservation with fast detokenization, skip_special_tokens adjustment (PR vllm-project#33964)
- GLM-4 MoE: zero-argument tool calls, transformers 5.x delimiter handling, Unicode character preservation (PRs vllm-project#32321, vllm-project#31622, vllm-project#30920)
- MiniMax M2: anyOf nullable parameter handling for non-null and null values (PR vllm-project#32342)
- Step3p5: MTP-style variable-chunk and multi-token streaming (PR vllm-project#33690)
- Kimi K2: native tool call ID extraction and multi-turn ID continuity (PR vllm-project#32768)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Purpose
Fix step3.5 parser when using mtp.
If the model outputs </tool_call><tool_call>< in a single chunk (using MTP greatly increases the likelihood of this), the parser incorrectly starts a new, empty tool call.

Test Plan
pytest tests/tool_parsers/test_step3p5_tool_parser.py

Test Result
All tests passed.
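
For readers unfamiliar with the failure mode described under Purpose, here is a minimal sketch of why chunk boundaries matter. It is a toy scanner, not the vLLM Step3p5 parser: the class `ToyStreamingCounter`, the helper `chunked`, and the `<function=...>` tag shape are all hypothetical and chosen only for illustration. The idea it shows is that a streaming parser should produce the same tool calls no matter how the text is split into deltas, including a delta that carries `</tool_call><tool_call><` at once.

```python
import re

# Assumed tag names for illustration; the real Step3.5 format may differ.
TOOL_OPEN = "<tool_call>"
TOOL_CLOSE = "</tool_call>"
_FUNC_RE = re.compile(r"<function=([^>]+)>")


class ToyStreamingCounter:
    """Toy incremental scanner (hypothetical, NOT the vLLM parser)."""

    def __init__(self):
        self.buffer = ""
        self.calls = []  # names of completed tool calls, in order

    def feed(self, delta):
        self.buffer += delta
        # One delta may contain several tags (common with MTP), so keep
        # looping until no complete tool call is left in the buffer.
        while True:
            start = self.buffer.find(TOOL_OPEN)
            if start == -1:
                return
            end = self.buffer.find(TOOL_CLOSE, start)
            if end == -1:
                return  # call is still open; wait for more text
            body = self.buffer[start + len(TOOL_OPEN):end]
            match = _FUNC_RE.search(body)
            if match:  # only record calls that actually name a function
                self.calls.append(match.group(1))
            self.buffer = self.buffer[end + len(TOOL_CLOSE):]


def chunked(text, sizes):
    """Split text into chunks whose lengths cycle through `sizes`."""
    out, i, k = [], 0, 0
    while i < len(text):
        n = sizes[k % len(sizes)]
        out.append(text[i:i + n])
        i += n
        k += 1
    return out
```

A regression test in this spirit would feed the same output with several chunkings (single characters, uneven multi-token chunks, one large chunk) and assert that the extracted tool calls are identical each time, with no empty call appearing at the boundary between two calls.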