[Model] Deepseek-V3.1 reasoning parser#24972
taohui wants to merge 20 commits into vllm-project:main
Conversation
… reasoning via request parameters, in line with DeepSeek-v3.1 model behavior. Signed-off-by: taohui <taohui3@gmail.com>
Code Review
This pull request introduces a new reasoning parser, `deepseek_v3`, for the DeepSeek-V3.1 model. The implementation uses a delegation pattern, selecting between `DeepSeekR1ReasoningParser` and a new `IdentityReasoningParser` based on the `thinking` parameter in `chat_template_kwargs`. The changes to pass these kwargs from the serving layer to the parser constructor are correct. However, this change introduces a critical issue: existing reasoning parsers whose constructors don't accept arbitrary keyword arguments will break with a `TypeError`. I've added a comment with a suggested fix for the new `IdentityReasoningParser` and noted that other parsers need a similar update.
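The breakage described above can be reproduced with plain Python classes; `StrictParser` and `TolerantParser` are illustrative stand-ins, not vLLM's actual parser classes:

```python
# Minimal sketch of the constructor breakage the review describes: once the
# serving layer forwards chat_template_kwargs to every parser constructor,
# parsers with a strict signature raise TypeError.

class StrictParser:
    # Existing style: accepts only the tokenizer.
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

class TolerantParser:
    # Suggested fix: accept and ignore extra keyword arguments.
    def __init__(self, tokenizer, *args, **kwargs):
        self.tokenizer = tokenizer

kwargs = {"thinking": True}  # forwarded from chat_template_kwargs

try:
    StrictParser(tokenizer=None, **kwargs)
    strict_ok = True
except TypeError:
    strict_ok = False

tolerant_ok = TolerantParser(tokenizer=None, **kwargs) is not None
print(strict_ok, tolerant_ok)  # False True
```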
This pull request has merge conflicts that must be resolved before it can be merged.
Change the return value of `is_reasoning_end` from False to True, since reasoning is not treated specially and the function should always return True. Signed-off-by: taohui <taohui3@gmail.com>
     end_token: str = "</think>"

-    def __init__(self, tokenizer: PreTrainedTokenizerBase):
+    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
I suggest splitting this change into a new PR.
OK, I will split it into two.
Hi @chaunceyjiang ,
I have split the original PR into two separate PRs for clarity and easier review:
- [Model] Add optional parameter to reasoning parser constructor #25554
- [Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) #25589
Thank you!
    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
        super().__init__(tokenizer)

        thinking = bool(kwargs.pop("thinking", False))
Currently, Qwen3 also supports parameters like

{
    "chat_template_kwargs": {
        "enable_thinking": enable_thinking
    }
}

to enable or disable reasoning, and unlike this PR, it does not require two separate reasoning parsers.
Is there anything special about DeepSeek-V3 that necessitates this different approach?
Thank you for the reply. DeepSeek-V3.1 cannot directly use the Qwen3 reasoning parser, because DeepSeek-V3.1 only has the end tag (`</think>`) but no start tag. If we used the Qwen3 reasoning parser, all content would be placed into the `reasoning_content` field. It also cannot directly use the `deepseek_r1` reasoning parser, because that parser cannot decide whether to enable reasoning based on the request. For example, if I specify `--reasoning-parser deepseek_r1` but the request is in non-thinking mode, it would still put all content into the `reasoning_content` field. The key point here is that we need a way to decide whether to use the reasoning parser based on the request. Therefore, I made a modification: applying a Strategy Pattern to select different strategies based on the request, while keeping the interface unchanged.
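The Strategy Pattern described above can be sketched with simplified stand-ins; class names and the `extract` method below are illustrative, not vLLM's actual `ReasoningParser` API, and the `thinking` kwarg is assumed to arrive via `chat_template_kwargs`:

```python
# Delegation sketch: one outer parser picks the strategy per request.

class DeepSeekR1Like:
    """Splits text at the '</think>' end tag (V3.1 emits no start tag)."""
    def extract(self, text):
        reasoning, sep, content = text.partition("</think>")
        return (reasoning, content) if sep else (text, "")

class IdentityLike:
    """No-op strategy: everything is ordinary content, no reasoning."""
    def extract(self, text):
        return (None, text)

class DeepSeekV3Like:
    """Selects the strategy once, from the request's chat_template_kwargs."""
    def __init__(self, **kwargs):
        thinking = bool(kwargs.pop("thinking", False))
        self._impl = DeepSeekR1Like() if thinking else IdentityLike()

    def extract(self, text):
        return self._impl.extract(text)

thinking_parser = DeepSeekV3Like(thinking=True)
plain_parser = DeepSeekV3Like(thinking=False)
print(thinking_parser.extract("chain of thought</think>final answer"))
print(plain_parser.extract("final answer"))
```

The interface stays identical either way; only the behavior behind it changes with the request.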
Documentation preview: https://vllm--24972.org.readthedocs.build/en/24972/
…5589) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
Purpose
This PR adds a new reasoning parser for the DeepSeek-V3.1 model, named `deepseek_v3`. Unlike previous models such as DeepSeek-R1, the parsing behavior for DeepSeek-V3.1 is determined by the request. Specifically:
When a request includes `"chat_template_kwargs": {"thinking": true}`, the model uses the `deepseek_r1` reasoning parser.
Otherwise, it uses a new `IdentityReasoningParser`, which implements the `ReasoningParser` interface but does not extract any reasoning content, effectively acting as a placeholder.
This ensures compatibility with the reasoning parser interface while preserving deterministic behavior for DeepSeek-V3.1.
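A minimal sketch of what such an identity parser can look like; method names mirror the description and the interface, but signatures are simplified, not vLLM's exact API:

```python
# Identity parser sketch: satisfies the parser interface but treats all
# model output as plain content.

class IdentityReasoningParserSketch:
    def __init__(self, tokenizer=None, *args, **kwargs):
        self.tokenizer = tokenizer

    def is_reasoning_end(self, token_ids):
        # Reasoning is never treated specially, so it is always "ended"
        # (matching the commit that flipped this return value to True).
        return True

    def extract_reasoning_content(self, model_output):
        # reasoning_content is None; everything is ordinary content.
        return None, model_output

parser = IdentityReasoningParserSketch()
print(parser.is_reasoning_end([1, 2, 3]))         # True
print(parser.extract_reasoning_content("hello"))  # (None, 'hello')
```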
Test Plan
Unit test
Added tests/reasoning/test_deepseekv3_reasoning_parser.py to verify that `deepseek_v3` correctly selects either `DeepSeekR1ReasoningParser` or `IdentityReasoningParser` based on `"chat_template_kwargs": {"thinking": true}`.
Implemented additional checks to ensure the newly added IdentityReasoningParser conforms to the ReasoningParser interface.
Unit test command:
pytest tests/reasoning/test_deepseekv3_reasoning_parser.py
Unit test result:
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /vllm-workspace/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items
tests/reasoning/test_deepseekv3_reasoning_parser.py ... [100%]
========================================================================================== 3 passed, 4 warnings in 4.69s ===========================================================================================
Serving test
Verified via the service interface that both non-streaming and streaming requests correctly populate content and reasoning_content depending on whether `"chat_template_kwargs": {"thinking": true}` is set. The code is below:
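The client code itself was not captured here, so the following is a hedged reconstruction of the two request payloads implied by the responses below; the model name is taken from the output, and the prompt is inferred from the reasoning_content shown:

```python
# Sketch: the two requests differ only in chat_template_kwargs.

def build_payload(thinking: bool) -> dict:
    payload = {
        "model": "DeepSeek-V3.1",
        "messages": [
            # Prompt inferred from the responses; the exact wording is an assumption.
            {"role": "user", "content": "What's the weather tomorrow?"}
        ],
    }
    if thinking:
        # vLLM forwards this dict to the chat template and, with this PR,
        # to the reasoning parser as well.
        payload["chat_template_kwargs"] = {"thinking": True}
    return payload

thinking_req = build_payload(True)
plain_req = build_payload(False)
print("chat_template_kwargs" in thinking_req)  # True
print("chat_template_kwargs" in plain_req)     # False
```

With the OpenAI-compatible server, such extra fields are typically passed via the client's `extra_body` parameter.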
=== Thinking response ===
ChatCompletion(id='chatcmpl-27e5f6a59bb14359b3acdc2b9506cd55', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! However, I don’t have real-time access to current or future weather data. To get the most accurate forecast for tomorrow, you can use a weather service like: \n\n- Weather.com \n- AccuWeather \n- The Weather Channel \n- Your device’s built-in weather app (e.g., Apple Weather, Google Weather) \n\nIf you tell me your location, I can help you find a reliable source or guide you on how to check! 😊', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content="Hmm, the user is asking for tomorrow's weather. Since I don't have real-time data access, I need to handle this gracefully. \n\nI should acknowledge the limitation upfront to set expectations, then offer a practical solution—like suggesting a weather service or asking for their location to provide instructions. \n\nKeeping it concise but helpful: a brief apology, a clear reason, and actionable alternatives. No need to overcomplicate it."), stop_reason=None, token_ids=None)], created=1758034986, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=197, prompt_tokens=15, total_tokens=212, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-dad9101ee3f0464b98f28bba1a303d35', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! Could you please tell me your city or location?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1758034989, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=15, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
streaming output test:
=== Stream & Thinking response ===
Hmm, the user is asking about tomorrow's weather. This is a straightforward request for location-based information. Since I don't have access to real-time data or the user's location, I need to ask for their city or zip code to provide an accurate forecast.
I should keep it simple and friendly, offering to help once they provide the necessary details. No need to overcomplicate it—just a clear prompt and an emoji to keep it approachable.
Let's ask for the location and mention we can check multiple sources to ensure reliability.
=== end of reasoning ===
'd love to help! Could you please tell me your city or location so I can check the weather forecast for tomorrow? 😊
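The split shown above (reasoning deltas, then an end-of-reasoning marker, then content deltas) can be mimicked with a small router. This is an illustrative sketch, not vLLM's streaming code, and it assumes the `</think>` tag arrives as its own chunk, which is the common case since it is a single special token:

```python
# Route streamed chunks into reasoning_content vs content around the end tag.

def route_stream(chunks, end_tag="</think>"):
    reasoning, content = [], []
    seen_end = False
    for chunk in chunks:
        if chunk == end_tag:
            seen_end = True
        elif seen_end:
            content.append(chunk)
        else:
            reasoning.append(chunk)
    return "".join(reasoning), "".join(content)

print(route_stream(["Hmm, ", "thinking...", "</think>", "I'd love ", "to help!"]))
# ('Hmm, thinking...', "I'd love to help!")
```

In non-thinking mode no end tag ever appears, which is why the identity strategy routes everything straight to content instead.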
Documents update
Updated docs/features/reasoning_outputs.md with the following content:
deepseek_v3 — supported structured outputs: `json`, `regex`. Tool Calling is supported in Non-Thinking mode but not in Thinking mode. See details at: https://huggingface.co/deepseek-ai/DeepSeek-V3.1