[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) #25589
Signed-off-by: taohui <taohui3@gmail.com>
fix(parser): ensure subclasses forward *args and **kwargs to super().__init__ Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tao Hui <taohui3@gmail.com>
…__init__ Signed-off-by: taohui <taohui3@gmail.com>
… reasoning Signed-off-by: taohui <taohui3@gmail.com>
Code Review
This pull request introduces a new reasoning parser for the DeepSeek-V3.1 model, which dynamically selects between the existing DeepSeekR1ReasoningParser and a new IdentityReasoningParser based on the thinking flag in chat_template_kwargs. The changes are well-implemented, including updates to the base parser classes to allow for more flexible instantiation and the addition of relevant unit tests. However, I've identified a high-severity issue in the new DeepSeekV3ReasoningParser where it fails to propagate additional arguments to the underlying parser it creates. This could lead to incorrect behavior if other parameters are expected by the delegated parsers.
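The argument-propagation concern raised in this review can be illustrated with a minimal sketch. The class names below (BaseParser, ThinkingParser, PassThroughParser, DelegatingParser) are hypothetical stand-ins, not the classes in this PR, and the `thinking` keyword mirrors the expanded-kwargs call style under review at this point in the thread:

```python
class BaseParser:
    """Hypothetical stand-in for a reasoning parser base class."""

    def __init__(self, tokenizer, *args, **kwargs):
        self.tokenizer = tokenizer
        self.init_args = args
        self.init_kwargs = kwargs


class ThinkingParser(BaseParser):
    """Stand-in for the parser used when thinking is enabled."""


class PassThroughParser(BaseParser):
    """Stand-in for the parser used when thinking is disabled."""


class DelegatingParser(BaseParser):
    """Picks a delegate and forwards every constructor argument to it."""

    def __init__(self, tokenizer, *args, **kwargs):
        super().__init__(tokenizer, *args, **kwargs)
        # Assumption: thinking defaults to off when the flag is absent.
        use_thinking = bool(kwargs.get("thinking", False))
        delegate_cls = ThinkingParser if use_thinking else PassThroughParser
        # Forward *args and **kwargs so the delegate sees the same inputs
        # as the wrapper; dropping them is the issue flagged in the review.
        self.delegate = delegate_cls(tokenizer, *args, **kwargs)


parser = DelegatingParser("tok", thinking=True, extra=1)
print(type(parser.delegate).__name__, parser.delegate.init_kwargs["extra"])
```

If the delegate were built with only the tokenizer, `extra` would be silently lost, which is the failure mode the review describes.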
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Tao Hui <taohui3@gmail.com>
- reasoning_parser = self.reasoning_parser(tokenizer)
+ reasoning_parser = self.reasoning_parser(
+     tokenizer, **(request.chat_template_kwargs or {}))
I wonder if we can implement the same logic simply by defining a local variable like use_reasoning_parser = request.chat_template_kwargs.get("thinking", True) and self.reasoning_parser at the beginning of .chat_completion_stream_generator and .chat_completion_full_generator, and replacing all if self.reasoning_parser checks with if use_reasoning_parser.
Thanks for the suggestion! I also thought about a similar approach, but there are a couple of issues to consider:
- chat_template_kwargs doesn’t have a standardized protocol. For example, Qwen3 uses enable_thinking, while DeepSeek-V3.1 uses thinking. It’s better to let each reasoning parser decide how to interpret its own parameters.
- Making this change would result in a larger modification. Besides deciding whether to use the reasoning parser, the tool parser also depends on reasoning_parser.is_reasoning_end. A more elegant approach, like in this PR, is to abstract an IdentityReasoningParser, which allows flexible switching of whether to use the reasoning parser.
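A minimal sketch of the second point, assuming an identity parser reports that reasoning has already ended so that downstream tool parsing is never blocked. The class names, the token id, and the helper `tool_parsing_may_start` are hypothetical and only illustrate the dependency on is_reasoning_end:

```python
from typing import Optional, Sequence, Tuple

# Hypothetical token id standing in for the end-of-thinking marker.
END_OF_REASONING_ID = 2


class IdentityParserSketch:
    """Treats all output as content; there is never a reasoning phase."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        return True

    def extract_reasoning(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        return None, text


class MarkerParserSketch:
    """Considers reasoning finished once the end marker id has been seen."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        return END_OF_REASONING_ID in input_ids


def tool_parsing_may_start(parser, input_ids: Sequence[int]) -> bool:
    # The caller does not need to know which mode is active; it only asks
    # the parser whether the reasoning phase is over.
    return parser.is_reasoning_end(input_ids)


print(tool_parsing_may_start(IdentityParserSketch(), [5, 6]))
print(tool_parsing_may_start(MarkerParserSketch(), [5, 6]))
print(tool_parsing_may_start(MarkerParserSketch(), [5, 2, 6]))
```

With this shape, switching thinking on or off only changes which parser object is constructed; the call sites that check is_reasoning_end stay the same.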
Thanks for the detailed explanation, I haven't fully taken the tool parser alongside reasoning parser into account 👍
Hi @chaunceyjiang ,
Documentation preview: https://vllm--25589.org.readthedocs.build/en/25589/
Hi @chaunceyjiang , when you have a moment, please take a look at this PR. Thanks a lot! Let’s discuss if you have any questions. 🙏
When can this feature be merged into the main branch?
chaunceyjiang left a comment
/cc @aarnphm If you have time, could you please take a look at this?
  if self.reasoning_parser:
-     reasoning_parser = self.reasoning_parser(tokenizer)
+     reasoning_parser = self.reasoning_parser(
+         tokenizer, **(request.chat_template_kwargs or {}))
How about this?
reasoning_parser = self.reasoning_parser(
    tokenizer, chat_template_kwargs=request.chat_template_kwargs)
…apper
The DeepSeekV3ReasoningParser now expects parameters to be passed via `chat_template_kwargs` instead of being expanded directly with `**kwargs`. This aligns the call pattern with other reasoning parser constructors.
- Updated __init__ to unpack `chat_template_kwargs` internally
- Adjusted parser selection logic to extract `thinking` from the inner dict
- Updated related pytest test cases to pass `chat_template_kwargs` instead of direct kwargs
Signed-off-by: taohui <taohui3@gmail.com>
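The selection step described in this commit might look roughly like the sketch below. The helper names `thinking_enabled` and `parser_name_for` are hypothetical, the "identity" label is only a placeholder for IdentityReasoningParser, and the default of False when the flag is missing is an assumption based on the PR description ("Otherwise, it uses a new IdentityReasoningParser"):

```python
from typing import Any, Mapping, Optional


def thinking_enabled(chat_template_kwargs: Optional[Mapping[str, Any]]) -> bool:
    """Read the `thinking` flag from the nested dict, tolerating None."""
    kwargs = chat_template_kwargs or {}
    # Assumption: thinking is off unless the request turns it on.
    return bool(kwargs.get("thinking", False))


def parser_name_for(chat_template_kwargs: Optional[Mapping[str, Any]]) -> str:
    """Return a label for the delegate that would be chosen."""
    return "deepseek_r1" if thinking_enabled(chat_template_kwargs) else "identity"


print(parser_name_for({"thinking": True}))
print(parser_name_for({"thinking": False}))
print(parser_name_for(None))
```

Keeping the template options inside one dict means an unrelated key, such as Qwen3's enable_thinking, cannot collide with a named constructor parameter, and a request that sends no chat_template_kwargs at all is handled by the same code path.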
…ithub.com/taohui/vllm into deepseek_v3.1_reasoning_parser_add_parser
logger = init_logger(__name__)


@ReasoningParserManager.register_module("deepseek_v3")
Hi, @taohui, have you tested this ReasoningParser deepseek_v3 with https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp?
Not yet, I’ll test it today.
It works as expected: with --reasoning-parser deepseek_v3 enabled, the response is parsed correctly. Without it, the behavior is the same as with DeepSeek-V3.1.
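from openai import OpenAI

# Assumption: a local vLLM OpenAI-compatible server; adjust base_url and api_key as needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

model_name = "DeepSeek-V3.2-Exp"
extra_body_thinking = {"chat_template_kwargs": {"thinking": True}}
extra_body_nonthinking = {"chat_template_kwargs": {"thinking": False}}

# Thinking mode: reasoning_content is expected to be populated.
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking
)
print("=== Thinking response ===")
print(response)

# Non-thinking mode: reasoning_content is expected to be None.
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_nonthinking
)
print("=== NonThinking response ===")
print(response)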
Output is:
=== Thinking response ===
ChatCompletion(id='chatcmpl-4984722fc97c4ee2ace19026f8825ee3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can't check real-time weather, but you can get a reliable forecast for tomorrow by:\n\n• Searching online for "weather [your city]" \n• Asking your phone's assistant (Siri/Google Assistant) \n• Checking a weather app like The Weather Channel, AccuWeather, or your default phone app \n\nLet me know if you need help interpreting any weather terms once you have the forecast! ☀️🌧️⛅', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content="Hmm, the user is asking about tomorrow's weather. This is a straightforward request, but I don't have access to real-time data. \n\nI need to acknowledge the limitation upfront while offering helpful alternatives. The user likely wants actionable information, so suggesting reliable weather sources would be useful. \n\nI can list a few trusted options like weather apps and websites, and offer to help interpret the forecast if they provide their location. Keeping it concise but practical seems best here."), stop_reason=None, token_ids=None)], created=1760363936, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=194, prompt_tokens=14, total_tokens=208, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-fc86fdc39d424c46897cdc18406b4267', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the latest weather forecast for you! Please enable "联网搜索" in the app settings, and I’ll fetch real-time weather information for your location. Alternatively, you can tell me your city or region, and I’ll look it up for you! 😊', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None, token_ids=None)], created=1760363945, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=60, prompt_tokens=14, total_tokens=74, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
If the server is started without --reasoning-parser deepseek_v3, the response is:
=== Thinking response ===
ChatCompletion(id='chatcmpl-0e0e975c6eae4e0a97cccbd41dc4c039', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hmm, the user is asking about tomorrow's weather. This is a straightforward request, but I don't have access to real-time data. \n\nI need to acknowledge the limitation upfront while offering helpful alternatives. The user likely wants actionable information, so suggesting reliable weather sources would be useful. \n\nI can list a few trusted options like weather apps and websites, and offer to help interpret the forecast if they provide their location. Keeping it concise but practical seems best here.I can't check real-time weather, but you can get a reliable forecast for tomorrow by:\n\n• Searching online for "weather [your city]" \n• Asking your phone's assistant (Siri/Google Assistant) \n• Checking a weather app like The Weather Channel, AccuWeather, or your default phone app \n\nLet me know if you need help interpreting any weather terms once you have the forecast! ☀️🌧️⛅', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None, token_ids=None)], created=1760364966, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=194, prompt_tokens=14, total_tokens=208, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
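In the response above, the reasoning and the answer arrive concatenated in content because no parser separated them. A rough sketch of that separation step follows, assuming the reasoning ends with a `</think>` marker; the marker choice and the helper `split_reasoning` are illustrative assumptions, not the implementation in this PR. Treating a missing marker as "no reasoning" is a simplification that may differ from the real parser:

```python
from typing import Optional, Tuple

# Assumption: the end of the reasoning span is marked by this string.
END_MARKER = "</think>"


def split_reasoning(text: str, end_marker: str = END_MARKER) -> Tuple[Optional[str], str]:
    """Split raw model output into (reasoning, content) at the first marker."""
    reasoning, found, content = text.partition(end_marker)
    if not found:
        # Simplification: without a marker, everything is plain content.
        return None, text
    return reasoning, content


raw = "The user wants a forecast.</think>I can't check real-time weather."
reasoning, content = split_reasoning(raw)
print("reasoning:", reasoning)
print("content:", content)
```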
Hope to merge this PR into main branch ASAP. 👏
This pull request has merge conflicts that must be resolved before it can be
Signed-off-by: Tao Hui <taohui3@gmail.com>
…ent_streaming return annotation Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
…t#24972) (vllm-project#25589) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…t#24972) (vllm-project#25589) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
…t#24972) (vllm-project#25589) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
This PR adds a new reasoning parser for the DeepSeek-V3.1 model, named deepseek_v3. Unlike previous models such as deepseek_r1, the reasoning parser for DeepSeek-V3.1 is deterministic: which parser handles a response is decided by the request. Specifically:
- When a request includes "chat_template_kwargs": {"thinking": True}, deepseek_v3 delegates to the deepseek_r1 reasoning parser.
- Otherwise, it uses a new IdentityReasoningParser, which implements the ReasoningParser interface but does not extract any reasoning, effectively acting as a placeholder.
This ensures compatibility with the reasoning parser interface while preserving deterministic behavior for DeepSeek-V3.1.
Test Plan
Unit test
Added tests/reasoning/test_deepseekv3_reasoning_parser.py to verify that deepseek_v3 correctly selects either DeepSeekR1ReasoningParser or IdentityReasoningParser based on "chat_template_kwargs": {"thinking": True}.
Implemented additional checks to ensure the newly added IdentityReasoningParser conforms to the ReasoningParser interface.
Unit test command:
pytest tests/reasoning/test_deepseekv3_reasoning_parser.py
Unit test result:
=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /vllm-workspace/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items
tests/reasoning/test_deepseekv3_reasoning_parser.py ... [100%]
========================================================================================== 3 passed, 4 warnings in 4.69s ===========================================================================================
Serving test
Verified via the serving interface that both non-streaming and streaming requests correctly populate content and reasoning_content, both with and without "chat_template_kwargs": {"thinking": True}. The output is below:
=== Thinking response ===
ChatCompletion(id='chatcmpl-27e5f6a59bb14359b3acdc2b9506cd55', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! However, I don’t have real-time access to current or future weather data. To get the most accurate forecast for tomorrow, you can use a weather service like: \n\n- Weather.com \n- AccuWeather \n- The Weather Channel \n- Your device’s built-in weather app (e.g., Apple Weather, Google Weather) \n\nIf you tell me your location, I can help you find a reliable source or guide you on how to check! 😊', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content="Hmm, the user is asking for tomorrow's weather. Since I don't have real-time data access, I need to handle this gracefully. \n\nI should acknowledge the limitation upfront to set expectations, then offer a practical solution—like suggesting a weather service or asking for their location to provide instructions. \n\nKeeping it concise but helpful: a brief apology, a clear reason, and actionable alternatives. No need to overcomplicate it."), stop_reason=None, token_ids=None)], created=1758034986, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=197, prompt_tokens=15, total_tokens=212, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-dad9101ee3f0464b98f28bba1a303d35', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! Could you please tell me your city or location?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1758034989, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=15, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
Streaming output test:
=== Stream & Thinking response ===
Hmm, the user is asking about tomorrow's weather. This is a straightforward request for location-based information. Since I don't have access to real-time data or the user's location, I need to ask for their city or zip code to provide an accurate forecast.
I should keep it simple and friendly, offering to help once they provide the necessary details. No need to overcomplicate it—just a clear prompt and an emoji to keep it approachable.
Let's ask for the location and mention we can check multiple sources to ensure reliability.
=== end of reasoning ===
'd love to help! Could you please tell me your city or location so I can check the weather forecast for tomorrow? 😊
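The streaming layout above (reasoning first, then a separator, then the answer) could be produced by a small client-side loop along these lines. This is a sketch rather than the script used for the test: `render_stream` is a hypothetical helper, and the fake deltas stand in for the per-chunk delta objects a streaming client would yield, assuming each delta exposes content and reasoning_content attributes:

```python
from types import SimpleNamespace
from typing import Iterable, List


def render_stream(deltas: Iterable[object]) -> str:
    """Join streamed deltas, marking where reasoning ends and content begins."""
    parts: List[str] = []
    reasoning_open = False
    for delta in deltas:
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_open = True
            parts.append(reasoning)
        if content:
            if reasoning_open:
                # First content piece after reasoning: emit the separator once.
                parts.append("\n=== end of reasoning ===\n")
                reasoning_open = False
            parts.append(content)
    return "".join(parts)


# Fake deltas for illustration only.
fake_deltas = [
    SimpleNamespace(reasoning_content="Hmm, ", content=None),
    SimpleNamespace(reasoning_content="thinking.", content=None),
    SimpleNamespace(reasoning_content=None, content="Hello"),
    SimpleNamespace(reasoning_content=None, content="!"),
]
print(render_stream(fake_deltas))
```

In non-thinking mode no delta carries reasoning_content, so the separator is never emitted and the function returns the content unchanged.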
Documentation update
Updated docs/features/reasoning_outputs.md with the following content:
| deepseek_v3 | json, regex | Tool Calling is supported in Non-Thinking mode but not in Thinking mode. See details at: https://huggingface.co/deepseek-ai/DeepSeek-V3.1 |