[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) #25589

Merged
chaunceyjiang merged 27 commits into vllm-project:main from taohui:deepseek_v3.1_reasoning_parser_add_parser
Oct 15, 2025
Conversation

@taohui
Contributor

@taohui taohui commented Sep 24, 2025

Purpose

This PR adds a new reasoning parser for the DeepSeek-V3.1 model, named deepseek_v3. Unlike earlier models such as DeepSeek-R1, whether DeepSeek-V3.1 emits reasoning is deterministic: it is controlled by the chat template rather than inferred from the output. Specifically:

When a request includes "chat_template_kwargs": {"thinking": True}, the model output is parsed with the deepseek_r1 reasoning parser.

Otherwise, it uses a new IdentityReasoningParser, which implements the ReasoningParser interface but performs no reasoning extraction, effectively acting as a pass-through placeholder.

This ensures compatibility with the reasoning parser interface while preserving deterministic behavior for DeepSeek-V3.1.
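The delegation described above can be sketched roughly as follows. This is a simplified illustration, not the actual vLLM implementation: the stub R1 parser here just splits on the `</think>` marker, the method signatures are reduced, and the default of `thinking=False` when the flag is absent is an assumption.

```python
# Simplified sketch of the parser-selection pattern; real vLLM classes
# have richer interfaces, and StubR1Parser is a placeholder for
# DeepSeekR1ReasoningParser.

class IdentityReasoningParser:
    """Implements the parser interface but extracts no reasoning."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def is_reasoning_end(self, token_ids):
        # There is never a reasoning section, so reasoning is always "over";
        # this keeps downstream consumers such as tool parsers working.
        return True

    def extract_reasoning_content(self, model_output, request=None):
        # Returns (reasoning_content, content): everything is plain content.
        return None, model_output


class StubR1Parser:
    """Placeholder for an R1-style parser: splits on </think>."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def extract_reasoning_content(self, model_output, request=None):
        if "</think>" in model_output:
            reasoning, content = model_output.split("</think>", 1)
            return reasoning, content
        return model_output, None


class DeepSeekV3ReasoningParser:
    """Delegates to a real parser only when thinking mode is requested."""

    def __init__(self, tokenizer, chat_template_kwargs=None):
        thinking = (chat_template_kwargs or {}).get("thinking", False)
        parser_cls = StubR1Parser if thinking else IdentityReasoningParser
        self._delegate = parser_cls(tokenizer)

    def extract_reasoning_content(self, model_output, request=None):
        return self._delegate.extract_reasoning_content(model_output, request)
```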

Test Plan

Unit test

Added tests/reasoning/test_deepseekv3_reasoning_parser.py to verify that deepseek_v3 correctly selects either DeepSeekR1ReasoningParser or IdentityReasoningParser based on "chat_template_kwargs": {"thinking": True}.

Implemented additional checks to ensure the newly added IdentityReasoningParser conforms to the ReasoningParser interface.

Unit test command:

pytest tests/reasoning/test_deepseekv3_reasoning_parser.py

Unit test result:

=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /vllm-workspace/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items

tests/reasoning/test_deepseekv3_reasoning_parser.py ... [100%]

========================================================================================== 3 passed, 4 warnings in 4.69s ===========================================================================================

Serving test

Verified via the serving interface that both non-streaming and streaming requests correctly populate content and reasoning_content depending on whether "chat_template_kwargs": {"thinking": True} is set. The code is below:

model_name = "DeepSeek-V3.1"
extra_body_thinking = {"chat_template_kwargs": {"thinking": True}}
extra_body_nonthinking = {"chat_template_kwargs": {"thinking": False}}

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking
)
print("=== Thinking response ===")
print(response)

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_nonthinking
)
print("=== NonThinking response ===")
print(response)

=== Thinking response ===
ChatCompletion(id='chatcmpl-27e5f6a59bb14359b3acdc2b9506cd55', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! However, I don’t have real-time access to current or future weather data. To get the most accurate forecast for tomorrow, you can use a weather service like: \n\n- Weather.com \n- AccuWeather \n- The Weather Channel \n- Your device’s built-in weather app (e.g., Apple Weather, Google Weather) \n\nIf you tell me your location, I can help you find a reliable source or guide you on how to check! 😊', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content="Hmm, the user is asking for tomorrow's weather. Since I don't have real-time data access, I need to handle this gracefully. \n\nI should acknowledge the limitation upfront to set expectations, then offer a practical solution—like suggesting a weather service or asking for their location to provide instructions. \n\nKeeping it concise but helpful: a brief apology, a clear reason, and actionable alternatives. No need to overcomplicate it."), stop_reason=None, token_ids=None)], created=1758034986, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=197, prompt_tokens=15, total_tokens=212, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-dad9101ee3f0464b98f28bba1a303d35', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! Could you please tell me your city or location?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1758034989, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=15, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Streaming output test:

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking,
    stream=True
)

print("=== Stream & Thinking response ===")

full_content = ""
in_reasoning = True
for chunk in response:
    delta = chunk.choices[0].delta
    
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        text = delta.reasoning_content
        print(text, end="", flush=True)
        full_content += text

    if hasattr(delta, "content") and delta.content is not None:
        text = delta.content
        if text != "" and in_reasoning:
            print()
            print("=== end of reasoning ===")
            in_reasoning = False
        print(text, end="", flush=True)
        full_content += text

=== Stream & Thinking response ===
Hmm, the user is asking about tomorrow's weather. This is a straightforward request for location-based information. Since I don't have access to real-time data or the user's location, I need to ask for their city or zip code to provide an accurate forecast.

I should keep it simple and friendly, offering to help once they provide the necessary details. No need to overcomplicate it—just a clear prompt and an emoji to keep it approachable.

Let's ask for the location and mention we can check multiple sources to ensure reliability.
=== end of reasoning ===
'd love to help! Could you please tell me your city or location so I can check the weather forecast for tomorrow? 😊

Documents update

Updated docs/features/reasoning_outputs.md with the following content:

| Model Series  | Parser Name | Structured Output Support | Tool Calling |
|---------------|-------------|---------------------------|--------------|
| DeepSeek-V3.1 | deepseek_v3 | json, regex               | ✅ (non-thinking mode only) |

Tool Calling is supported in Non-Thinking mode but not in Thinking mode. See details at: https://huggingface.co/deepseek-ai/DeepSeek-V3.1

taohui and others added 5 commits September 24, 2025 15:24
fix(parser): ensure subclasses forward *args and **kwargs to super().__init__

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
…__init__

Signed-off-by: taohui <taohui3@gmail.com>
… reasoning

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
@mergify mergify bot added the documentation, deepseek, and frontend labels Sep 24, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new reasoning parser for the DeepSeek-V3.1 model, which dynamically selects between the existing DeepSeekR1ReasoningParser and a new IdentityReasoningParser based on the thinking flag in chat_template_kwargs. The changes are well-implemented, including updates to the base parser classes to allow for more flexible instantiation and the addition of relevant unit tests. However, I've identified a high-severity issue in the new DeepSeekV3ReasoningParser where it fails to propagate additional arguments to the underlying parser it creates. This could lead to incorrect behavior if other parameters are expected by the delegated parsers.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Comment on lines +530 to +531
- reasoning_parser = self.reasoning_parser(tokenizer)
+ reasoning_parser = self.reasoning_parser(
+     tokenizer, **(request.chat_template_kwargs or {}))
Contributor


I wonder if we can implement the same logic simply by defining a local variable like use_reasoning_parser = request.chat_template_kwargs.get("thinking", True) and self.reasoning_parser at the beginning of .chat_completion_stream_generator and .chat_completion_full_generator, and replacing all if self.reasoning_parser checks with if use_reasoning_parser.

Contributor Author


Thanks for the suggestion! I also thought about a similar approach, but there are a couple of issues to consider:

  • chat_template_kwargs doesn’t have a standardized protocol—for example, Qwen3 uses enable_thinking, while DeepSeek-V3.1 uses thinking. It’s better to let each reasoning parser decide how to interpret its own parameters.

  • Making this change would result in a larger modification. Besides deciding whether to use the reasoning parser, the tool parser also depends on reasoning_parser.is_reasoning_end. A more elegant approach, like in this PR, is to abstract an IdentityReasoningParser, which allows flexible switching of whether to use the reasoning parser.
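To illustrate the second point, here is a minimal sketch of how a tool parser depends on is_reasoning_end, and why an identity parser that always reports reasoning as ended lets the tool-parsing path work unchanged. The names, token id, and control flow below are hypothetical, not the vLLM tool-parser API.

```python
# Hypothetical sketch: a tool parser that only starts parsing tool calls
# once the reasoning parser says reasoning has ended.

class IdentityReasoningParser:
    def is_reasoning_end(self, token_ids):
        # No reasoning section ever exists, so reasoning is always "over".
        return True


class ThinkMarkerReasoningParser:
    END_TOKEN_ID = 128799  # assumed token id for </think>; illustrative only

    def is_reasoning_end(self, token_ids):
        return self.END_TOKEN_ID in token_ids


def maybe_parse_tool_calls(reasoning_parser, token_ids, text):
    # Tool calls are only parsed from output after reasoning has ended.
    if not reasoning_parser.is_reasoning_end(token_ids):
        return None
    return text if text.startswith("<tool_call>") else None
```

With the identity parser plugged in, the same `maybe_parse_tool_calls` flow runs without any "is a reasoning parser configured?" branching.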

Contributor


Thanks for the detailed explanation, I haven't fully taken the tool parser alongside reasoning parser into account 👍

@taohui
Contributor Author

taohui commented Sep 27, 2025

Hi @chaunceyjiang ,
The first part of the split PR has already been merged (#25554). Could you please help review this one as well? If any changes are needed, I'll update accordingly. Thanks!
The original PR is at #24972 , where I also replied to your comments.

@chaunceyjiang chaunceyjiang self-assigned this Sep 27, 2025
@mergify

mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--25589.org.readthedocs.build/en/25589/

@taohui
Contributor Author

taohui commented Oct 10, 2025

Hi @chaunceyjiang , when you have a moment, please take a look at this PR. Thanks a lot! Let’s discuss if you have any questions. 🙏

@yanmindi

When will this feature be merged into main?

Collaborator

@chaunceyjiang chaunceyjiang left a comment


/cc @aarnphm If you have time, could you please take a look at this?

  if self.reasoning_parser:
-     reasoning_parser = self.reasoning_parser(tokenizer)
+     reasoning_parser = self.reasoning_parser(
+         tokenizer, **(request.chat_template_kwargs or {}))
Collaborator


How about this?

    reasoning_parser = self.reasoning_parser(
        tokenizer, chat_template_kwargs=request.chat_template_kwargs)

…apper

The DeepSeekV3ReasoningParser now expects parameters to be passed via
`chat_template_kwargs` instead of being expanded directly with `**kwargs`.
This aligns the call pattern with other reasoning parser constructors.

- Updated __init__ to unpack `chat_template_kwargs` internally
- Adjusted parser selection logic to extract `thinking` from the inner dict
- Updated related pytest test cases to pass `chat_template_kwargs` instead of direct kwargs

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
Collaborator

@chaunceyjiang chaunceyjiang left a comment


Thanks~

@chaunceyjiang chaunceyjiang added the ready label Oct 11, 2025
logger = init_logger(__name__)


@ReasoningParserManager.register_module("deepseek_v3")
Collaborator


Hi, @taohui, have you tested this ReasoningParser deepseek_v3 with https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp?

Contributor Author


Not yet, I’ll test it today.



Waiting for your testing results.

Contributor Author


It works as expected: with --reasoning-parser deepseek_v3 enabled, the output is parsed correctly, and without the flag the behavior matches DeepSeek-V3.1.

model_name = "DeepSeek-V3.2-Exp"
extra_body_thinking = {"chat_template_kwargs": {"thinking": True}}
extra_body_nonthinking = {"chat_template_kwargs": {"thinking": False}}

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking
)
print("=== Thinking response ===")
print(response)

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_nonthinking
)
print("=== NonThinking response ===")
print(response)

Output is:

=== Thinking response ===
ChatCompletion(id='chatcmpl-4984722fc97c4ee2ace19026f8825ee3', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can't check real-time weather, but you can get a reliable forecast for tomorrow by:\n\n• Searching online for "weather [your city]" \n• Asking your phone's assistant (Siri/Google Assistant) \n• Checking a weather app like The Weather Channel, AccuWeather, or your default phone app \n\nLet me know if you need help interpreting any weather terms once you have the forecast! ☀️🌧️⛅', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content="Hmm, the user is asking about tomorrow's weather. This is a straightforward request, but I don't have access to real-time data. \n\nI need to acknowledge the limitation upfront while offering helpful alternatives. The user likely wants actionable information, so suggesting reliable weather sources would be useful. \n\nI can list a few trusted options like weather apps and websites, and offer to help interpret the forecast if they provide their location. Keeping it concise but practical seems best here."), stop_reason=None, token_ids=None)], created=1760363936, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=194, prompt_tokens=14, total_tokens=208, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-fc86fdc39d424c46897cdc18406b4267', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the latest weather forecast for you! Please enable "联网搜索" in the app settings, and I’ll fetch real-time weather information for your location. Alternatively, you can tell me your city or region, and I’ll look it up for you! 😊', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None, token_ids=None)], created=1760363945, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=60, prompt_tokens=14, total_tokens=74, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

If started without --reasoning-parser deepseek_v3, the response is:

=== Thinking response ===
ChatCompletion(id='chatcmpl-0e0e975c6eae4e0a97cccbd41dc4c039', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hmm, the user is asking about tomorrow's weather. This is a straightforward request, but I don't have access to real-time data. \n\nI need to acknowledge the limitation upfront while offering helpful alternatives. The user likely wants actionable information, so suggesting reliable weather sources would be useful. \n\nI can list a few trusted options like weather apps and websites, and offer to help interpret the forecast if they provide their location. Keeping it concise but practical seems best here.I can't check real-time weather, but you can get a reliable forecast for tomorrow by:\n\n• Searching online for "weather [your city]" \n• Asking your phone's assistant (Siri/Google Assistant) \n• Checking a weather app like The Weather Channel, AccuWeather, or your default phone app \n\nLet me know if you need help interpreting any weather terms once you have the forecast! ☀️🌧️⛅', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None, token_ids=None)], created=1760364966, model='DeepSeek-V3.2-Exp', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=194, prompt_tokens=14, total_tokens=208, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)



Hope to merge this PR into main branch ASAP. 👏

@mergify

mergify bot commented Oct 13, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @taohui.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 13, 2025
@mergify mergify bot removed the needs-rebase label Oct 13, 2025
…ent_streaming return annotation

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: taohui <taohui3@gmail.com>
@chaunceyjiang chaunceyjiang merged commit 85a65e7 into vllm-project:main Oct 15, 2025
48 checks passed
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>