
[Model] Deepseek-V3.1 reasoning parser #24972

Closed
taohui wants to merge 20 commits into vllm-project:main from taohui:deepseek_v3.1_reasoning_parser

@taohui taohui commented Sep 16, 2025

Purpose

This PR adds a new reasoning parser for the DeepSeek-V3.1 model, named deepseek_v3. Unlike the parsers for previous models such as deepseek_r1, the parser for DeepSeek-V3.1 is selected deterministically from the request. Specifically:

When a request includes "chat_template_kwargs": {"thinking": True}, deepseek_v3 delegates to the deepseek_r1 reasoning parser.

Otherwise, it uses a new IdentityReasoningParser, which implements the ReasoningParser interface but does not extract any reasoning, effectively acting as a placeholder.

This ensures compatibility with the reasoning parser interface while preserving deterministic behavior for DeepSeek-V3.1.
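
The selection described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the class names ReasoningParserSketch, R1StyleParser, IdentityParser, and DeepSeekV3StyleParser, as well as the extract method, are hypothetical stand-ins for the real vLLM classes, and the constructor argument is an assumption about how the request kwargs arrive.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple


class ReasoningParserSketch(ABC):
    """Hypothetical stand-in for the ReasoningParser interface."""

    @abstractmethod
    def extract(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        """Return (reasoning_content, content)."""


class R1StyleParser(ReasoningParserSketch):
    """Splits on the end tag only, as a deepseek_r1-style parser might."""

    end_token = "</think>"

    def extract(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        reasoning, sep, content = text.partition(self.end_token)
        if not sep:
            # No end tag seen: everything is treated as reasoning.
            return text, None
        return reasoning, content or None


class IdentityParser(ReasoningParserSketch):
    """Placeholder: never extracts reasoning, passes the text through."""

    def extract(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        return None, text


class DeepSeekV3StyleParser(ReasoningParserSketch):
    """Picks a delegate once, based on the request's `thinking` flag."""

    def __init__(self, chat_template_kwargs: Optional[dict] = None):
        kwargs = chat_template_kwargs or {}
        thinking = bool(kwargs.get("thinking", False))
        self._delegate = R1StyleParser() if thinking else IdentityParser()

    def extract(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        return self._delegate.extract(text)
```

With this shape, callers keep using a single parser name while the behavior follows the request, which is the property the description calls deterministic.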

Test Plan

Unit test

Added tests/reasoning/test_deepseekv3_reasoning_parser.py to verify that deepseek_v3 correctly selects either DeepSeekR1ReasoningParser or IdentityReasoningParser, depending on whether the request sets "chat_template_kwargs": {"thinking": True}.

Implemented additional checks to ensure the newly added IdentityReasoningParser conforms to the ReasoningParser interface.

Unit test command:

pytest tests/reasoning/test_deepseekv3_reasoning_parser.py

Unit test result:

=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /vllm-workspace/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items

tests/reasoning/test_deepseekv3_reasoning_parser.py ... [100%]

========================================================================================== 3 passed, 4 warnings in 4.69s ===========================================================================================

Serving test

Verified through the serving interface that both non-streaming and streaming requests correctly populate content and reasoning_content, with and without "chat_template_kwargs": {"thinking": True}. The code is below:

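from openai import OpenAI

# Assumption: a vLLM OpenAI-compatible server is running locally;
# adjust base_url and api_key for your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

model_name = "DeepSeek-V3.1"
extra_body_thinking = {"chat_template_kwargs": {"thinking": True}}
extra_body_nonthinking = {"chat_template_kwargs": {"thinking": False}}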

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking
)
print("=== Thinking response ===")
print(response)

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_nonthinking
)
print("=== NonThinking response ===")
print(response)

=== Thinking response ===
ChatCompletion(id='chatcmpl-27e5f6a59bb14359b3acdc2b9506cd55', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! However, I don’t have real-time access to current or future weather data. To get the most accurate forecast for tomorrow, you can use a weather service like: \n\n- Weather.com \n- AccuWeather \n- The Weather Channel \n- Your device’s built-in weather app (e.g., Apple Weather, Google Weather) \n\nIf you tell me your location, I can help you find a reliable source or guide you on how to check! 😊', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content="Hmm, the user is asking for tomorrow's weather. Since I don't have real-time data access, I need to handle this gracefully. \n\nI should acknowledge the limitation upfront to set expectations, then offer a practical solution—like suggesting a weather service or asking for their location to provide instructions. \n\nKeeping it concise but helpful: a brief apology, a clear reason, and actionable alternatives. No need to overcomplicate it."), stop_reason=None, token_ids=None)], created=1758034986, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=197, prompt_tokens=15, total_tokens=212, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-dad9101ee3f0464b98f28bba1a303d35', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! Could you please tell me your city or location?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1758034989, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=15, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Streaming output test:

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking,
    stream=True
)

print("=== Stream & Thinking response ===")

full_content = ""
# True until the first non-empty content chunk arrives.
in_reasoning = True

for chunk in response:
    delta = chunk.choices[0].delta
    
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        text = delta.reasoning_content
        print(text, end="", flush=True)
        full_content += text

    if hasattr(delta, "content") and delta.content is not None:
        text = delta.content
        if text != "" and in_reasoning:
            print()
            print("=== end of reasoning ===")
            in_reasoning = False
        print(text, end="", flush=True)
        full_content += text

=== Stream & Thinking response ===
Hmm, the user is asking about tomorrow's weather. This is a straightforward request for location-based information. Since I don't have access to real-time data or the user's location, I need to ask for their city or zip code to provide an accurate forecast.

I should keep it simple and friendly, offering to help once they provide the necessary details. No need to overcomplicate it—just a clear prompt and an emoji to keep it approachable.

Let's ask for the location and mention we can check multiple sources to ensure reliability.
=== end of reasoning ===
'd love to help! Could you please tell me your city or location so I can check the weather forecast for tomorrow? 😊

Documents update

Updated docs/features/reasoning_outputs.md with the following content:

| Model Series | Parser Name | Structured Output Support | Tool Calling |
|---|---|---|---|
| DeepSeek-V3.1 | deepseek_v3 | json, regex | |

Tool Calling is supported in Non-Thinking mode but not in Thinking mode. See details at: https://huggingface.co/deepseek-ai/DeepSeek-V3.1

… reasoning

via request parameters, in line with DeepSeek-v3.1 model behavior.

Signed-off-by: taohui <taohui3@gmail.com>
@github-actions
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the documentation, deepseek, and frontend labels Sep 16, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new reasoning parser deepseek_v3 for the DeepSeek-V3.1 model. The implementation uses a delegation pattern, selecting between DeepSeekR1ReasoningParser and a new IdentityReasoningParser based on the thinking parameter in chat_template_kwargs. The changes to pass these kwargs from the serving layer to the parser constructor are correct. However, this change introduces a critical issue where existing reasoning parsers that don't accept arbitrary keyword arguments in their constructor will break, causing a TypeError. I've added a comment with a suggested fix for the new IdentityReasoningParser and noted that other parsers need a similar update.
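
The TypeError concern in this review can be illustrated with a small sketch. Everything here is hypothetical: LegacyParser, KwargsTolerantParser, and build_parser are made-up names, and forwarding the request kwargs as keyword arguments is an assumption about the serving layer rather than a copy of the PR's code.

```python
class LegacyParser:
    """Hypothetical parser whose constructor accepts only a tokenizer."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer


class KwargsTolerantParser:
    """Hypothetical parser that tolerates extra positional and keyword arguments."""

    def __init__(self, tokenizer, *args, **kwargs):
        self.tokenizer = tokenizer
        self.thinking = bool(kwargs.pop("thinking", False))


def build_parser(parser_cls, tokenizer, chat_template_kwargs):
    # Hypothetical serving-layer helper: forwards request kwargs to the constructor.
    return parser_cls(tokenizer, **chat_template_kwargs)


request_kwargs = {"thinking": True}

# Works: the constructor swallows unknown keyword arguments.
tolerant = build_parser(KwargsTolerantParser, object(), request_kwargs)

# Fails: the constructor has no parameter named `thinking`.
try:
    build_parser(LegacyParser, object(), request_kwargs)
    legacy_error = None
except TypeError as exc:
    legacy_error = exc
```

Under these assumptions, every parser reachable through the same construction path needs the wider signature, which is why the review notes that other parsers need a similar update.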

Signed-off-by: taohui <taohui3@gmail.com>
@mergify mergify bot added the qwen label Sep 16, 2025
mergify bot commented Sep 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @taohui.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 19, 2025
Change the return value of `is_reasoning_end` from False to True,
since reasoning is not treated specially and the function should always return True.

Signed-off-by: taohui <taohui3@gmail.com>
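
A small sketch may help explain why the return value matters. It assumes, for illustration only, that some downstream step waits for is_reasoning_end before acting on the output; the constraints_active helper and both parser classes are hypothetical, and the method signature is assumed rather than taken from this PR.

```python
from typing import Sequence


class IdentityParserSketch:
    """Hypothetical identity parser: there is no reasoning section at all."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        # Nothing to wait for, so reasoning counts as already finished.
        return True


class NeverEndsSketch:
    """Hypothetical variant with the earlier behavior (always False)."""

    def is_reasoning_end(self, input_ids: Sequence[int]) -> bool:
        return False


def constraints_active(parser, generated_ids: Sequence[int]) -> bool:
    # Hypothetical downstream check: act on the output only once reasoning is over.
    return parser.is_reasoning_end(generated_ids)
```

In this sketch, a parser that always returns False would keep such a step waiting forever, while returning True lets it proceed immediately in non-thinking mode.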
@mergify mergify bot removed the needs-rebase label Sep 22, 2025
     end_token: str = "</think>"

-    def __init__(self, tokenizer: PreTrainedTokenizerBase):
+    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
Collaborator

I suggest splitting this change into a new PR.

Contributor Author

OK, I will split it into two.

mergify bot commented Sep 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @taohui.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 24, 2025
    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
        super().__init__(tokenizer)

        thinking = bool(kwargs.pop("thinking", False))
Collaborator

Currently, Qwen3 also supports parameters like

{
    "chat_template_kwargs": {
        "enable_thinking": enable_thinking
    }
}

to enable or disable reasoning, and unlike this PR, it does not require two separate reasoning_parsers.
Is there anything special about DeepSeek-V3 that necessitates this different approach?

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L177-L178

Contributor Author

Thank you for the reply. DeepSeek-V3.1 cannot directly use the qwen3 reasoning parser, because DeepSeek-V3.1 emits only the end tag (</think>) and no start tag (<think>). With the qwen3 reasoning parser, all content would be placed into the reasoning_content field.

It also cannot directly use the deepseek_r1 reasoning parser, because that parser cannot decide per request whether to enable reasoning. For example, if I specify --reasoning-parser deepseek_r1 but the request is in non-thinking mode, it would still put all content into the reasoning_content field.

The key point is that we need a way to decide whether to use the reasoning parser based on the request. Therefore, I applied a Strategy Pattern that selects a different strategy per request while keeping the interface unchanged.
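
The misrouting described in this reply can be reproduced with a rough sketch. Both functions below are hypothetical simplifications, not the actual deepseek_r1 or deepseek_v3 parser code; they assume that a parser relying only on the end tag treats any text without that tag as reasoning.

```python
from typing import Optional, Tuple

END_TAG = "</think>"


def split_fixed_r1_style(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Always expects reasoning first; returns (reasoning_content, content)."""
    reasoning, sep, content = text.partition(END_TAG)
    if not sep:
        # No end tag: the whole output is classified as reasoning.
        return text, None
    return reasoning, content or None


def split_per_request(text: str, thinking: bool) -> Tuple[Optional[str], Optional[str]]:
    """Chooses the strategy from the request instead of a fixed server flag."""
    if not thinking:
        # Non-thinking mode: nothing is reasoning, pass the text through.
        return None, text
    return split_fixed_r1_style(text)


non_thinking_output = "Could you tell me your city?"

# Fixed strategy: the answer ends up in reasoning_content.
fixed_result = split_fixed_r1_style(non_thinking_output)

# Per-request strategy: the answer stays in content.
per_request_result = split_per_request(non_thinking_output, thinking=False)
```

The per-request variant leaves thinking-mode behavior untouched, since it falls back to the same end-tag split when thinking is enabled.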

mergify bot commented Oct 8, 2025

Documentation preview: https://vllm--24972.org.readthedocs.build/en/24972/

chaunceyjiang added a commit that referenced this pull request Oct 15, 2025
…5589)

Signed-off-by: taohui <taohui3@gmail.com>
Signed-off-by: Tao Hui <taohui3@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…t#24972) (vllm-project#25589)

alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…t#24972) (vllm-project#25589)

0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…t#24972) (vllm-project#25589)

Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…t#24972) (vllm-project#25589)

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…t#24972) (vllm-project#25589)

github-actions bot commented Jan 7, 2026

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Jan 7, 2026
github-actions bot commented Feb 7, 2026

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

@github-actions github-actions bot closed this Feb 7, 2026
Labels

deepseek, documentation, frontend, needs-rebase, qwen, stale
