[Model] Deepseek-V3.1 reasoning parser by taohui · Pull Request #24972 · vllm-project/vllm

taohui · 2025-09-16T15:51:36Z

Purpose

This PR adds a new reasoning parser for the DeepSeek-V3.1 model, named deepseek_v3. Unlike earlier models handled by deepseek_r1, DeepSeek-V3.1 can run with or without thinking, so the parser's behavior is determined by each request rather than fixed when the server starts. Specifically:

When a request includes "chat_template_kwargs": {"thinking": True}, deepseek_v3 delegates to the deepseek_r1 reasoning parser (DeepSeekR1ReasoningParser).

Otherwise, it delegates to a new IdentityReasoningParser, which implements the ReasoningParser interface but extracts no reasoning (the entire output is returned as content), effectively acting as a placeholder.

This ensures compatibility with the reasoning parser interface while preserving deterministic behavior for DeepSeek-V3.1.
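A minimal sketch of the per-request delegation described above. It assumes a simplified, string-based interface: vLLM's actual ReasoningParser methods work on token IDs and take the request object, so every class here (BaseParserSketch, EndTagParserSketch, IdentityParserSketch, DeepSeekV3ParserSketch) is a hypothetical stand-in rather than the code in this PR. The `thinking` lookup mirrors the constructor quoted in the review below, and the identity strategy's is_reasoning_end always returns True, matching commit a7dc723.

```python
from typing import Optional, Tuple

END_TOKEN = "</think>"  # end-of-reasoning tag; no start tag is assumed in the output

ParseResult = Tuple[Optional[str], Optional[str]]  # (reasoning, content)


class BaseParserSketch:
    """Hypothetical, string-based stand-in for the ReasoningParser interface."""

    def extract_reasoning_content(self, model_output: str) -> ParseResult:
        raise NotImplementedError

    def is_reasoning_end(self, output: str) -> bool:
        raise NotImplementedError


class EndTagParserSketch(BaseParserSketch):
    """R1-style strategy: everything before </think> is reasoning."""

    def extract_reasoning_content(self, model_output: str) -> ParseResult:
        reasoning, sep, content = model_output.partition(END_TOKEN)
        if not sep:
            # No end tag yet, so the whole output is still reasoning.
            return model_output, None
        return reasoning, content or None

    def is_reasoning_end(self, output: str) -> bool:
        return END_TOKEN in output


class IdentityParserSketch(BaseParserSketch):
    """Pass-through strategy: no reasoning is extracted, all output is content."""

    def extract_reasoning_content(self, model_output: str) -> ParseResult:
        return None, model_output

    def is_reasoning_end(self, output: str) -> bool:
        # Reasoning is never open, so it is always considered ended.
        return True


class DeepSeekV3ParserSketch(BaseParserSketch):
    """Picks a strategy once, from the per-request `thinking` flag."""

    def __init__(self, tokenizer=None, *args, **kwargs):
        thinking = bool(kwargs.pop("thinking", False))
        self._delegate: BaseParserSketch = (
            EndTagParserSketch() if thinking else IdentityParserSketch()
        )

    def extract_reasoning_content(self, model_output: str) -> ParseResult:
        return self._delegate.extract_reasoning_content(model_output)

    def is_reasoning_end(self, output: str) -> bool:
        return self._delegate.is_reasoning_end(output)


thinking_parser = DeepSeekV3ParserSketch(None, thinking=True)
print(thinking_parser.extract_reasoning_content("Let me think.</think>Hello!"))
# ('Let me think.', 'Hello!')
plain_parser = DeepSeekV3ParserSketch(None, thinking=False)
print(plain_parser.extract_reasoning_content("Hello!"))
# (None, 'Hello!')
```

Because the strategy is chosen once in the constructor, every method simply forwards to the chosen delegate, which is what keeps the ReasoningParser interface unchanged for callers.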

Test Plan

Unit test

Added tests/reasoning/test_deepseekv3_reasoning_parser.py to verify that deepseek_v3 selects DeepSeekR1ReasoningParser or IdentityReasoningParser depending on whether "chat_template_kwargs": {"thinking": True} is set.

Added basic tests to check that the new IdentityReasoningParser conforms to the ReasoningParser interface.

Unit test command:

pytest tests/reasoning/test_deepseekv3_reasoning_parser.py

Unit test result:

=============================================================================================== test session starts ================================================================================================
platform linux -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0
rootdir: /vllm-workspace/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0
collected 3 items

tests/reasoning/test_deepseekv3_reasoning_parser.py ... [100%]

========================================================================================== 3 passed, 4 warnings in 4.69s ===========================================================================================

Serving test

Verified through the OpenAI-compatible serving API that both non-streaming and streaming requests correctly populate content and reasoning_content, with and without "chat_template_kwargs": {"thinking": True}. The code is below:

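from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server started with
# --reasoning-parser deepseek_v3; adjust base_url and api_key for your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

model_name = "DeepSeek-V3.1"
extra_body_thinking = {"chat_template_kwargs": {"thinking": True}}
extra_body_nonthinking = {"chat_template_kwargs": {"thinking": False}}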

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking
)
print("=== Thinking response ===")
print(response)

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_nonthinking
)
print("=== NonThinking response ===")
print(response)

=== Thinking response ===
ChatCompletion(id='chatcmpl-27e5f6a59bb14359b3acdc2b9506cd55', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! However, I don’t have real-time access to current or future weather data. To get the most accurate forecast for tomorrow, you can use a weather service like: \n\n- Weather.com \n- AccuWeather \n- The Weather Channel \n- Your device’s built-in weather app (e.g., Apple Weather, Google Weather) \n\nIf you tell me your location, I can help you find a reliable source or guide you on how to check! 😊', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content="Hmm, the user is asking for tomorrow's weather. Since I don't have real-time data access, I need to handle this gracefully. \n\nI should acknowledge the limitation upfront to set expectations, then offer a practical solution—like suggesting a weather service or asking for their location to provide instructions. \n\nKeeping it concise but helpful: a brief apology, a clear reason, and actionable alternatives. No need to overcomplicate it."), stop_reason=None, token_ids=None)], created=1758034986, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=197, prompt_tokens=15, total_tokens=212, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
=== NonThinking response ===
ChatCompletion(id='chatcmpl-dad9101ee3f0464b98f28bba1a303d35', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I can check the weather for you! Could you please tell me your city or location?', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], annotations=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1758034989, model='DeepSeek-V3.1', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=19, prompt_tokens=15, total_tokens=34, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Streaming output test:

response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "What’s the weather like tomorrow?"}
    ],
    extra_body=extra_body_thinking,
    stream=True
)

print("=== Stream & Thinking response ===")

full_content = ""
in_reasoning = True  # flips to False once the first non-empty content delta arrives

for chunk in response:
    delta = chunk.choices[0].delta
    
    if hasattr(delta, "reasoning_content") and delta.reasoning_content is not None:
        text = delta.reasoning_content
        print(text, end="", flush=True)
        full_content += text

    if hasattr(delta, "content") and delta.content is not None:
        text = delta.content
        if text != "" and in_reasoning:
            print()
            print("=== end of reasoning ===")
            in_reasoning = False
        print(text, end="", flush=True)
        full_content += text

=== Stream & Thinking response ===
Hmm, the user is asking about tomorrow's weather. This is a straightforward request for location-based information. Since I don't have access to real-time data or the user's location, I need to ask for their city or zip code to provide an accurate forecast.

I should keep it simple and friendly, offering to help once they provide the necessary details. No need to overcomplicate it—just a clear prompt and an emoji to keep it approachable.

Let's ask for the location and mention we can check multiple sources to ensure reliability.
=== end of reasoning ===
'd love to help! Could you please tell me your city or location so I can check the weather forecast for tomorrow? 😊

Documentation update

Updated docs/features/reasoning_outputs.md with the following content:

| Model Series | Parser Name | Structured Output Support | Tool Calling |
|--------------|-------------|---------------------------|--------------|
| DeepSeek-V3.1 | `deepseek_v3` | `json`, `regex` | ❌ |

Tool Calling is supported in Non-Thinking mode but not in Thinking mode. See details at: https://huggingface.co/deepseek-ai/DeepSeek-V3.1


github-actions · 2025-09-16T15:51:46Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small, essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request introduces a new reasoning parser deepseek_v3 for the DeepSeek-V3.1 model. The implementation uses a delegation pattern, selecting between DeepSeekR1ReasoningParser and a new IdentityReasoningParser based on the thinking parameter in chat_template_kwargs. The changes to pass these kwargs from the serving layer to the parser constructor are correct. However, this change introduces a critical issue where existing reasoning parsers that don't accept arbitrary keyword arguments in their constructor will break, causing a TypeError. I've added a comment with a suggested fix for the new IdentityReasoningParser and noted that other parsers need a similar update.
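The TypeError described in this review can be reproduced in plain Python. In this sketch, LegacyParser, KwargsTolerantParser, build_parser, and accepts_request_kwargs are hypothetical names, not vLLM APIs; build_parser stands in for a serving layer that forwards chat_template_kwargs to the parser constructor.

```python
from typing import Any, Dict, Optional, Type


class LegacyParser:
    """Old-style constructor: accepts only the tokenizer."""

    def __init__(self, tokenizer: Any) -> None:
        self.tokenizer = tokenizer


class KwargsTolerantParser:
    """Updated constructor: extra per-request options are accepted."""

    def __init__(self, tokenizer: Any, *args: Any, **kwargs: Any) -> None:
        self.tokenizer = tokenizer
        self.options = kwargs  # e.g. {"thinking": True}


def build_parser(
    parser_cls: Type[Any],
    tokenizer: Any,
    chat_template_kwargs: Optional[Dict[str, Any]],
) -> Any:
    # Hypothetical serving-layer step: forward request kwargs to the parser.
    return parser_cls(tokenizer, **(chat_template_kwargs or {}))


def accepts_request_kwargs(parser_cls: Type[Any]) -> bool:
    try:
        build_parser(parser_cls, None, {"thinking": True})
    except TypeError:
        # Raised for an unexpected keyword argument such as 'thinking'.
        return False
    return True


print(accepts_request_kwargs(KwargsTolerantParser))  # True
print(accepts_request_kwargs(LegacyParser))  # False
```

This is why the `*args, **kwargs` constructor change has to reach every existing parser, not only the new one, once the serving layer starts forwarding request kwargs.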

vllm/reasoning/identity_reasoning_parser.py


mergify · 2025-09-19T19:21:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @taohui.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


chaunceyjiang · 2025-09-24T03:19:16Z

vllm/reasoning/deepseek_r1_reasoning_parser.py

    end_token: str = "</think>"

-    def __init__(self, tokenizer: PreTrainedTokenizerBase):
+    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):


I suggest splitting this change into a new PR.

OK, I will split it into two.

Hi @chaunceyjiang ,

I have split the original PR into two separate PRs for clarity and easier review:

[Model] Add optional parameter to reasoning parser constructor #25554

[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) #25589

Thank you!

mergify · 2025-09-24T03:19:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @taohui.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

chaunceyjiang · 2025-09-24T03:44:48Z

vllm/reasoning/deepseek_v3_reasoning_parser.py

+    def __init__(self, tokenizer: PreTrainedTokenizerBase, *args, **kwargs):
+        super().__init__(tokenizer)
+
+        thinking = bool(kwargs.pop("thinking", False))


Currently, Qwen3 also supports parameters like

{ "chat_template_kwargs": { "enable_thinking": enable_thinking } }

to enable or disable reasoning, and unlike this PR, it does not require two separate reasoning_parsers.
Is there anything special about DeepSeek-V3 that necessitates this different approach?

https://github.com/vllm-project/vllm/blob/main/tests/entrypoints/openai/test_completion_with_function_calling.py#L177-L178

Thanks for the reply. DeepSeek-V3.1 cannot directly use the qwen3 reasoning parser, because DeepSeek-V3.1 only emits the end tag (</think>) and no start tag (<think>). If we used the qwen3 reasoning parser, all content would be placed into the reasoning_content field. It also cannot directly use the deepseek_r1 reasoning parser, because that parser cannot decide whether to enable reasoning based on the request. For example, if I specify --reasoning-parser deepseek_r1 but the request is in non-thinking mode, it still puts all content into the reasoning_content field. The key point is that we need a way to decide, per request, whether to apply the reasoning parser. I therefore applied the Strategy pattern: the parser selects a strategy based on the request while keeping the ReasoningParser interface unchanged.
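To make the deepseek_r1 failure mode concrete: a parser that assumes the output starts inside a reasoning block (because no <think> is emitted) misfiles a non-thinking answer as reasoning. Both functions below are hypothetical, string-based simplifications, not vLLM code.

```python
from typing import Optional, Tuple

END_TOKEN = "</think>"


def r1_style_split(model_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Assume output begins inside reasoning; split at </think>."""
    reasoning, sep, content = model_output.partition(END_TOKEN)
    if not sep:
        # Without </think>, everything is treated as reasoning.
        return model_output, None
    return reasoning, content or None


def request_aware_split(
    model_output: str, thinking: bool
) -> Tuple[Optional[str], Optional[str]]:
    """Apply end-tag splitting only when the request enabled thinking."""
    if thinking:
        return r1_style_split(model_output)
    return None, model_output


non_thinking_output = "Could you tell me your city?"
print(r1_style_split(non_thinking_output))
# ('Could you tell me your city?', None)  <- answer misfiled as reasoning
print(request_aware_split(non_thinking_output, thinking=False))
# (None, 'Could you tell me your city?')
```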

mergify · 2025-10-08T14:39:28Z

Documentation preview: https://vllm--24972.org.readthedocs.build/en/24972/

…5589) Signed-off-by: taohui <taohui3@gmail.com> Signed-off-by: Tao Hui <taohui3@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>


github-actions · 2026-01-07T02:22:27Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions · 2026-02-07T02:17:04Z

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

taohui added 6 commits September 16, 2025 19:21

Add a deepseek_v3 reasoning parser that supports dynamically enabling…

8249bf6

… reasoning via request parameters, in line with DeepSeek-v3.1 model behavior. Signed-off-by: taohui <taohui3@gmail.com>

docs: add DeepSeek-V3.1 reasoning parser to Reasoning Outputs section

7c704d8

Signed-off-by: taohui <taohui3@gmail.com>

test(reasoning): add unit tests for DeepSeekV3ReasoningParser

cf2eb7b

Signed-off-by: taohui <taohui3@gmail.com>

Add unit tests for DeepSeekV3ReasoningParser parser selection

c4bf074

Signed-off-by: taohui <taohui3@gmail.com>

remove separate_reasoning

53e9cdc

Signed-off-by: taohui <taohui3@gmail.com>

test: add basic IdentityReasoningParser tests

d7399df

Signed-off-by: taohui <taohui3@gmail.com>

taohui requested review from aarnphm, chaunceyjiang and hmellor as code owners September 16, 2025 15:51

mergify bot added documentation Improvements or additions to documentation deepseek Related to DeepSeek models frontend labels Sep 16, 2025

gemini-code-assist bot reviewed Sep 16, 2025


vllm/reasoning/identity_reasoning_parser.py (outdated, resolved)

Fix: Update ReasoningParser __init__ to accept arbitrary kwargs

71eb7d7

Signed-off-by: taohui <taohui3@gmail.com>

mergify bot added the qwen Related to Qwen models label Sep 16, 2025

taohui added 9 commits September 17, 2025 11:02

remove unused import

9ff007d

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

709dd39

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

118bfb3

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

e69cfa1

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

c5120d3

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

b25ce53

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

1c98c20

Signed-off-by: taohui <taohui3@gmail.com>

fix(reasoning): resolve circular import in DeepSeekV3ReasoningParser

43d9800

Signed-off-by: taohui <taohui3@gmail.com>

fix code style

3573024

Signed-off-by: taohui <taohui3@gmail.com>

mergify bot added the needs-rebase label Sep 19, 2025

taohui added 3 commits September 22, 2025 19:05

fix: correct is_reasoning_end return value

a7dc723

Change the return value of `is_reasoning_end` from False to True, since reasoning is not treated specially and the function should always return True. Signed-off-by: taohui <taohui3@gmail.com>

docs: update documentation

ed06fc2

Signed-off-by: taohui <taohui3@gmail.com>

Merge branch 'main' into deepseek_v3.1_reasoning_parser

520e576

Signed-off-by: taohui <taohui3@gmail.com>

mergify bot removed the needs-rebase label Sep 22, 2025

Merge branch 'main' into deepseek_v3.1_reasoning_parser

6f14555

chaunceyjiang reviewed Sep 24, 2025


mergify bot added the needs-rebase label Sep 24, 2025

chaunceyjiang reviewed Sep 24, 2025


This was referenced Sep 24, 2025

[Model] Add optional parameter to reasoning parser constructor #25554

Merged

[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) #25589

Merged

github-actions bot added the stale Over 90 days of inactivity label Jan 7, 2026

github-actions bot closed this Feb 7, 2026
