
[Bugfix] fix the default reasoning mode in Reasoner Grammar. #10831

Open
jeremyzhang866 wants to merge 22 commits into sgl-project:main from jeremyzhang866:jeremy_reasoner_grammer_fix

Conversation

@jeremyzhang866 commented Sep 24, 2025

Motivation

When we start a hybrid reasoning model (such as DeepSeek V3.1 or Qwen3-14B) on the server side with both the reasoning parser and speculative decoding enabled, and on the client side disable "thinking mode" via chat_template_kwargs while also enabling JSON-constrained decoding, SGLang produces abnormal results.
#10789

Modifications

We found that this is likely related to the ReasonerGrammarBackend, where ReasonerGrammarObject defaults to is_in_reasoning = True, so grammar constraints are only applied after a think-end token has been seen. This PR adjusts that initialization when "thinking mode" is disabled in chat_template_kwargs.
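The idea behind the fix can be sketched in a few lines. This is a hypothetical, heavily simplified stand-in, not SGLang's actual code: ToyGrammar and the accept_token flow are invented for illustration; only the name ReasonerGrammarObject and the think_end_id / is_in_reasoning fields come from the PR discussion.

```python
class ToyGrammar:
    """Minimal grammar stub that records which tokens it was asked to check."""

    def __init__(self):
        self.seen = []

    def accept_token(self, token_id):
        self.seen.append(token_id)
        return True


class ReasonerGrammarObject:
    """Simplified sketch of a grammar wrapper for hybrid reasoning models."""

    def __init__(self, grammar, think_end_id, is_in_reasoning=True):
        self.grammar = grammar
        self.think_end_id = think_end_id
        # The reported bug: this was hard-coded to True, so when the client
        # disabled thinking mode the model never emitted a think-end token,
        # the grammar was never consulted, and constrained decoding failed.
        # Initializing it from the request's thinking setting fixes that.
        self.is_in_reasoning = is_in_reasoning

    def accept_token(self, token_id):
        # While "reasoning", tokens bypass the grammar; the grammar only
        # activates once the think-end token has been observed.
        if self.is_in_reasoning:
            if token_id == self.think_end_id:
                self.is_in_reasoning = False
            return True
        return self.grammar.accept_token(token_id)


# With thinking disabled, the grammar must be active from the first token.
grammar = ToyGrammar()
obj = ReasonerGrammarObject(grammar, think_end_id=99, is_in_reasoning=False)
obj.accept_token(7)
print(grammar.seen)  # → [7]: the grammar saw the very first token
```

With the old default (is_in_reasoning=True) the same call would bypass the grammar entirely, which matches the runaway output shown in the "before" section below.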

Reproduction script

import json
from openai import OpenAI

def main(model_name="Qwen3-14B", thinking=False, use_json_schema=None):

    niogpt_base_url = "http://0.0.0.0:30000/v1"
    niogpt_api_key = "sk-no-api-key-needed"
    client = OpenAI(
        api_key=niogpt_api_key,
        base_url=niogpt_base_url
    )
    json_schema = {
        "type": "object",
        "properties": {
            "population": {"type": "integer"},
            "name": {"type": "string", "pattern": "^[\\w]+$"},
        },
        "required": ["name", "population"],
    }
    
    chat_kwargs = {"enable_thinking": thinking}

    # Base request parameters
    request_params = dict(
        model=model_name,
        messages=[
            {
                "role": "user",
                "content": "show me the information of the capital of China in the JSON format.",
            }
        ],
        temperature=0,
        max_tokens=512,
        extra_body={"chat_template_kwargs": chat_kwargs}
    )
    if use_json_schema is True:
        request_params["response_format"] = {
            "type": "json_schema",
            "json_schema": {"name": "foo", "schema": json_schema}
        }

    response = client.chat.completions.create(**request_params)

    print("========== completion_tokens ==========")
    print(response.usage.completion_tokens)

    print("========== content ==========")
    for choice in response.choices:
        print(choice.message.content if choice.message.content is not None else "None")

    print("========== reasoning_content ==========")
    for choice in response.choices:
        print(choice.message.reasoning_content if choice.message.reasoning_content is not None else "None")
        print("=" * 50 + "\n")


if __name__ == "__main__":
    main(model_name="Qwen3-14B", thinking=False, use_json_schema=True)

  • Before
========== completion_tokens ==========
512
========== content ==========
```!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
========== reasoning_content ==========
None
==================================================
  • After PR
========== completion_tokens ==========
22
========== content ==========
{"population": 215423946, "name": "Beijing"}
========== reasoning_content ==========
None
==================================================

Benchmarking and Profiling

Checklist

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@jeremyzhang866 (Author)

@minleminzui please have a look

@jeremyzhang866 (Author)

@hnyls2002 I reopened the PR — please review when convenient. Thanks!

@jeremyzhang866 (Author)

@hnyls2002 have a look. Thanks!

@jeremyzhang866 (Author)

@zhyncs Can you help review this?

@minleminzui (Collaborator)

@jeremyzhang866 Could you please verify that sglang/test/srt/test_reasoning_parser.py still passes with your change?

@jeremyzhang866 (Author) commented Sep 25, 2025

> @jeremyzhang866 Could you please verify that sglang/test/srt/test_reasoning_parser.py still passes with your change?

@minleminzui I tested it and found no issues. Do you have any suggestions? Thanks.

zjm.zhang@chronic-daddy-founder-hzh-zjm-zhang-master-0:~/zjm_workspace/sglang/test/srt$ python3 test_reasoning_parser.py
................................................
----------------------------------------------------------------------
Ran 48 tests in 0.002s

@jeremyzhang866 (Author)

@hnyls2002 @zhyncs Could you help review it when you have time?

@jeremyzhang866 (Author)

@xiezhq-hermann Could you help review it when you have time?

@JustinTong0323 (Collaborator)

/gemini review

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses a bug where grammar constraints were not correctly applied for reasoning models when 'thinking mode' was disabled. The fix involves introducing a may_can_reasoning flag that is passed from the scheduler to the ReasonerGrammarObject to correctly initialize its reasoning state. The overall approach is sound and effectively resolves the issue. My review includes a few suggestions to improve code clarity and maintainability by restoring type hints that were removed in base_grammar_backend.py and correcting an inconsistent type hint for the cache.

@JustinTong0323 (Collaborator) left a comment


Thanks for the contribution! Overall LGTM, could you help resolve gemini's comments?

@jeremyzhang866 (Author) commented Sep 26, 2025

> Thanks for the contribution! Overall LGTM, could you help resolve gemini's comments?

@JustinTong0323 I have fixed this; please review it again. Thanks.

@jeremyzhang866 (Author)

@JustinTong0323 Could you help review it again? Thanks.

@jeremyzhang866 (Author)

@JustinTong0323 Are there any other comments, or can it be merged? Thanks :)

@JustinTong0323 (Collaborator)

> @JustinTong0323 are there any other comments, or can it be merged? thanks :)

We need to wait for the "required" CI to go green, but there seems to be an issue on the Hugging Face side blocking the PR. I will keep an eye on this PR. Thanks for your effort!

@jeremyzhang866 (Author)

> We need to wait for the "required" CI to go green, but there seems to be an issue on the Hugging Face side blocking the PR. I will keep an eye on this PR. Thanks for your effort!

Thank you for your reply.

@jeremyzhang866 (Author) commented Oct 24, 2025

@hnyls2002 @xiezhq-hermann please have a look. thanks

@JustinTong0323 (Collaborator) commented Oct 28, 2025

> @hnyls2002 @xiezhq-hermann please have a look. thanks

Sorry for the late reply. I just got a similar issue, and it would be better to add a test for this corner case. (Also, with lots of PRs waiting to merge, I would really appreciate it if you could ping me on Slack to accelerate the process of any PR.)

    super().__init__()
    self.grammar = grammar
    self.think_end_id = think_end_id
    self.is_in_reasoning = True
Collaborator:

Why don't we simply set this field to False?

@jeremyzhang866 (Author)

> Sorry for the late reply. I just got a similar issue, and it would be better to add a test for this corner case. (Also, I would really appreciate it if you could ping me on Slack to accelerate the process.)

@JustinTong0323 Sorry, could you please clarify what I should do? I thought the example above already showed the issue. Thanks.

@JustinTong0323 (Collaborator)

> @JustinTong0323 Sorry, could you please clarify what I should do? I thought the example above already showed the issue. Thanks.

Could you check the PR I just mentioned? Maybe you could discuss and find out how to combine your PRs. (Sorry, I am OOO this week, so replies may be late.)

@jeremyzhang866 (Author)

> Could you check the PR I just mentioned? Maybe you could discuss and find out how to combine your PRs. (Sorry, I am OOO this week, so replies may be late.)

Thanks for your reply. I just checked that PR — the motivation behind it is the same as mine.

That PR might have better extensibility, but my changes are a bit simpler.

