Support for reasoning_content in API by tot0 · Pull Request #3202 · sgl-project/sglang

tot0 · 2025-01-29T03:33:17Z

Not sure if it's how folks want this done, but it works for me :)

~~Sorry I don't have time to add unit tests right now, if the general implementation is acceptable I can look at how to add the unit tests soon.~~
I found the time to implement unittests!

#3043

Motivation

Some people wanted the reasoning_content functionality like DeepSeeks API.

Modifications

Added a reasoning parser to /chat/completions API similar to Tool parsing.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling.

…esponse behavior.

lambert0312 · 2025-02-08T04:32:07Z

Good work! It works for me. :D

python/sglang/srt/reasoning_parser.py

python/sglang/srt/openai_api/adapter.py

…model, also handle first response while separating reasoning.

tot0 · 2025-02-28T20:41:58Z

Move this docs/backend/reasoning_parser.md to docs/references and add it into the index of https://github.com/sgl-project/sglang/blob/main/docs/references/deepseek.rst.

Ok

tot0 · 2025-02-28T20:45:22Z

Also looking at the unit test failures.

tot0 · 2025-02-28T21:57:25Z

@zhaochenyang20 addressed your comment.

zhaochenyang20 · 2025-03-01T08:07:17Z

@shuaills This is related to your work? Could you review this? Thnaks!

xihuai18 · 2025-03-01T08:48:58Z

I found a problem. When I use seperate_reasoning and model name that does not contain "deepseek-r1", I can not use the deepseek-r1 parser.

Can the function implemented as

when serve model using --enable-reasoning and --reasoning-parser, seperate reasoning content automatically.
when serve model using --reasoning-parser and receive request with seperate_reasoning, seperate the reasoning content with specified parser.

shuaills · 2025-03-01T11:34:46Z

I will review this @zhaochenyang20

tot0 · 2025-03-01T15:58:00Z

I found a problem. When I use seperate_reasoning and model name that does not contain "deepseek-r1", I can not use the deepseek-r1 parser.

Can the function implemented as

when serve model using --enable-reasoning and --reasoning-parser, seperate reasoning content automatically.

when serve model using --reasoning-parser and receive request with seperate_reasoning, seperate the reasoning content with specified parser.

I believe this is the current behavior, if --reasoning-parser is specified that will always be used. All of the statement that check if parsing should be done are: if reasoning_parser or (req.separate_reasoning and is_reasoning_model(req.model)):. This should always preference use a parser specified at server launch.
I also think this is how it should work at the moment? Because if the structure of the if statements specified in 1) any parser specified at launch will always be used no matter if req.separate_reasoning is provided or not.

Now maybe it makes sense to add a check for req.separate_reasoning == False to skip reasoning parsing even if specified at server startup? No

Edit: ah, i think I see what you mean now @xihuai18 , you'd like to bring back the --enable-reasoning flag to mean "always parse reasoning with the specified reasoning_parser", and then modify current behavior to have only specifying --reasoning_parser not force reasoning parsing, instead use the req.separate_reasoning signal to trigger parsing and don't rely on the model name check and instead just use the specified parser.

This makes sense to me, I'll take a look at the change today.

zhaochenyang20 · 2025-03-01T18:25:55Z

@tot0 Great. Hope shuai @shuaills will review this quickly.

…using the given `--reasoning-parser`. Also allow disabled reasoning parsing on a per-request basis using `req.separate_reasoning`. Finally preference a reasoning parser supplied at server startup over detecting the parser using `req.model`.

tot0 · 2025-03-01T22:33:54Z

@zhaochenyang20 @xihuai18 @shuaills take a look

zhaochenyang20 · 2025-03-02T04:56:52Z

Thanks. I will remind shuai

shuaills

Thanks for this PR, overall it looks great. And there are a few points need to change.

there's no need to use two flags; it's redundant. Parameters can be passed in the request instead.
the code could be written in a more Pythonic style. I've marked one instance.
please format the documentation as a Jupyter notebook, similar to the tool call documentation.
this should support both native API and offline engine.
Xihuai @xihuai18 has submitted a similar PR. I think you could join the Slack channel to work on this together - he's done some excellent work.

shuaills · 2025-03-02T11:56:33Z

docs/references/reasoning_parser.md

@@ -0,0 +1,141 @@
+# Reasoning Parser


Can you move this under Advanced Features and make it an ipynb?

This is a good idea, though it increases the scope of this PR
further and i'd really like to get this merged asap to get Microsoft internal branch back in sync with main.
Would you consider a follow up task for this @shuaills ?

@xihuai18 has done it. Can we discuss this on slack? https://slack.sglang.ai/

shuaills · 2025-03-02T11:57:10Z

docs/references/deepseek.rst

   :maxdepth: 1

   deepseek.md
+   reasoning_parser.md


move this under Advanced Features

shuaills · 2025-03-02T12:20:19Z

python/sglang/srt/openai_api/adapter.py

+        if (
+            separate_reasoning != False
+            and (parse_reasoning or separate_reasoning)
+            and (reasoning_parser or is_reasoning_model(model))
+        ):


if ( separate_reasoning and (reasoning_parser or is_reasoning_model(model)) ):

I pull this pr and I am working on this, should I pr to @tot0 or just pr to #3859 ?

Working on which part? Happy to merge a PR you make into the brush on my fork, but it would be good to understand what we're trying to achieve.

Can you enter https://slack.sglang.ai/ and we talk about it ? @tot0

tot0 · 2025-03-02T17:13:41Z

Thanks for this PR, overall it looks great. And there are a few points need to change.

there's no need to use two flags; it's redundant. Parameters can be passed in the request instead.

the code could be written in a more Pythonic style. I've marked one instance.

please format the documentation as a Jupyter notebook, similar to the tool call documentation.

this should support both native API and offline engine.

Xihuai @xihuai18 has submitted a similar PR. I think you could join the Slack channel to work on this together - he's done some excellent work.

The flags currently do mean different things, which I explained in reasoning_parser.md, one to force a specific reasoning parser to be used when separate_reasoning is specified for a request and one to force all requests to be parser (effectively forcing separate_reasoning=true for all requests.)
I added these flags back to have both the functionality like vllm where server startup can determine all requests should be parsed for reasoning (so you can have behavior like deepseeks api). While keeping the separate_reasoning field on requests as that's my desired interface.
I agree the if statement you commented on is ugly, but it is written that way to specifically handle the interaction of some cases:
a) separate_reasoning != False is there to ensure if any requests explicitly sets False that reasoning not be separated.
b) Next, (parse_reasoning or separate_reasoning), if parse_reasoning is forced then proceed, or if separate_reasoning is explicitly set True proceed (if it wasn't set (i.e. None) then we don't parse reasoning unless it's been forced at startup).
c) Finally, (reasoning_parser or is_reasoning_model(model)), ensures that there is a matching reasoning_parser to use (either set at startup or fuzzy matched from model name)

If sglang would rather not have the option to force reasoning parsing at startup then it gets much simpler. If sglang prefers the --reasoning-parser to double as specify the parser to use and forcing parsing, as I had this PR previously, then I think it still makes sense to allow separate_reasoning=False to disable parsing on a per request basis, which is think still may needs the 3 condition if statement.

Commented on this inline, agree this is better, not sure if I'll have time to make this change until Tuesday this coming week though.
I'm not so familiar with the offline engine and would appreciate pointers and/or help with this change, perhaps @xihuai18 could help here?
Could you link that PR?

gaocegege · 2025-03-03T02:18:26Z

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

xihuai18 · 2025-03-03T02:56:26Z

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

gaocegege · 2025-03-03T03:02:51Z

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

I’ve gone through the PR, but I’m still unclear about the specific scenarios where separate_reasoning=False would be appropriate. That said, it’s totally fine to have this setting available.

tot0 · 2025-03-03T04:12:41Z

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

I’ve gone through the PR, but I’m still unclear about the specific scenarios where separate_reasoning=False would be appropriate. That said, it’s totally fine to have this setting available.

The original reason for separate_reasoning as a body param is because this PRs default is separate_reasoning=False and individual requests can ask for the reasoning parser to be used by setting it True. This is how Microsoft is serving the model and intends to continue serving it until we've formalized the behavior of reasoning models in our API. Many customers clients may not handle reasoning_content and it's also the current default behavior of the sglang server in main, to not parse out reasoning_content.

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

gaocegege · 2025-03-03T04:30:39Z

SGTM. It makes sense. Thanks.

lambert0312 · 2025-03-04T02:38:59Z

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

xihuai18 · 2025-03-04T03:18:10Z

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

xihuai18 · 2025-03-04T03:22:00Z

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

xihuai18 · 2025-03-04T03:38:29Z

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

As a new PR maybe.

tot0 · 2025-03-04T05:24:00Z

Superseded by #4000

tot0 · 2025-03-04T05:26:15Z

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

As a new PR maybe.

The last commit on #4001 has a suggested param name, and the rest of the changes to allow changing the default value of separate_reasoning.
Thanks for taking a look.

tot0 marked this pull request as ready for review January 31, 2025 04:39

tot0 requested review from ByronHsu, Ying1123, hnyls2002, ispobock, merrymercy and zhyncs as code owners January 31, 2025 04:39

tot0 force-pushed the lupickup/deepseek/reasoning_content branch 2 times, most recently from c3a3d81 to cd0af57 Compare February 5, 2025 19:49

Lucas Pickup added 6 commits February 6, 2025 16:08

Support reasoning_content in ChatCompletion choices like DeepSeek api.

ad623a0

Fix up silly mistakes in non-streaming path

1fe6654

Flip accumulate_reasoning to stream_reasoning to match the api changes.

0a0eaad

Ensure finish_reason is null by default to match OpenAI streaming r…

9cc7b76

…esponse behavior.

fix silly python tuple mistake.

6400800

Don't send streaming chunks for empty content.

d856124

tot0 force-pushed the lupickup/deepseek/reasoning_content branch from cd0af57 to d856124 Compare February 6, 2025 16:14

tot0 changed the title ~~Draft support for reasoning_content in API~~ Support for reasoning_content in API Feb 6, 2025

lambert0312 mentioned this pull request Feb 9, 2025

How to return reasoning_content from sglang server response? #3428

Closed

zhyncs self-assigned this Feb 9, 2025

zhyncs added the high priority label Feb 9, 2025

Merge branch 'main' into lupickup/deepseek/reasoning_content

414d467

gaocegege reviewed Feb 11, 2025

View reviewed changes

python/sglang/srt/reasoning_parser.py Outdated Show resolved Hide resolved

lambert0312 reviewed Feb 13, 2025

View reviewed changes

python/sglang/srt/openai_api/adapter.py Outdated Show resolved Hide resolved

Lucas Pickup added 6 commits February 13, 2025 05:58

Adapt reasoning_parser to handle <think> token not being produced by …

99f2583

…model, also handle first response while separating reasoning.

Fix silly typo

f39a256

Fix up <think> token stripping.

132e5d6

use split correctly

4a60111

Remove unused reasoning_regex.

9ec06b1

wow i really can't read, or it's late, or both.

22a0b61

Lucas Pickup added 2 commits February 28, 2025 20:58

Move reasoning_parser.md to docs/references

55acaaa

Fixup incorrect handling of request: list

2ae4fa4

Merge branch 'main' into lupickup/deepseek/reasoning_content

dd00bd8

Merge branch 'main' into lupickup/deepseek/reasoning_content

b93b610

shuaills suggested changes Mar 2, 2025

View reviewed changes

xihuai18 mentioned this pull request Mar 2, 2025

Reasoning parser #4000

Merged

6 tasks

tot0 closed this Mar 4, 2025

Conversation

tot0 commented Jan 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Checklist

Uh oh!

lambert0312 commented Feb 8, 2025

Uh oh!

Uh oh!

Uh oh!

tot0 commented Feb 28, 2025

Uh oh!

tot0 commented Feb 28, 2025

Uh oh!

tot0 commented Feb 28, 2025

Uh oh!

zhaochenyang20 commented Mar 1, 2025

Uh oh!

xihuai18 commented Mar 1, 2025

Uh oh!

shuaills commented Mar 1, 2025

Uh oh!

tot0 commented Mar 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zhaochenyang20 commented Mar 1, 2025

Uh oh!

tot0 commented Mar 1, 2025

Uh oh!

zhaochenyang20 commented Mar 2, 2025

Uh oh!

shuaills left a comment

Choose a reason for hiding this comment

Uh oh!

shuaills Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

tot0 Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

shuaills Mar 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shuaills Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

shuaills Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

xihuai18 Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

tot0 Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

xihuai18 Mar 2, 2025

Choose a reason for hiding this comment

Uh oh!

tot0 commented Mar 2, 2025

Uh oh!

gaocegege commented Mar 3, 2025

Uh oh!

xihuai18 commented Mar 3, 2025

Uh oh!

gaocegege commented Mar 3, 2025

Uh oh!

tot0 commented Mar 3, 2025

Uh oh!

gaocegege commented Mar 3, 2025

Uh oh!

lambert0312 commented Mar 4, 2025

Uh oh!

xihuai18 commented Mar 4, 2025

Uh oh!

xihuai18 commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xihuai18 commented Mar 4, 2025

tot0 commented Jan 29, 2025 •

edited

Loading

tot0 commented Mar 1, 2025 •

edited

Loading

shuaills Mar 2, 2025 •

edited

Loading

xihuai18 commented Mar 4, 2025 •

edited

Loading