Skip to content

Support for reasoning_content in API#3202

Closed
tot0 wants to merge 31 commits intosgl-project:mainfrom
tot0:lupickup/deepseek/reasoning_content
Closed

Support for reasoning_content in API#3202
tot0 wants to merge 31 commits intosgl-project:mainfrom
tot0:lupickup/deepseek/reasoning_content

Conversation

@tot0
Copy link

@tot0 tot0 commented Jan 29, 2025

Not sure if it's how folks want this done, but it works for me :)

Sorry I don't have time to add unit tests right now, if the general implementation is acceptable I can look at how to add the unit tests soon.
I found the time to implement unittests!

#3043

Motivation

Some people wanted the reasoning_content functionality like DeepSeeks API.

Modifications

Added a reasoning parser to /chat/completions API similar to Tool parsing.

Checklist

@tot0 tot0 marked this pull request as ready for review January 31, 2025 04:39
@tot0 tot0 force-pushed the lupickup/deepseek/reasoning_content branch 2 times, most recently from c3a3d81 to cd0af57 Compare February 5, 2025 19:49
@tot0 tot0 force-pushed the lupickup/deepseek/reasoning_content branch from cd0af57 to d856124 Compare February 6, 2025 16:14
@tot0 tot0 changed the title Draft support for reasoning_content in API Support for reasoning_content in API Feb 6, 2025
@lambert0312
Copy link
Contributor

Good work! It works for me. :D

@tot0
Copy link
Author

tot0 commented Feb 28, 2025

Move this docs/backend/reasoning_parser.md to docs/references and add it into the index of https://github.com/sgl-project/sglang/blob/main/docs/references/deepseek.rst.

Ok

@tot0
Copy link
Author

tot0 commented Feb 28, 2025

Also looking at the unit test failures.

@tot0
Copy link
Author

tot0 commented Feb 28, 2025

@zhaochenyang20 addressed your comment.

@zhaochenyang20
Copy link
Collaborator

@shuaills This is related to your work? Could you review this? Thnaks!

@xihuai18
Copy link
Contributor

xihuai18 commented Mar 1, 2025

I found a problem. When I use seperate_reasoning and model name that does not contain "deepseek-r1", I can not use the deepseek-r1 parser.

Can the function implemented as

  1. when serve model using --enable-reasoning and --reasoning-parser, seperate reasoning content automatically.
  2. when serve model using --reasoning-parser and receive request with seperate_reasoning, seperate the reasoning content with specified parser.

@shuaills
Copy link
Collaborator

shuaills commented Mar 1, 2025

I will review this @zhaochenyang20

@tot0
Copy link
Author

tot0 commented Mar 1, 2025

I found a problem. When I use seperate_reasoning and model name that does not contain "deepseek-r1", I can not use the deepseek-r1 parser.

Can the function implemented as

  1. when serve model using --enable-reasoning and --reasoning-parser, seperate reasoning content automatically.

  2. when serve model using --reasoning-parser and receive request with seperate_reasoning, seperate the reasoning content with specified parser.

  1. I believe this is the current behavior, if --reasoning-parser is specified that will always be used. All of the statement that check if parsing should be done are: if reasoning_parser or (req.separate_reasoning and is_reasoning_model(req.model)):. This should always preference use a parser specified at server launch.

  2. I also think this is how it should work at the moment? Because if the structure of the if statements specified in 1) any parser specified at launch will always be used no matter if req.separate_reasoning is provided or not.

Now maybe it makes sense to add a check for req.separate_reasoning == False to skip reasoning parsing even if specified at server startup? No

Edit: ah, i think I see what you mean now @xihuai18 , you'd like to bring back the --enable-reasoning flag to mean "always parse reasoning with the specified reasoning_parser", and then modify current behavior to have only specifying --reasoning_parser not force reasoning parsing, instead use the req.separate_reasoning signal to trigger parsing and don't rely on the model name check and instead just use the specified parser.

This makes sense to me, I'll take a look at the change today.

@zhaochenyang20
Copy link
Collaborator

@tot0 Great. Hope shuai @shuaills will review this quickly.

…using the given `--reasoning-parser`. Also allow disabled reasoning parsing on a per-request basis using `req.separate_reasoning`. Finally preference a reasoning parser supplied at server startup over detecting the parser using `req.model`.
@tot0
Copy link
Author

tot0 commented Mar 1, 2025

@zhaochenyang20 @xihuai18 @shuaills take a look

@zhaochenyang20
Copy link
Collaborator

Thanks. I will remind shuai

Copy link
Collaborator

@shuaills shuaills left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR, overall it looks great. And there are a few points need to change.

  1. there's no need to use two flags; it's redundant. Parameters can be passed in the request instead.
  2. the code could be written in a more Pythonic style. I've marked one instance.
  3. please format the documentation as a Jupyter notebook, similar to the tool call documentation.
  4. this should support both native API and offline engine.
  5. Xihuai @xihuai18 has submitted a similar PR. I think you could join the Slack channel to work on this together - he's done some excellent work.

@@ -0,0 +1,141 @@
# Reasoning Parser
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this under Advanced Features and make it an ipynb?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea, though it increases the scope of this PR
further and i'd really like to get this merged asap to get Microsoft internal branch back in sync with main.
Would you consider a follow up task for this @shuaills ?

Copy link
Collaborator

@shuaills shuaills Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xihuai18 has done it. Can we discuss this on slack? https://slack.sglang.ai/

:maxdepth: 1

deepseek.md
reasoning_parser.md
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this under Advanced Features

Comment on lines +1119 to +1123
if (
separate_reasoning != False
and (parse_reasoning or separate_reasoning)
and (reasoning_parser or is_reasoning_model(model))
):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (
    separate_reasoning and 
    (reasoning_parser or is_reasoning_model(model))
):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pull this pr and I am working on this, should I pr to @tot0 or just pr to #3859 ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on which part? Happy to merge a PR you make into the brush on my fork, but it would be good to understand what we're trying to achieve.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you enter https://slack.sglang.ai/ and we talk about it ? @tot0

@tot0
Copy link
Author

tot0 commented Mar 2, 2025

Thanks for this PR, overall it looks great. And there are a few points need to change.

  1. there's no need to use two flags; it's redundant. Parameters can be passed in the request instead.

  2. the code could be written in a more Pythonic style. I've marked one instance.

  3. please format the documentation as a Jupyter notebook, similar to the tool call documentation.

  4. this should support both native API and offline engine.

  5. Xihuai @xihuai18 has submitted a similar PR. I think you could join the Slack channel to work on this together - he's done some excellent work.

  1. The flags currently do mean different things, which I explained in reasoning_parser.md, one to force a specific reasoning parser to be used when separate_reasoning is specified for a request and one to force all requests to be parser (effectively forcing separate_reasoning=true for all requests.)
    I added these flags back to have both the functionality like vllm where server startup can determine all requests should be parsed for reasoning (so you can have behavior like deepseeks api). While keeping the separate_reasoning field on requests as that's my desired interface.

  2. I agree the if statement you commented on is ugly, but it is written that way to specifically handle the interaction of some cases:
    a) separate_reasoning != False is there to ensure if any requests explicitly sets False that reasoning not be separated.
    b) Next, (parse_reasoning or separate_reasoning), if parse_reasoning is forced then proceed, or if separate_reasoning is explicitly set True proceed (if it wasn't set (i.e. None) then we don't parse reasoning unless it's been forced at startup).
    c) Finally, (reasoning_parser or is_reasoning_model(model)), ensures that there is a matching reasoning_parser to use (either set at startup or fuzzy matched from model name)

If sglang would rather not have the option to force reasoning parsing at startup then it gets much simpler. If sglang prefers the --reasoning-parser to double as specify the parser to use and forcing parsing, as I had this PR previously, then I think it still makes sense to allow separate_reasoning=False to disable parsing on a per request basis, which is think still may needs the 3 condition if statement.

  1. Commented on this inline, agree this is better, not sure if I'll have time to make this change until Tuesday this coming week though.

  2. I'm not so familiar with the offline engine and would appreciate pointers and/or help with this change, perhaps @xihuai18 could help here?

  3. Could you link that PR?

@xihuai18 xihuai18 mentioned this pull request Mar 2, 2025
6 tasks
@gaocegege
Copy link
Contributor

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

@xihuai18
Copy link
Contributor

xihuai18 commented Mar 3, 2025

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

@gaocegege
Copy link
Contributor

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

I’ve gone through the PR, but I’m still unclear about the specific scenarios where separate_reasoning=False would be appropriate. That said, it’s totally fine to have this setting available.

@tot0
Copy link
Author

tot0 commented Mar 3, 2025

Hey, just wanted to check—do we actually need separate_reasoning? It’s not supported by DeepSeek API, and I’m having a bit of trouble seeing the use case for it.

see the docs in #4000

I’ve gone through the PR, but I’m still unclear about the specific scenarios where separate_reasoning=False would be appropriate. That said, it’s totally fine to have this setting available.

The original reason for separate_reasoning as a body param is because this PRs default is separate_reasoning=False and individual requests can ask for the reasoning parser to be used by setting it True. This is how Microsoft is serving the model and intends to continue serving it until we've formalized the behavior of reasoning models in our API. Many customers clients may not handle reasoning_content and it's also the current default behavior of the sglang server in main, to not parse out reasoning_content.

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

@gaocegege
Copy link
Contributor

SGTM. It makes sense. Thanks.

@lambert0312
Copy link
Contributor

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

@xihuai18
Copy link
Contributor

xihuai18 commented Mar 4, 2025

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

@xihuai18
Copy link
Contributor

xihuai18 commented Mar 4, 2025

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

@xihuai18
Copy link
Contributor

xihuai18 commented Mar 4, 2025

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

As a new PR maybe.

@tot0 tot0 closed this Mar 4, 2025
@tot0
Copy link
Author

tot0 commented Mar 4, 2025

Superseded by #4000

@tot0
Copy link
Author

tot0 commented Mar 4, 2025

In #4000 the current default behavior is separate_reasoning=True. I still think it's valuable to have the option for clients to request that reasoning not be separated, though the main reason i'd like the per-request option to remain is because Microsoft wants to keep our current behavior consistent, and so would like the option to set the default value of separate_reasoning to False, allowing each request to specify True if desired.

I agree with this suggestion

Could you suggest a param name and I will implement it quickly today?

And I think set default mode as not separating and add a param to force separating is more reasonable. @tot0 Can you accept this plan?

As a new PR maybe.

The last commit on #4001 has a suggested param name, and the rest of the changes to allow changing the default value of separate_reasoning.
Thanks for taking a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants