Closed
34 commits
ad623a0
Support `reasoning_content` in ChatCompletion choices like DeepSeek api.
Jan 29, 2025
1fe6654
Fix up silly mistakes in non-streaming path
Jan 29, 2025
0a0eaad
Flip accumulate_reasoning to stream_reasoning to match the api changes.
Jan 29, 2025
9cc7b76
Ensure `finish_reason` is null by default to match OpenAI streaming r…
Feb 4, 2025
6400800
fix silly python tuple mistake.
Feb 4, 2025
d856124
Don't send streaming chunks for empty content.
Feb 5, 2025
414d467
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 9, 2025
99f2583
Adapt reasoning_parser to handle <think> token not being produced by …
Feb 13, 2025
f39a256
Fix silly typo
Feb 13, 2025
132e5d6
Fix up <think> token stripping.
Feb 13, 2025
4a60111
use split correctly
Feb 13, 2025
9ec06b1
Remove unused reasoning_regex.
Feb 13, 2025
22a0b61
wow i really can't read, or it's late, or both.
Feb 13, 2025
cde50fc
Fix another case.
Feb 13, 2025
7330b0b
parse_result.normal_text _shouldn't_ ever be None, but lets be defens…
Feb 13, 2025
aa63f5d
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 13, 2025
3fdce0d
Make content=None if iparse_results.normal_text returns an empty string
Feb 13, 2025
cf9d440
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 20, 2025
de7618b
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 21, 2025
1f7daae
Run pre-commit hook to format changes.
Feb 21, 2025
287be31
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 25, 2025
210fbdc
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 27, 2025
e165bf7
Merge in awesome docs from #3859 by @ShaoZhang0115 and add unittests.
Feb 28, 2025
5a89225
Adding missing format string.
Feb 28, 2025
fa85c96
Remove local testing hacks.
Feb 28, 2025
5309df7
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhaochenyang20 Feb 28, 2025
55acaaa
Move reasoning_parser.md to `docs/references`
Feb 28, 2025
2ae4fa4
Fixup incorrect handling of `request: list`
Feb 28, 2025
94bee72
[Refactor] Update reasoning handling in ChatCompletionRequest and adj…
xihuai18 Mar 2, 2025
411473b
revert dockerfile changes
xihuai18 Mar 2, 2025
ce6c485
add more testcases
xihuai18 Mar 2, 2025
022590a
add main for unit tests
xihuai18 Mar 2, 2025
98be910
revert some typos
xihuai18 Mar 2, 2025
9ff2a19
fix(reasoning content): :bug: fix typos
xihuai18 Mar 2, 2025
1 change: 1 addition & 0 deletions docs/references/deepseek.rst
@@ -4,3 +4,4 @@ Multi-Node Deployment
:maxdepth: 1

deepseek.md
reasoning_parser.md
138 changes: 138 additions & 0 deletions docs/references/reasoning_parser.md
@@ -0,0 +1,138 @@
# Reasoning Parser

SGLang supports separating reasoning content from the "normal" content for reasoning models such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1).

The contract follows the [DeepSeek API design](https://api-docs.deepseek.com/guides/reasoning_model) established with the release of DeepSeek-R1:

- `reasoning_content`: The content of the CoT.
- `content`: The content of the final answer.

## Supported Models

Currently, SGLang supports the following reasoning models:
- [DeepSeek R1 series](https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d): The reasoning content is wrapped with `<think>` and `</think>` tags.

## Usage

There are two ways to enable reasoning parsing:

1) Enable the reasoning parser when starting the SGLang server by setting the `--enable-reasoning` and `--reasoning-parser` options. The `--reasoning-parser` option specifies which parser to use to extract the reasoning content and the final answer.

```bash
python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
--enable-reasoning --reasoning-parser deepseek-r1
```

2) Enable it on a per-request basis by setting the `separate_reasoning` body field on a `/chat/completions` request.

```bash
curl -X POST -H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Compute 1+3"}],"max_tokens":100,"model":"deepseek-r1","stream":true,"separate_reasoning":true}' http://0.0.0.0:30000/v1/chat/completions
```

A related body parameter, `"stream_reasoning": false`, buffers the reasoning trace and sends it as a single chunk after the closing `</think>` tag instead of streaming it token by token.
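To make the buffering behavior concrete, here is a small self-contained sketch, an illustration of the effect only, not SGLang's actual implementation, of how `"stream_reasoning": false` collapses reasoning deltas into one chunk:

```python
def buffer_reasoning(deltas):
    """Illustrative only: emit the reasoning as a single chunk once </think>
    closes, mimicking the effect of "stream_reasoning": false."""
    buffered, out, in_reasoning = [], [], True
    for d in deltas:
        if in_reasoning:
            if "</think>" in d:
                head, _, tail = d.partition("</think>")
                buffered.append(head)
                # Flush all buffered reasoning as one chunk.
                out.append({"reasoning_content": "".join(buffered)})
                if tail:
                    out.append({"content": tail})
                in_reasoning = False
            else:
                buffered.append(d)
        else:
            out.append({"content": d})
    return out
```

With `"stream_reasoning": true` (the default), each reasoning delta would instead be emitted as it arrives.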

### Non-streaming Request

Make a request to the reasoning model, then read the reasoning content and final answer from the response.

Using the OpenAI Python API:
```python
import openai

client = openai.Client(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
model="deepseek-r1:14b",
messages=[{"role": "user", "content": "Compute 1+3"}],
max_tokens=1024,
stream=False
)

response.choices[0].message.reasoning_content
# 'First, I recognize that the problem requires adding the numbers 1 and 3.\n\nNext, I identify the numbers to be added, which are 1 and 3.\n\nThen, I perform the addition operation: 1 plus 3 equals 4.\n\nFinally, I conclude that the sum of 1 and 3 is 4.\n'
response.choices[0].message.content
# '\n\nTo compute \\(1 + 3\\), follow these simple steps:\n\n1. **Identify the numbers to add:** \n The numbers are **1** and **3**.\n\n2. **Add the numbers together:** \n \\[\n 1 + 3 = 4\n \\]\n\n3. **Write the final answer:** \n The sum of \\(1 + 3\\) is \\(\\boxed{4}\\).'
```

### Streaming Request

`reasoning_content` is available in the `delta` field of the streaming response.

Using the OpenAI Python API:

```python
# ... Initialize the client as before ...

response = client.chat.completions.create(
model="deepseek-r1:14b",
messages=[{"role": "user", "content": "Compute 1+3"}],
max_tokens=1024,
stream=True
)
reasoning_content = ""
content = ""
for chunk in response:
if chunk.choices[0].delta.content:
content += chunk.choices[0].delta.content
elif chunk.choices[0].delta.reasoning_content:
reasoning_content += chunk.choices[0].delta.reasoning_content

reasoning_content
# 'I need to calculate the sum of 1 and 3. \n\nFirst, I identify the numbers involved in the addition: 1 and 3.\n\nNext, I add these two numbers together to find the total.\n\nFinally, the result of the addition is 4.\n'
content
# '\n\n**Solution:**\n\nWe need to compute the sum of 1 and 3.\n\n1. **Identify the numbers to add:**\n - Number 1\n - Number 3\n\n2. **Add the numbers together:**\n \\[\n 1 + 3 = 4\n \\]\n\n3. **Final Answer:**\n \\[\n \\boxed{4}\n \\]'
```


## Supporting New Reasoning Models

For future reasoning models, you can implement the reasoning parser as a subclass of `BaseReasoningParser` in `python/sglang/srt/reasoning_parser.py`.

```python
from typing import Optional, Tuple


class BaseReasoningParser:
"""Base class for reasoning parser."""

def __init__(self):
self._buffer = ""

def detect_and_parse(self, text: str) -> Tuple[Optional[str], Optional[str]]:
"""Detect and parse the text, return reasoning_content and content."""
raise NotImplementedError

def parse_streaming_increment(
self, new_text: str
) -> Tuple[Optional[str], Optional[str]]:
"""Parse the new text incrementally, return reasoning_content and content."""
raise NotImplementedError
```
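For example, here is a minimal self-contained sketch of a parser for a hypothetical model that wraps its reasoning in `<think>`/`</think>` tags. The class name is illustrative; in `reasoning_parser.py` it would subclass `BaseReasoningParser`:

```python
from typing import Optional, Tuple


class ThinkTagReasoningParser:
    """Illustrative parser for a model wrapping its CoT in <think>...</think>."""

    def __init__(self):
        self._buffer = ""

    def detect_and_parse(self, text: str) -> Tuple[Optional[str], Optional[str]]:
        # The model may omit the opening tag, so strip it if present.
        text = text.removeprefix("<think>")
        if "</think>" in text:
            reasoning, _, content = text.partition("</think>")
            return reasoning or None, content or None
        # No closing tag yet: treat the whole text as reasoning.
        return text or None, None
```

For instance, `ThinkTagReasoningParser().detect_and_parse("<think>1 plus 3 is 4</think>4")` returns `("1 plus 3 is 4", "4")`.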

Then register the parser for the new model in `ReasoningParserDict`:

```python
class ReasoningParser:
"""Reasoning parser for different reasoning models."""

# Specify the reasoning parser for each reasoning model here
ReasoningParserDict: Dict[str, Type[BaseReasoningParser]] = {
"deepseek-r1": DeepSeekR1ReasoningParser
}

def __init__(self, reasoning_parser: str):
self.parser = self.ReasoningParserDict[reasoning_parser]()

def parse_non_stream(self, full_text: str) -> Tuple[Optional[str], Optional[str]]:
"""
Non-streaming parsing for reasoning models.
Return: reasoning_content, content
"""
return self.parser.detect_and_parse(full_text)

def parse_stream_chunk(self, chunk_text: str):
"""
Streaming parsing for reasoning models.
Return: reasoning_content, content
"""
return self.parser.parse_streaming_increment(chunk_text)
```
94 changes: 87 additions & 7 deletions python/sglang/srt/openai_api/adapter.py
@@ -74,6 +74,7 @@
TopLogprob,
UsageInfo,
)
from sglang.srt.reasoning_parser import ReasoningParser
from sglang.utils import get_exception_traceback

logger = logging.getLogger(__name__)
@@ -1045,7 +1046,12 @@ def v1_chat_generate_request(


def v1_chat_generate_response(
request, ret, to_file=False, cache_report=False, tool_call_parser=None
request,
ret,
to_file=False,
cache_report=False,
tool_call_parser=None,
reasoning_parser=None,
):
choices = []

@@ -1099,9 +1105,26 @@ def v1_chat_generate_response(
if isinstance(request, list):
tool_choice = request[idx].tool_choice
tools = request[idx].tools
separate_reasoning = request[idx].separate_reasoning
else:
tool_choice = request.tool_choice
tools = request.tools
separate_reasoning = request.separate_reasoning

if reasoning_parser and separate_reasoning:
try:
parser = ReasoningParser(reasoning_parser, True)
parse_result = parser.parse_non_stream(text)
text = parse_result.normal_text #! text can not be None
reasoning_text = parse_result.reasoning_text
except Exception as e:
logger.error(f"Exception: {e}")
return create_error_response(
HTTPStatus.BAD_REQUEST,
"Failed to parse reasoning related info to json format!",
)
else:
reasoning_text = None

if tool_choice != "none" and any([i in text for i in TOOLS_TAG_LIST]):
if finish_reason == "stop":
@@ -1131,8 +1154,9 @@
"index": 0,
"message": {
"role": "assistant",
"content": ret_item["text"] if tool_calls is None else None,
"content": text if tool_calls is None else None,
"tool_calls": tool_calls,
"reasoning_content": reasoning_text,
},
"logprobs": choice_logprobs,
"finish_reason": (finish_reason["type"] if finish_reason else ""),
@@ -1147,8 +1171,9 @@
index=idx,
message=ChatMessage(
role="assistant",
content=ret_item["text"] if tool_calls is None else None,
content=text if tool_calls is None else None,
tool_calls=tool_calls,
reasoning_content=reasoning_text,
),
logprobs=choice_logprobs,
finish_reason=(finish_reason["type"] if finish_reason else ""),
@@ -1215,6 +1240,7 @@ async def v1_chat_completions(tokenizer_manager, raw_request: Request):

if adapted_request.stream:
parser_dict = {}
reasoning_parser_dict = {}

async def generate_stream_resp():
is_firsts = {}
@@ -1281,15 +1307,27 @@ async def generate_stream_resp():
choice_logprobs = None

finish_reason = content["meta_info"]["finish_reason"]
finish_reason_type = (
finish_reason["type"] if finish_reason else None
)

if is_first:
# First chunk with role
is_first = False
if (
tokenizer_manager.server_args.reasoning_parser
and request.separate_reasoning
):
delta = DeltaMessage(role="assistant", reasoning_content="")
else:
delta = DeltaMessage(role="assistant", content="")
choice_data = ChatCompletionResponseStreamChoice(
index=index,
delta=DeltaMessage(role="assistant", content=""),
delta=delta,
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type and len(finish_reason_type) == 0
else finish_reason_type
),
matched_stop=(
finish_reason["matched"]
@@ -1309,6 +1347,42 @@ async def generate_stream_resp():
delta = text[len(stream_buffer) :]
new_stream_buffer = stream_buffer + delta

if (
tokenizer_manager.server_args.reasoning_parser
and request.separate_reasoning
):
if index not in reasoning_parser_dict:
reasoning_parser_dict[index] = ReasoningParser(
tokenizer_manager.server_args.reasoning_parser,
request.stream_reasoning,
)
reasoning_parser = reasoning_parser_dict[index]
parse_result = reasoning_parser.parse_stream_chunk(delta)
if parse_result.reasoning_text:
choice_data = ChatCompletionResponseStreamChoice(
index=index,
delta=DeltaMessage(
reasoning_content=parse_result.reasoning_text
),
finish_reason=(
None
if finish_reason_type
and len(finish_reason_type) == 0
else finish_reason_type
),
)
chunk = ChatCompletionStreamResponse(
id=content["meta_info"]["id"],
choices=[choice_data],
model=request.model,
)
yield f"data: {chunk.model_dump_json()}\n\n"
delta = parse_result.normal_text
if (delta and len(delta) == 0) or not delta:
stream_buffers[index] = new_stream_buffer
is_firsts[index] = is_first
continue

if request.tool_choice != "none" and request.tools:
if index not in parser_dict:
parser_dict[index] = FunctionCallParser(
@@ -1326,7 +1400,10 @@
index=index,
delta=DeltaMessage(content=normal_text),
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type
and len(finish_reason_type) == 0
else finish_reason_type
),
)
chunk = ChatCompletionStreamResponse(
@@ -1395,7 +1472,9 @@ async def generate_stream_resp():
index=index,
delta=DeltaMessage(content=delta),
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type and len(finish_reason_type) == 0
else finish_reason_type
),
matched_stop=(
finish_reason["matched"]
@@ -1463,6 +1542,7 @@ async def generate_stream_resp():
ret,
cache_report=tokenizer_manager.server_args.enable_cache_report,
tool_call_parser=tokenizer_manager.server_args.tool_call_parser,
reasoning_parser=tokenizer_manager.server_args.reasoning_parser,
)

return response
4 changes: 4 additions & 0 deletions python/sglang/srt/openai_api/protocol.py
@@ -336,6 +336,8 @@ class ChatCompletionRequest(BaseModel):
skip_special_tokens: bool = True
lora_path: Optional[Union[List[Optional[str]], Optional[str]]] = None
session_params: Optional[Dict] = None
separate_reasoning: bool = True
stream_reasoning: bool = True


class FunctionResponse(BaseModel):
@@ -356,6 +358,7 @@ class ToolCall(BaseModel):
class ChatMessage(BaseModel):
role: Optional[str] = None
content: Optional[str] = None
reasoning_content: Optional[str] = None
tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])


@@ -379,6 +382,7 @@ class ChatCompletionResponse(BaseModel):
class DeltaMessage(BaseModel):
role: Optional[str] = None
content: Optional[str] = None
reasoning_content: Optional[str] = None
tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])

