Closed

44 commits
ad623a0
Support `reasoning_content` in ChatCompletion choices like DeepSeek api.
Jan 29, 2025
1fe6654
Fix up silly mistakes in non-streaming path
Jan 29, 2025
0a0eaad
Flip accumulate_reasoning to stream_reasoning to match the api changes.
Jan 29, 2025
9cc7b76
Ensure `finish_reason` is null by default to match OpenAI streaming r…
Feb 4, 2025
6400800
fix silly python tuple mistake.
Feb 4, 2025
d856124
Don't send streaming chunks for empty content.
Feb 5, 2025
414d467
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 9, 2025
99f2583
Adapt reasoning_parser to handle <think> token not being produced by …
Feb 13, 2025
f39a256
Fix silly typo
Feb 13, 2025
132e5d6
Fix up <think> token stripping.
Feb 13, 2025
4a60111
use split correctly
Feb 13, 2025
9ec06b1
Remove unused reasoning_regex.
Feb 13, 2025
22a0b61
wow i really can't read, or it's late, or both.
Feb 13, 2025
cde50fc
Fix another case.
Feb 13, 2025
7330b0b
parse_result.normal_text _shouldn't_ ever be None, but lets be defens…
Feb 13, 2025
aa63f5d
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 13, 2025
3fdce0d
Make content=None if iparse_results.normal_text returns an empty string
Feb 13, 2025
cf9d440
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 20, 2025
de7618b
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhyncs Feb 21, 2025
1f7daae
Run pre-commit hook to format changes.
Feb 21, 2025
287be31
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 25, 2025
210fbdc
Merge branch 'main' into lupickup/deepseek/reasoning_content
tot0 Feb 27, 2025
e165bf7
Merge in awesome docs from #3859 by @ShaoZhang0115 and add unittests.
Feb 28, 2025
5a89225
Adding missing format string.
Feb 28, 2025
fa85c96
Remove local testing hacks.
Feb 28, 2025
5309df7
Merge branch 'main' into lupickup/deepseek/reasoning_content
zhaochenyang20 Feb 28, 2025
55acaaa
Move reasoning_parser.md to `docs/references`
Feb 28, 2025
2ae4fa4
Fixup incorrect handling of `request: list`
Feb 28, 2025
94bee72
[Refactor] Update reasoning handling in ChatCompletionRequest and adj…
xihuai18 Mar 2, 2025
411473b
revert dockerfile changes
xihuai18 Mar 2, 2025
ce6c485
add more testcases
xihuai18 Mar 2, 2025
022590a
add main for unit tests
xihuai18 Mar 2, 2025
98be910
revert some typos
xihuai18 Mar 2, 2025
9ff2a19
fix(reasoning content): :bug: fix typos
xihuai18 Mar 2, 2025
75463eb
test(reasoning content): :white_check_mark: add full test for reasoni…
xihuai18 Mar 2, 2025
88a90f5
feat(reasoning content): :sparkles: support separate reasoninig conte…
xihuai18 Mar 2, 2025
d74a9d3
feat(reasoning parser): :sparkles: refactor parsing methods to return…
xihuai18 Mar 2, 2025
c2d5d5b
fix(reasoning content): :bug: fix stream_reasoning
xihuai18 Mar 2, 2025
43f31b4
docs(reasoning content):
xihuai18 Mar 2, 2025
090d7fb
chore(reasoning content): del some useless code
xihuai18 Mar 2, 2025
d4a5dc2
Merge branch 'main' into reasoning-parser
xihuai18 Mar 2, 2025
32b8f31
Fix up type hints in `reasoning_parser.py`
Mar 2, 2025
3b0f1aa
De-dupcliate list of supported reasoning models.
Mar 2, 2025
be03171
Support setting default. value for separate_reasoining.
Mar 2, 2025
424 changes: 424 additions & 0 deletions docs/backend/separate_reasoning.ipynb

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/index.rst
@@ -37,6 +37,7 @@ The core features include:
backend/speculative_decoding.ipynb
backend/structured_outputs.ipynb
backend/function_calling.ipynb
backend/separate_reasoning.ipynb
backend/custom_chat_template.md
backend/quantization.md

4 changes: 4 additions & 0 deletions docs/references/deepseek.md
@@ -131,6 +131,10 @@ Overall, with these optimizations, we have achieved up to a 7x acceleration in o

**Usage**: turn on by default for DeepSeek V3 models.

### Reasoning Content for DeepSeek R1

See [Separate Reasoning](https://docs.sglang.ai/backend/separate_reasoning.html).
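For illustration, a chat completion request that opts into separated reasoning could look like the sketch below (the `separate_reasoning` and `stream_reasoning` fields come from this PR's `ChatCompletionRequest`; the model name is a placeholder):

```json
{
  "model": "deepseek-ai/DeepSeek-R1",
  "messages": [{"role": "user", "content": "What is 2 + 2?"}],
  "separate_reasoning": true,
  "stream_reasoning": false
}
```

With `separate_reasoning` enabled and a reasoning parser configured on the server, the chain-of-thought is returned under `reasoning_content` and the final answer under `content`.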

## FAQ

1. **Question**: What should I do if model loading takes too long and NCCL timeout occurs?
22 changes: 22 additions & 0 deletions python/sglang/srt/entrypoints/http_server.py
@@ -51,6 +51,7 @@
ParseFunctionCallReq,
ReleaseMemoryOccupationReqInput,
ResumeMemoryOccupationReqInput,
SeparateReasoningReqInput,
UpdateWeightFromDiskReqInput,
UpdateWeightsFromDistributedReqInput,
VertexGenerateReqInput,
@@ -70,6 +71,7 @@
v1_retrieve_file_content,
)
from sglang.srt.openai_api.protocol import ModelCard, ModelList
from sglang.srt.reasoning_parser import ReasoningParser
from sglang.srt.server_args import ServerArgs
from sglang.srt.utils import (
add_api_key_middleware,
@@ -394,6 +396,26 @@ async def function_call_request(obj: ParseFunctionCallReq, request: Request):
return ORJSONResponse(content=response_data, status_code=200)


@app.post("/separate_reasoning")
async def separate_reasoning_request(obj: SeparateReasoningReqInput, request: Request):
"""
A native API endpoint to separate reasoning from a text.
"""
# 1) Initialize the parser based on the request body
parser = ReasoningParser(model_type=obj.reasoning_parser)

# 2) Call the non-stream parsing method (non-stream)
reasoning_text, normal_text = parser.parse_non_stream(obj.text)

# 3) Organize the response content
response_data = {
"reasoning_text": reasoning_text,
"text": normal_text,
}

return ORJSONResponse(content=response_data, status_code=200)
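The `parse_non_stream` call above can be pictured as a split on the model's reasoning delimiters. A minimal standalone sketch, assuming DeepSeek-R1-style `<think>…</think>` tags (the real `ReasoningParser` in `sglang.srt.reasoning_parser` also handles streaming state and other model types):

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>"):
    """Split model output into (reasoning_text, normal_text).

    Also handles the case this PR accounts for, where the model omits the
    opening tag and starts reasoning immediately.
    """
    # No closing tag: treat the whole output as normal text.
    if close_tag not in text:
        return None, text
    head, _, tail = text.partition(close_tag)
    # The opening <think> tag may be absent; strip it if present.
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()
```

Note that this sketch mirrors the shape of the endpoint's response: the first element maps to `reasoning_text`, the second to `text`.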


##### OpenAI-compatible API endpoints #####


6 changes: 6 additions & 0 deletions python/sglang/srt/managers/io_struct.py
@@ -580,6 +580,12 @@ class ParseFunctionCallReq:
)


@dataclass
class SeparateReasoningReqInput:
text: str # The text to parse.
reasoning_parser: str # Specify the parser type, e.g., "deepseek-r1".


@dataclass
class VertexGenerateReqInput:
instances: List[dict]
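As a usage sketch, a request body matching the new `SeparateReasoningReqInput` dataclass might be (values illustrative):

```json
{
  "text": "<think>First compute 2 + 2.</think>The answer is 4.",
  "reasoning_parser": "deepseek-r1"
}
```

The `/separate_reasoning` handler in `http_server.py` would then respond with `reasoning_text` and `text` fields.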
105 changes: 98 additions & 7 deletions python/sglang/srt/openai_api/adapter.py
@@ -74,6 +74,7 @@
TopLogprob,
UsageInfo,
)
from sglang.srt.reasoning_parser import ReasoningParser
from sglang.utils import get_exception_traceback

logger = logging.getLogger(__name__)
@@ -324,6 +325,8 @@ async def process_batch(tokenizer_manager, batch_id: str, batch_request: BatchRe
to_file=True,
cache_report=tokenizer_manager.server_args.enable_cache_report,
tool_call_parser=tokenizer_manager.server_args.tool_call_parser,
reasoning_parser=tokenizer_manager.server_args.reasoning_parser,
separate_reasoning_default=tokenizer_manager.server_args.separate_reasoning_default,
)
else:
responses = v1_generate_response(
@@ -1045,7 +1048,13 @@ def v1_chat_generate_request(


def v1_chat_generate_response(
request, ret, to_file=False, cache_report=False, tool_call_parser=None
request,
ret,
to_file=False,
cache_report=False,
tool_call_parser=None,
reasoning_parser=None,
separate_reasoning_default=None,
):
choices = []

@@ -1099,9 +1108,32 @@
if isinstance(request, list):
tool_choice = request[idx].tool_choice
tools = request[idx].tools
separate_reasoning = (
request[idx].separate_reasoning
if request[idx].separate_reasoning is not None
else separate_reasoning_default
)
else:
tool_choice = request.tool_choice
tools = request.tools
separate_reasoning = (
request.separate_reasoning
if request.separate_reasoning is not None
else separate_reasoning_default
)

if reasoning_parser and separate_reasoning:
try:
parser = ReasoningParser(reasoning_parser, True)
reasoning_text, text = parser.parse_non_stream(text)
except Exception as e:
logger.error(f"Exception: {e}")
return create_error_response(
HTTPStatus.BAD_REQUEST,
"Failed to parse reasoning related info to json format!",
)
else:
reasoning_text = None

if tool_choice != "none" and any([i in text for i in TOOLS_TAG_LIST]):
if finish_reason == "stop":
@@ -1131,8 +1163,9 @@
"index": 0,
"message": {
"role": "assistant",
"content": ret_item["text"] if tool_calls is None else None,
"content": text if tool_calls is None else None,
"tool_calls": tool_calls,
"reasoning_content": reasoning_text,
},
"logprobs": choice_logprobs,
"finish_reason": (finish_reason["type"] if finish_reason else ""),
@@ -1147,8 +1180,9 @@
index=idx,
message=ChatMessage(
role="assistant",
content=ret_item["text"] if tool_calls is None else None,
content=text if tool_calls is None else None,
tool_calls=tool_calls,
reasoning_content=reasoning_text,
),
logprobs=choice_logprobs,
finish_reason=(finish_reason["type"] if finish_reason else ""),
@@ -1215,6 +1249,7 @@ async def v1_chat_completions(tokenizer_manager, raw_request: Request):

if adapted_request.stream:
parser_dict = {}
reasoning_parser_dict = {}

async def generate_stream_resp():
is_firsts = {}
@@ -1281,15 +1316,28 @@ async def generate_stream_resp():
choice_logprobs = None

finish_reason = content["meta_info"]["finish_reason"]
finish_reason_type = (
finish_reason["type"] if finish_reason else None
)

if is_first:
# First chunk with role
is_first = False
if tokenizer_manager.server_args.reasoning_parser and (
request.separate_reasoning
if request.separate_reasoning is not None
else tokenizer_manager.server_args.separate_reasoning
):
delta = DeltaMessage(role="assistant", reasoning_content="")
else:
delta = DeltaMessage(role="assistant", content="")
choice_data = ChatCompletionResponseStreamChoice(
index=index,
delta=DeltaMessage(role="assistant", content=""),
delta=delta,
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type and len(finish_reason_type) == 0
else finish_reason_type
),
matched_stop=(
finish_reason["matched"]
@@ -1309,6 +1357,42 @@ async def generate_stream_resp():
delta = text[len(stream_buffer) :]
new_stream_buffer = stream_buffer + delta

if tokenizer_manager.server_args.reasoning_parser and (
request.separate_reasoning
if request.separate_reasoning is not None
else tokenizer_manager.server_args.separate_reasoning
):
if index not in reasoning_parser_dict:
reasoning_parser_dict[index] = ReasoningParser(
tokenizer_manager.server_args.reasoning_parser,
request.stream_reasoning,
)
reasoning_parser = reasoning_parser_dict[index]
reasoning_text, delta = reasoning_parser.parse_stream_chunk(
delta
)
if reasoning_text:
choice_data = ChatCompletionResponseStreamChoice(
index=index,
delta=DeltaMessage(reasoning_content=reasoning_text),
finish_reason=(
None
if finish_reason_type
and len(finish_reason_type) == 0
else finish_reason_type
),
)
chunk = ChatCompletionStreamResponse(
id=content["meta_info"]["id"],
choices=[choice_data],
model=request.model,
)
yield f"data: {chunk.model_dump_json()}\n\n"
if (delta and len(delta) == 0) or not delta:
stream_buffers[index] = new_stream_buffer
is_firsts[index] = is_first
continue

if request.tool_choice != "none" and request.tools:
if index not in parser_dict:
parser_dict[index] = FunctionCallParser(
@@ -1326,7 +1410,10 @@
index=index,
delta=DeltaMessage(content=normal_text),
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type
and len(finish_reason_type) == 0
else finish_reason_type
),
)
chunk = ChatCompletionStreamResponse(
@@ -1395,7 +1482,9 @@ async def generate_stream_resp():
index=index,
delta=DeltaMessage(content=delta),
finish_reason=(
finish_reason["type"] if finish_reason else ""
None
if finish_reason_type and len(finish_reason_type) == 0
else finish_reason_type
),
matched_stop=(
finish_reason["matched"]
@@ -1463,6 +1552,8 @@ async def generate_stream_resp():
ret,
cache_report=tokenizer_manager.server_args.enable_cache_report,
tool_call_parser=tokenizer_manager.server_args.tool_call_parser,
reasoning_parser=tokenizer_manager.server_args.reasoning_parser,
separate_reasoning_default=tokenizer_manager.server_args.separate_reasoning_default,
)

return response
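The streaming branch above routes text before the reasoning delimiter into `reasoning_content` chunks and everything after into `content`. A condensed, self-contained sketch of that per-chunk behavior (delimiter handling is assumed to be `<think>…</think>`; the real `ReasoningParser.parse_stream_chunk` lives in `sglang.srt.reasoning_parser` and also buffers tags split across chunks, which this sketch omits for brevity):

```python
class StreamSplitSketch:
    """Incrementally route streamed deltas into (reasoning, normal) parts."""

    OPEN = "<think>"
    CLOSE = "</think>"

    def __init__(self):
        self.in_reasoning = True  # R1-style output starts in reasoning mode
        self.buffer = ""

    def parse_stream_chunk(self, delta: str):
        """Return (reasoning_delta, normal_delta) for one streamed chunk."""
        if not self.in_reasoning:
            return "", delta
        self.buffer += delta
        if self.CLOSE in self.buffer:
            # Reasoning span ends inside this chunk: split and switch modes.
            reasoning, _, normal = self.buffer.partition(self.CLOSE)
            self.in_reasoning = False
            self.buffer = ""
            return reasoning.replace(self.OPEN, "", 1), normal
        # Still inside the reasoning span: emit it as reasoning content.
        out, self.buffer = self.buffer.replace(self.OPEN, "", 1), ""
        return out, ""
```

When `stream_reasoning` is off, a server can instead accumulate the reasoning deltas and emit them in a single chunk, which is why the PR threads `request.stream_reasoning` into the parser constructor.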
4 changes: 4 additions & 0 deletions python/sglang/srt/openai_api/protocol.py
@@ -336,6 +336,8 @@ class ChatCompletionRequest(BaseModel):
skip_special_tokens: bool = True
lora_path: Optional[Union[List[Optional[str]], Optional[str]]] = None
session_params: Optional[Dict] = None
separate_reasoning: Optional[bool] = None
stream_reasoning: bool = True
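The `separate_reasoning: Optional[bool] = None` default above lets a server-wide `separate_reasoning_default` apply whenever the request leaves the field unset, mirroring the ternary used in `v1_chat_generate_response`. A standalone sketch of that resolution (names follow the diff; the helper itself is illustrative):

```python
from typing import Optional


def resolve_separate_reasoning(
    request_value: Optional[bool], server_default: Optional[bool]
) -> bool:
    # An explicit per-request value always wins; otherwise fall back to
    # the server-wide default, treating an unset default as False.
    if request_value is not None:
        return request_value
    return bool(server_default)
```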


class FunctionResponse(BaseModel):
@@ -356,6 +358,7 @@ class ToolCall(BaseModel):
class ChatMessage(BaseModel):
role: Optional[str] = None
content: Optional[str] = None
reasoning_content: Optional[str] = None
tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])


@@ -379,6 +382,7 @@ class ChatCompletionResponse(BaseModel):
class DeltaMessage(BaseModel):
role: Optional[str] = None
content: Optional[str] = None
reasoning_content: Optional[str] = None
tool_calls: Optional[List[ToolCall]] = Field(default=None, examples=[None])

