Fix broken streaming response with --incremental-streaming-output #22549
Merged
Kangyan-Zhou merged 1 commit into main on Apr 12, 2026
Conversation
When `incremental_streaming_output` is enabled, `content["text"]` from the tokenizer manager is already the incremental delta (new text only), not the full accumulated text. The delta computation in `_generate_chat_stream` incorrectly sliced into this delta using the accumulated buffer length, producing garbled fragments like "rge", "age", and "eable" instead of "large", "language", and "trainable".

Fixes #22510

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
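A minimal standalone sketch of the bug class (hypothetical chunk boundaries, not the actual serving_chat.py code): slicing an already-incremental delta by the accumulated buffer length truncates or drops text.

```python
# Each element is already an incremental delta (new text only).
deltas = ["The ", "large ", "language ", "model"]

stream_buffer = ""  # text accumulated so far
for delta in deltas:
    # Buggy pattern: treats `delta` as if it were the full cumulative text
    # and slices off what was supposedly already sent.
    sent = delta[len(stream_buffer):]
    stream_buffer += delta
    print(repr(sent))

# Prints 'The ', 'e ', '', '' -- only the first chunk survives intact;
# later chunks are truncated or dropped entirely.
```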
Contributor
Warning: You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
Collaborator (Author)
/tag-run-ci-label
Collaborator (Author)
/rerun-test test/registered/openai_server/basic/test_serving_chat.py
Contributor
✅
alexnails approved these changes on Apr 11, 2026
JustinTong0323 approved these changes on Apr 11, 2026
Collaborator (Author)
/rerun-failed-ci
Motivation
Fixes #22510
`--incremental-streaming-output` + `stream=True` produces garbled text for all models (not just Gemma 4).

Root cause: #21037 (c37200f) changed `tokenizer_manager.py` to emit incremental deltas (`state.text[last_text_offset:]`) instead of cumulative text (`state.text`) when `incremental_streaming_output` is enabled. `serving_chat.py` was not updated and still slices the already-incremental text by the accumulated buffer length, producing garbage fragments.

Repro on any model (sketch below):
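The original repro snippet is not preserved here; the following is a hedged sketch of an equivalent repro, assuming an SGLang server launched with something like `python -m sglang.launch_server --model-path <model> --incremental-streaming-output --port 30000` (model path, port, and model name are placeholders):

```python
# Stream a chat completion through the OpenAI-compatible endpoint and
# print the deltas as they arrive.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="default",  # placeholder; any served model triggers the bug
    messages=[{"role": "user", "content": "What is a large language model?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Before the fix: the output contains garbled fragments ("rge", "age", ...).
# After the fix: coherent text.
```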
Modifications
- `serving_chat.py`: when `incremental_streaming_output` is enabled, use `content["text"]` directly as the delta instead of slicing by the accumulated buffer length.
- `test_serving_chat.py`: add regression test `test_incremental_streaming_output_delta`.
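A sketch of the corrected delta computation (hypothetical helper and local names mirroring the identifiers above, not the exact serving_chat.py code):

```python
def compute_delta(content: dict, stream_buffer: str,
                  incremental_streaming_output: bool) -> str:
    """Return the text to emit for this streaming chunk."""
    if incremental_streaming_output:
        # The tokenizer manager already sends only the new text; use as-is.
        return content["text"]
    # Cumulative mode: content["text"] is the full text so far,
    # so slice off what was already streamed.
    return content["text"][len(stream_buffer):]
```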