Fix streaming token ids data loss under load#19977
ishandhanani merged 4 commits into sgl-project:main from
Conversation
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Thanks for the fix @vladnosiv, this is important. For completeness, can you add a comment or update the PR description with a before/after?
Added a description in the PR and described the problem in more detail in the linked issue.
/tag-and-rerun-ci |
Only a single CI test failed in the previous run.
@hnyls2002 Hi!
Seems like in |
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
When handling multiple scheduler batches before `_wait_one_response` is naturally scheduled, intermediate outputs accumulate in `state.out_list`. Previously, the code consumed only the last element (`state.out_list[-1]`). While this masked the issue for cumulative text consumers, it introduced a severe bug for consumers reading `output_ids`. Since streaming token IDs are emitted as disjoint deltas, dropping intermediate entries from `out_list` caused silent, unrecoverable data loss (missing tokens, missing formatting in tool-calling, and output corruption).

This race condition is significantly amplified under high concurrency, especially when using `--skip-tokenizer-init` (e.g., in Dynamo environments), where the absence of a CPU-bound detokenizer allows ZMQ messages to arrive at wire speed and accumulate rapidly.

Proposed Changes

- `out_list` consumption: differentiate between streaming and non-streaming requests. For streams, we now drain and yield all pending output dicts sequentially. For non-streams, we preserve the previous behavior of taking only the latest cumulative output (`state.out_list[-1:]`).
- Mark only the final yielded chunk as last (`is_last`) when the state is `finished`.

Resolves #19976
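To illustrate the proposed behavior, here is a minimal, simplified sketch of the drain logic. The names (`ReqState`, `wait_one_response`, the `meta_info`/`is_last` fields) follow the PR description but are illustrative assumptions, not the actual sglang implementation:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ReqState:
    # Hypothetical, simplified model of the per-request state:
    # the scheduler appends output dicts; a waiter consumes them.
    out_list: list = field(default_factory=list)
    finished: bool = False
    event: asyncio.Event = field(default_factory=asyncio.Event)


async def wait_one_response(state: ReqState, stream: bool):
    """Drain pending outputs without dropping intermediate deltas."""
    while True:
        await state.event.wait()
        state.event.clear()

        if stream:
            # Streaming: token-ID outputs are disjoint deltas, so every
            # accumulated entry must be yielded, not just the last one.
            outs, state.out_list = state.out_list, []
        else:
            # Non-streaming: outputs are cumulative, so keeping only the
            # latest entry preserves the previous behavior.
            outs, state.out_list = state.out_list[-1:], []

        for i, out in enumerate(outs):
            # Only the final chunk of a finished request is marked last.
            out["meta_info"]["is_last"] = state.finished and i == len(outs) - 1
            yield out

        if state.finished:
            return
```

With the old `out_list[-1]` behavior, two deltas arriving before the waiter wakes would lose the first one; here both are yielded in order and only the final chunk carries `is_last`.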
cc @ishandhanani