Fix streaming token ids data loss under load #19977

Merged
ishandhanani merged 4 commits into sgl-project:main from vladnosiv:fix-tokenizer-manager-queue
Mar 18, 2026
@vladnosiv
Contributor

@vladnosiv vladnosiv commented Mar 5, 2026

When the scheduler emits multiple batches before _wait_one_response gets a chance to run, intermediate outputs accumulate in state.out_list. Previously, the code consumed only the last element (state.out_list[-1]) and discarded the rest.

This went unnoticed for consumers of cumulative text, where the last entry already includes everything before it, but it is a severe bug for consumers reading output_ids. Since streaming token IDs are emitted as disjoint deltas, dropping intermediate entries from out_list caused silent, unrecoverable data loss (missing tokens, missing formatting in tool calling, and output corruption).
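The difference between the two consumer types can be illustrated with a small sketch. The dict keys follow the description above; the queued values and the two helper functions are hypothetical and are not taken from the SGLang code.

```python
# Hypothetical chunks queued in out_list before the consumer wakes up.
# "text" is cumulative; "output_ids" holds only the new tokens (a delta).
out_list = [
    {"text": "Hello", "output_ids": [101]},
    {"text": "Hello world", "output_ids": [102]},
    {"text": "Hello world!", "output_ids": [103]},
]


def consume_last_only(chunks):
    """Old behaviour as described: read only the newest entry."""
    last = chunks[-1]
    return last["text"], list(last["output_ids"])


def consume_all(chunks):
    """Read every entry and concatenate the disjoint token-id deltas."""
    ids = []
    for chunk in chunks:
        ids.extend(chunk["output_ids"])
    return chunks[-1]["text"], ids


text_last, ids_last = consume_last_only(out_list)
text_all, ids_all = consume_all(out_list)

# The cumulative text is identical either way, which is why the bug stayed hidden.
print(text_last == text_all)  # True
# The token ids are not: two of the three deltas are gone.
print(ids_last, ids_all)      # [103] [101, 102, 103]
```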

This race condition is significantly amplified under high concurrency, especially when using --skip-tokenizer-init (e.g., in Dynamo environments), where the absence of a CPU-bound detokenizer allows ZMQ messages to arrive at wire speed and accumulate rapidly.

Proposed Changes

  • Atomically drain out_list: Differentiate between streaming and non-streaming requests. For streams, we now drain and yield all pending output dicts sequentially. For non-streams, we preserve the previous behavior of only taking the latest cumulative output (state.out_list[-1:]).
  • Preserve Finalization Logic: Ensured that request logging, metrics exporting, and abort handling still only trigger on the very last chunk (is_last) when the state is finished.
  • Add Observability: Added a warning when draining multiple queued chunks in streaming mode. This provides crucial observability into back-pressure dynamics and helps explain P99 TBT spikes.
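The changes listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the merged implementation: ReqState, wait_one_response, and the log message are simplified, hypothetical stand-ins, and the real finalization steps (request logging, metrics, abort handling) are reduced to a comment.

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import Any, AsyncIterator, Dict, List

logger = logging.getLogger(__name__)


@dataclass
class ReqState:
    """Hypothetical, simplified per-request state."""
    out_list: List[Dict[str, Any]] = field(default_factory=list)
    finished: bool = False
    event: asyncio.Event = field(default_factory=asyncio.Event)


async def wait_one_response(
    state: ReqState, stream: bool
) -> AsyncIterator[Dict[str, Any]]:
    while True:
        await state.event.wait()

        # Drain with no await in between, so the producer cannot
        # interleave with the swap on a single-threaded event loop.
        pending = state.out_list if stream else state.out_list[-1:]
        state.out_list = []
        finished = state.finished
        state.event.clear()

        if stream and len(pending) > 1:
            logger.warning("Draining %d queued chunks", len(pending))

        for i, out in enumerate(pending):
            is_last = i == len(pending) - 1
            if finished and is_last:
                # Finalization would run here, once, on the final chunk only.
                yield out
                return
            yield out


async def demo(stream: bool) -> List[Dict[str, Any]]:
    state = ReqState()
    # Three chunks arrive before the consumer is scheduled.
    state.out_list = [{"output_ids": [1]}, {"output_ids": [2]}, {"output_ids": [3]}]
    state.finished = True
    state.event.set()
    return [out async for out in wait_one_response(state, stream)]


streamed = asyncio.run(demo(stream=True))
non_streamed = asyncio.run(demo(stream=False))
```

In this sketch the streaming call yields all three deltas in order, while the non-streaming call yields only the last entry, matching the out_list[-1:] behaviour described in the first bullet.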

Resolves #19976

cc @ishandhanani

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@ishandhanani
Collaborator

Thanks for the fix @vladnosiv, this is important.

For completeness, can you add a comment or update the PR description with a before/after?

@ishandhanani ishandhanani self-assigned this Mar 6, 2026
@vladnosiv
Contributor Author

> Thanks for the fix @vladnosiv, this is important.
>
> For completeness, can you add a comment or update the PR description with a before/after?

I added a description to the PR and described the problem in more detail in the linked issue.

@ishandhanani
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Mar 7, 2026
@ishandhanani
Collaborator

Only a single CI test failed in the previous run

@vladnosiv
Contributor Author

@hnyls2002 Hi!
Please take a look.

@vladnosiv
Contributor Author

It seems that in the registered/distributed/test_dp_attention.py test the server didn't even start.
On my side the test crashes because it uses the private lmsys/sglang-ci-dsv3-test model, which I do not have access to.
In any case, the failure looks unrelated to this PR, because it occurs while the test is still starting up.

@ishandhanani ishandhanani merged commit b9dba85 into sgl-project:main Mar 18, 2026
100 of 108 checks passed
lawrence-harmonic added a commit to lawrence-harmonic/sglang that referenced this pull request Mar 19, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
dutsc pushed a commit to dutsc/sglang that referenced this pull request Mar 30, 2026
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

[Bug] Streaming token ids data loss under load (affects Nvidia Dynamo)