
fix(sglang): use incremental streaming output for completions#7752

Open
weireweire wants to merge 5 commits into ai-dynamo:main from weireweire:fix/sglang-incremental-completions-usage

Conversation

@weireweire
Contributor

@weireweire weireweire commented Apr 1, 2026

Summary

Fix Dynamo's SGLang /v1/completions streaming assumptions on current SGLang main.

  • switch the SGLang integration to set incremental_streaming_output = True
  • prefer the backend-reported final completion_usage.completion_tokens when building final completions usage
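The first bullet amounts to a small compatibility shim: set whichever streaming flag the installed SGLang version understands. A minimal Python sketch of the idea (the helper name and the `SimpleNamespace` stand-in are illustrative, not the actual args.py code):

```python
from types import SimpleNamespace

def enable_incremental_streaming(server_args):
    """Set SGLang's streaming flag under whichever name this version exposes."""
    if hasattr(server_args, "incremental_streaming_output"):
        # SGLang after the --stream-output rename (#20614)
        server_args.incremental_streaming_output = True
    else:
        # Older SGLang releases still expose the legacy flag name
        server_args.stream_output = True
    return server_args

# Stand-in for SGLang's ServerArgs on a post-rename version
args = enable_incremental_streaming(
    SimpleNamespace(incremental_streaming_output=False)
)
print(args.incremental_streaming_output)  # → True
```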

Why

Dynamo still sets server_args.stream_output = True, but current SGLang gates disjoint streaming chunks behind incremental_streaming_output.

Relevant SGLang change:

  • Rename --stream-output to --incremental-streaming-output (#20614)

Related SGLang follow-up that makes the incremental/cumulative split explicit:

  • Scope streaming backlog coalescing to incremental_streaming_output mode (#21037)

Without this update, Dynamo can mis-handle cumulative streaming output on /v1/completions, which can in turn skew usage.completion_tokens and downstream benchmark metrics.
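A toy illustration of the skew (not Dynamo code; the chunk contents are made up): if the backend actually streams cumulative chunks but the client treats each chunk as a disjoint increment, token counts get inflated.

```python
# Each element is one streamed chunk, as a list of tokens. In cumulative
# mode, every chunk repeats all previously streamed tokens.
cumulative_chunks = [
    ["Hello"],
    ["Hello", ","],
    ["Hello", ",", "world"],
]

# Correct cumulative handling: the final chunk is the full completion.
correct_count = len(cumulative_chunks[-1])

# Buggy handling: summing chunks as if each were a disjoint increment.
skewed_count = sum(len(chunk) for chunk in cumulative_chunks)

print(correct_count, skewed_count)  # → 3 6
```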

Validation

  • python3 -m py_compile components/src/dynamo/sglang/args.py
  • cargo fmt --check --manifest-path lib/llm/Cargo.toml --all

Notes

I also attempted an end-to-end mounted run against the local dynamo checkout, but the container-side local source install path still needs extra environment work unrelated to these code changes.

Summary by CodeRabbit

  • Bug Fixes
    • Improved accuracy of token counting in completion responses by properly updating completion token metrics from the worker instead of relying solely on accumulated token data.

@weireweire weireweire requested review from a team as code owners April 1, 2026 06:27
@copy-pr-bot

copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

👋 Hi weireweire! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added fix external-contribution Pull request is from an external contributor backend::sglang Relates to the sglang backend frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 1, 2026
@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2026

Walkthrough

Two changes across streaming configuration and token usage tracking. First change modifies SGLang argument parsing to use incremental_streaming_output instead of stream_output. Second change updates completion usage tracking to source both token types from worker-provided data when available.

Changes

Cohort / File(s) Summary
SGLang Streaming Configuration
components/src/dynamo/sglang/args.py
Modified argument parsing to set incremental_streaming_output = True instead of stream_output = True for controlling SGLang's disjoint-segment streaming behavior.
Token Usage Tracking
lib/llm/src/protocols/openai/completions/delta.rs
Enhanced completion usage handling to update both prompt_tokens and completion_tokens from worker-provided completion_usage when available, instead of deriving completion_tokens solely from accumulated token IDs.
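The second change can be sketched in Python as follows (assumed data shapes; the real implementation lives in the Rust delta aggregator, and the names here are illustrative):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

def finalize_usage(accumulated_token_ids: List[int],
                   worker_usage: Optional[Usage]) -> Usage:
    """Prefer backend-reported counts; fall back to locally accumulated tokens."""
    if worker_usage is not None:
        # Trust the worker's final counts when it reports them
        return Usage(worker_usage.prompt_tokens, worker_usage.completion_tokens)
    # Otherwise derive completion_tokens from the accumulated token IDs
    return Usage(completion_tokens=len(accumulated_token_ids))

print(finalize_usage([1, 2, 3], Usage(10, 7)).completion_tokens)  # → 7
print(finalize_usage([1, 2, 3], None).completion_tokens)  # → 3
```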

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title clearly and specifically describes the main change: switching SGLang to use incremental streaming output for completions, which aligns with the primary objective of the PR.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check — ✅ Passed: The pull request description covers all required template sections: Overview/Summary, Details of changes, file callouts, and related issues with references.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
lib/llm/src/protocols/openai/completions/delta.rs (1)

283-291: Propagate completion_tokens_details when using backend-provided completion usage.

Line 286 now trusts backend completion_tokens, but only prompt_tokens_details is copied. If backend sends completion_tokens_details, it’s currently dropped.

Suggested patch
         if let Some(completion_usage) = delta.completion_usage.as_ref() {
             // Update prompt_tokens from worker if provided (e.g., for embeddings)
             self.usage.prompt_tokens = completion_usage.prompt_tokens;
             self.usage.completion_tokens = completion_usage.completion_tokens;
 
+            // Propagate completion token details if provided
+            if let Some(completion_details) = completion_usage.completion_tokens_details.as_ref() {
+                self.usage.completion_tokens_details = Some(completion_details.clone());
+            }
+
             // Propagate prompt token details if provided
             if let Some(prompt_details) = completion_usage.prompt_tokens_details.as_ref() {
                 self.usage.prompt_tokens_details = Some(prompt_details.clone());
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/llm/src/protocols/openai/completions/delta.rs` around lines 283 - 291,
The code updates self.usage from delta.completion_usage but only propagates
prompt_tokens_details; also copy completion_tokens_details when completion_usage
provides it. In the block handling delta.completion_usage (look for
delta.completion_usage.as_ref(), self.usage.prompt_tokens,
self.usage.completion_tokens and the prompt_tokens_details branch), add an
analogous branch that sets self.usage.completion_tokens_details =
Some(completion_tokens_details.clone()) when
completion_usage.completion_tokens_details.is_some(), ensuring backend-provided
completion token detail objects are not dropped.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL


📥 Commits

Reviewing files that changed from the base of the PR and between ab5a31b and 865ae87.

📒 Files selected for processing (2)
  • components/src/dynamo/sglang/args.py
  • lib/llm/src/protocols/openai/completions/delta.rs

@rmccorm4
Contributor

rmccorm4 commented Apr 1, 2026

Hi @weireweire , a similar PR was merged last night: #7642

Is this one still needed? Do you want to make a simpler PR for the completion_token detail usage change?

@nvpohanh

nvpohanh commented Apr 2, 2026

Superseded by #7642

We can close this

@weireweire
Contributor Author

I think we can still merge this, as it's better to move all the compatibility logic into _compat.py

@weireweire weireweire force-pushed the fix/sglang-incremental-completions-usage branch from bdad4c2 to af37781 Compare April 2, 2026 02:22
@weireweire
Contributor Author

rebased, please review

@nvpohanh

nvpohanh commented Apr 2, 2026

@rmccorm4 could you review this? thanks

@rmccorm4
Contributor

rmccorm4 commented Apr 2, 2026

Hi @weireweire, please fix the failing checks

@rmccorm4
Contributor

rmccorm4 commented Apr 2, 2026

/ok to test 4c873b4

Weiliangl User added 4 commits April 3, 2026 02:56
Signed-off-by: Weiliangl User <weiliangl@login-node.hosted.internal>
@weireweire weireweire force-pushed the fix/sglang-incremental-completions-usage branch from 984b518 to e9239fc Compare April 3, 2026 02:56
@weireweire
Contributor Author

@rmccorm4 fixed

@rmccorm4
Contributor

rmccorm4 commented Apr 3, 2026

/ok to test e9239fc


Labels

backend::sglang Relates to the sglang backend external-contribution Pull request is from an external contributor fix frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/M


3 participants