fix(sglang): use incremental_streaming_output instead of deprecated stream_output by YAMY1234 · Pull Request #7642 · ai-dynamo/dynamo

YAMY1234 · 2026-03-25T23:41:26Z

Overview:

use incremental_streaming_output instead of deprecated stream_output

Details:

sglang renamed `stream_output` to `incremental_streaming_output` in sglang/sglang#20614. Dynamo's assignment to the old attribute silently became a no-op, so each streamed chunk carried the cumulative output_ids generated so far instead of a disjoint delta. Counting those chunks as deltas inflated completion_tokens as a triangular sum (~10x).
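To illustrate the difference, here is a minimal Python sketch. It assumes that the consumer derives completion_tokens by summing the lengths of the output_ids carried in each streamed chunk. The helper names and the fixed chunk size are hypothetical and do not come from Dynamo's or SGLang's code. With one token per chunk, n generated tokens are counted as 1 + 2 + ... + n = n(n+1)/2. With larger chunks the inflation is smaller, because it grows with the number of chunks per request rather than with the number of tokens.

```python
from typing import List


def delta_chunks(tokens: List[int], chunk_size: int) -> List[List[int]]:
    """Disjoint deltas: each chunk carries only the newly generated tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def cumulative_chunks(tokens: List[int], chunk_size: int) -> List[List[int]]:
    """Cumulative output: each chunk repeats every token generated so far."""
    return [tokens[:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def counted_completion_tokens(chunks: List[List[int]]) -> int:
    """Naive counter that assumes every chunk is a disjoint delta."""
    return sum(len(chunk) for chunk in chunks)


if __name__ == "__main__":
    generated = list(range(100))  # 100 hypothetical token ids
    for size in (1, 10):
        deltas = counted_completion_tokens(delta_chunks(generated, size))
        cumulative = counted_completion_tokens(cumulative_chunks(generated, size))
        print(f"chunk_size={size}: deltas={deltas}, cumulative={cumulative}")
```

For 100 tokens, the delta count is always 100. The cumulative count is 5050 with one token per chunk and 550 with ten tokens per chunk.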

Before the fix:

============ Serving Benchmark Result ============
Successful requests:                     8         
Benchmark duration (s):                  22.07     
Total input tokens:                      8000      
Total generated tokens:                  84160     
Request throughput (req/s):              0.36      
Output token throughput (tok/s):         3813.62   
Total Token throughput (tok/s):          4176.14   
---------------Time to First Token----------------
Mean TTFT (ms):                          780.26    
Median TTFT (ms):                        775.64    
P99 TTFT (ms):                           817.16    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          0.97      
Median TPOT (ms):                        0.97      
P99 TPOT (ms):                           1.01      
---------------Inter-token Latency----------------
Mean ITL (ms):                           512.30    
Median ITL (ms):                         487.04    
P99 ITL (ms):                            1111.95   
----------------End-to-end Latency----------------
Mean E2EL (ms):                          11026.20  
Median E2EL (ms):                        11024.14  
P99 E2EL (ms):                           11437.12  
==================================================

After the fix:

============ Serving Benchmark Result ============
Successful requests:                     8         
Benchmark duration (s):                  20.50     
Total input tokens:                      7302      
Total generated tokens:                  7093      
Request throughput (req/s):              0.39      
Output token throughput (tok/s):         346.04    
Total Token throughput (tok/s):          702.28    
---------------Time to First Token----------------
Mean TTFT (ms):                          984.03    
Median TTFT (ms):                        771.13    
P99 TTFT (ms):                           1556.39   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          10.02     
Median TPOT (ms):                        9.80      
P99 TPOT (ms):                           10.84     
---------------Inter-token Latency----------------
Mean ITL (ms):                           10.00     
Median ITL (ms):                         9.75      
P99 ITL (ms):                            10.33     
----------------End-to-end Latency----------------
Mean E2EL (ms):                          9839.03   
Median E2EL (ms):                        9963.85   
P99 E2EL (ms):                           10276.21  
==================================================
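The snippet below provides a rough consistency check on the two runs above. It recomputes generated tokens per request and an approximate TPOT from the reported means. It assumes that TPOT is defined per request as (E2E latency minus TTFT) divided by (output tokens minus 1), as the "excl. 1st token" header suggests. Plugging run-level means into that formula only approximates the reported per-request average. The two runs also use different prompts (8000 vs 7302 input tokens), so the ratio is only indicative. Under these assumptions, the inflated token count in the "before" run accounts for its unusually low ~1 ms TPOT, even though end-to-end latency is roughly 10 s in both runs.

```python
from dataclasses import dataclass


@dataclass
class BenchRun:
    """Figures copied from the benchmark output above."""
    num_requests: int
    generated_tokens: int
    mean_ttft_ms: float
    mean_e2el_ms: float

    def tokens_per_request(self) -> float:
        return self.generated_tokens / self.num_requests

    def approx_tpot_ms(self) -> float:
        # TPOT excludes the first token: (E2EL - TTFT) / (output tokens - 1).
        # Using run-level means instead of per-request values is an approximation.
        return (self.mean_e2el_ms - self.mean_ttft_ms) / (self.tokens_per_request() - 1)


before = BenchRun(num_requests=8, generated_tokens=84160, mean_ttft_ms=780.26, mean_e2el_ms=11026.20)
after = BenchRun(num_requests=8, generated_tokens=7093, mean_ttft_ms=984.03, mean_e2el_ms=9839.03)

for name, run in (("before", before), ("after", after)):
    print(f"{name}: {run.tokens_per_request():.1f} tokens/request, ~{run.approx_tpot_ms():.2f} ms TPOT")
print(f"inflation: {before.tokens_per_request() / after.tokens_per_request():.1f}x")
```

This gives about 10,520 vs 887 tokens per request (roughly 11.9x). The approximate TPOTs are 0.97 ms and 10.00 ms, which are close to the reported 0.97 ms and 10.02 ms.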

Summary by CodeRabbit

Bug Fixes
- Updated streaming configuration for SGLang server compatibility.

copy-pr-bot · 2026-03-25T23:41:30Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-03-25T23:41:37Z

👋 Hi YAMY1234! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

components/src/dynamo/sglang/args.py

coderabbitai · 2026-03-25T23:46:11Z

Walkthrough

Updated SGLang ServerArgs streaming configuration in the parse_args function by replacing the stream_output flag with incremental_streaming_output and adjusting the corresponding comment. No control flow or other logic modifications.

Changes

Cohort / File(s)	Summary
SGLang Configuration Update (`components/src/dynamo/sglang/args.py`)	Replaced `server_args.stream_output = True` with `server_args.incremental_streaming_output = True` in the argument parsing logic, with comment adjustment to reflect the renamed configuration parameter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: using incremental_streaming_output instead of the deprecated stream_output in the sglang configuration.
Description check	✅ Passed	The PR description includes an overview, detailed explanation of the problem (deprecated attribute causing no-op behavior), the fix applied, and comprehensive before/after benchmark results demonstrating the impact.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

🧹 Nitpick comments (1)

components/src/dynamo/sglang/args.py (1)
372-373: Optional follow-up: align stale handler docstrings with the renamed flag.

You updated the config flag here, but related docs/comments still mention stream_output=True (for example in components/src/dynamo/sglang/request_handlers/llm/decode_handler.py and components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py). Renaming those to incremental_streaming_output would reduce future confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@components/src/dynamo/sglang/args.py` around lines 372 - 373, Update
docstrings/comments that still reference the old flag name `stream_output=True`
to the new name `incremental_streaming_output` so they match the renamed config
in args.py; specifically search for occurrences in request handler docstrings
(e.g., in the decode handler function/class in
request_handlers/llm/decode_handler.py and the worker handler in
request_handlers/multimodal/worker_handler.py) and replace the wording (and any
example parameter usage) to use `incremental_streaming_output` while preserving
the original explanatory text and examples.
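A quick way to act on this nitpick is to scan the tree for remaining references to the old name. The following helper is hypothetical and not part of the PR. It relies on the fact that `incremental_streaming_output` does not contain `stream_output` as a substring, so a whole-identifier match on the old name finds only stale references. The default root directory comes from the paths in the review comment and may need adjusting.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# Matches the old flag name as a whole identifier. "incremental_streaming_output"
# does not contain "stream_output", so the new name is never matched.
OLD_FLAG = re.compile(r"\bstream_output\b")


def stale_lines(text: str) -> List[int]:
    """Return 1-based line numbers that still mention the old flag name."""
    return [i for i, line in enumerate(text.splitlines(), start=1) if OLD_FLAG.search(line)]


def scan_tree(root: Path) -> Iterator[Tuple[Path, int]]:
    """Yield (file, line number) pairs for every stale reference under root."""
    if not root.is_dir():
        return
    for path in sorted(root.rglob("*.py")):
        for lineno in stale_lines(path.read_text(encoding="utf-8")):
            yield path, lineno


if __name__ == "__main__":
    # Root taken from the paths in the review comment; adjust to your checkout.
    for path, lineno in scan_tree(Path("components/src/dynamo/sglang")):
        print(f"{path}:{lineno}")
```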


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4a6fd0a7-008b-4757-9b52-3d856f278181

📥 Commits

Reviewing files that changed from the base of the PR and between c7b2d2b and 785e5ac.

📒 Files selected for processing (1)

components/src/dynamo/sglang/args.py

rmccorm4 · 2026-03-25T23:48:36Z

Please sign your commit to pass DCO check

We can also get you added to the Dynamo GitHub team so that this check will pass automatically for you in the future. Please reach out on Slack for this.

…tream_output sglang renamed `stream_output` to `incremental_streaming_output` in sglang/sglang#20614. The old attribute assignment silently became a no-op, causing cumulative output_ids to be sent instead of disjoint deltas. This led to triangular-sum inflation of completion_tokens (~10x). Signed-off-by: Yangmin Li <yangminl@nvidia.com>

YAMY1234 · 2026-03-25T23:52:26Z

Added, thanks! @rmccorm4

nvpohanh · 2026-04-01T07:49:59Z

@rmccorm4 When could we merge this? We urgently need this. Thanks!

rmccorm4

LGTM after this comment is addressed so we don't break in between versions: #7642 (comment)

rmccorm4 · 2026-04-01T07:54:47Z

/ok to test c5f81b5

sglang renamed `stream_output` to `incremental_streaming_output` in sglang/sglang#20614 (after v0.5.9). Use hasattr to detect which field exists so the fix works on both old and new sglang versions. Signed-off-by: Yangmin Li <yangminl@nvidia.com>
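The following sketch illustrates the version check described in this commit. It is not the merged code. `OldServerArgs` and `NewServerArgs` are hypothetical stand-ins for sglang's `ServerArgs` before and after sglang/sglang#20614. Preferring the new name and raising an error when neither field exists are choices made for this sketch only. The sketch also shows why the original assignment failed silently, under the assumption that `ServerArgs` behaves like a regular (non-slotted) dataclass. Such an instance accepts any new attribute, so setting `stream_output` on a newer `ServerArgs` raises no error, while `incremental_streaming_output` stays unchanged.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for sglang's ServerArgs before and after the rename.
@dataclass
class OldServerArgs:
    stream_output: bool = False


@dataclass
class NewServerArgs:
    incremental_streaming_output: bool = False


def enable_incremental_streaming(server_args: object) -> str:
    """Enable delta streaming via whichever field this sglang version defines.

    Returns the name of the field that was set.
    """
    for field_name in ("incremental_streaming_output", "stream_output"):
        if hasattr(server_args, field_name):
            setattr(server_args, field_name, True)
            return field_name
    raise AttributeError(
        "ServerArgs defines neither incremental_streaming_output nor stream_output"
    )


if __name__ == "__main__":
    # The original bug: on a newer ServerArgs this only creates an unused attribute.
    broken = NewServerArgs()
    broken.stream_output = True
    print("old-style assignment enabled deltas:", broken.incremental_streaming_output)

    for args in (OldServerArgs(), NewServerArgs()):
        print(type(args).__name__, "->", enable_incremental_streaming(args))
```

The sketch checks for the new name first. If a future sglang release kept both fields, it would therefore still choose the incremental one. The actual PR may order or handle the fallback differently.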

rmccorm4 · 2026-04-01T14:42:41Z

/ok to test 9fed398

YAMY1234 requested review from a team as code owners March 25, 2026 23:41

pull-request-size bot added the size/XS label Mar 25, 2026

github-actions bot added the fix label Mar 25, 2026

github-actions bot added the backend::sglang (Relates to the sglang backend), external-contribution (Pull request is from an external contributor), and fix labels and removed the fix label Mar 25, 2026

rmccorm4 reviewed Mar 25, 2026


components/src/dynamo/sglang/args.py (outdated, resolved)

coderabbitai bot reviewed Mar 25, 2026


YAMY1234 force-pushed the fix/sglang-incremental-streaming-output branch from 785e5ac to c5f81b5 March 25, 2026 23:51

rmccorm4 enabled auto-merge (squash) April 1, 2026 07:52

rmccorm4 approved these changes Apr 1, 2026


copy-pr-bot bot temporarily deployed to GITLAB April 1, 2026 07:54 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 1, 2026 07:55 Inactive

pull-request-size bot added size/S and removed size/XS labels Apr 1, 2026

copy-pr-bot bot temporarily deployed to GITLAB April 1, 2026 14:42 Inactive

copy-pr-bot bot temporarily deployed to GITLAB April 1, 2026 14:43 Inactive

rmccorm4 merged commit 8fe2082 into ai-dynamo:main Apr 1, 2026
77 checks passed

rmccorm4 mentioned this pull request Apr 1, 2026

fix(sglang): use incremental streaming output for completions #7752 (Open)

rmccorm4 merged 2 commits into ai-dynamo:main from YAMY1234:fix/sglang-incremental-streaming-output

YAMY1234 commented Mar 25, 2026 (edited by coderabbitai bot)

3 participants
