feat(sglang): enforce stream_output=True for optimal streaming performance by MatejKosec · Pull Request #5510 · ai-dynamo/dynamo

MatejKosec · 2026-01-20T19:59:15Z

Summary

- Enforce `stream_output=True` in SGLang `ServerArgs` for Dynamo
- Update streaming handlers to pass through disjoint token segments directly (no more cumulative-to-delta conversion)
- Applies to both the LLM decode handler and the multimodal worker handler

Description

With stream_output=True, SGLang sends only new tokens since the last output (disjoint segments) rather than all tokens generated so far (cumulative). This change:

- Forces `stream_output=True` in `args.py` after parsing `ServerArgs`
- Simplifies `_process_token_stream` in `decode_handler` by removing the tracking/slicing logic
- Simplifies `process_sglang_stream` in the multimodal `worker_handler` with the same fix

This aligns Dynamo with SGLang's efficient streaming mode, reducing redundant data transfer.
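The difference between the two modes can be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `deltas_from_cumulative` and `deltas_from_disjoint` are hypothetical, and each chunk is modeled as a plain list of token IDs standing in for `output_ids`.

```python
from typing import Iterable, Iterator, List


def deltas_from_cumulative(chunks: Iterable[List[int]]) -> Iterator[List[int]]:
    """Cumulative mode: every chunk repeats all tokens so far, so slice off the seen prefix."""
    seen = 0  # number of tokens already forwarded downstream
    for output_ids in chunks:
        new_tokens = output_ids[seen:]
        seen = len(output_ids)
        yield new_tokens


def deltas_from_disjoint(chunks: Iterable[List[int]]) -> Iterator[List[int]]:
    """Disjoint mode: every chunk already holds only the new tokens, so forward it unchanged."""
    for output_ids in chunks:
        yield output_ids


# The same three-step generation, seen in each mode (token IDs are made up).
cumulative = [[11], [11, 12, 13], [11, 12, 13, 14]]
disjoint = [[11], [12, 13], [14]]

assert list(deltas_from_cumulative(cumulative)) == disjoint
assert list(deltas_from_disjoint(disjoint)) == disjoint
```

Both paths yield the same deltas, but the disjoint path needs no offset bookkeeping, which is the simplification described above.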

Summary by CodeRabbit

Bug Fixes
- Improved token streaming to deliver disjoint segments instead of cumulative tokens, ensuring more accurate and granular token delivery during streaming operations.
- Enabled stream output mode in server configuration for consistent streaming behavior across LLM and multimodal handlers.

…mance

Dynamo's streaming handlers now expect disjoint output_ids from SGLang (only new tokens since last output) rather than cumulative tokens.

Changes:
- Force stream_output=True in args.py after parsing ServerArgs
- Update decode_handler to pass through disjoint token segments directly
- Update multimodal worker_handler with the same fix

This aligns Dynamo with SGLang's efficient streaming mode where only delta tokens are transmitted, reducing redundant data transfer.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

coderabbitai · 2026-01-20T20:04:12Z

Walkthrough

The pull request enforces stream output with disjoint token segments across Dynamo's SGLang integration. Stream output is now forcibly enabled in argument parsing, and both token stream handlers are refactored to forward token segments directly rather than computing them from running totals or offsets.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Stream Output Configuration `components/src/dynamo/sglang/args.py` | Unconditionally enables `stream_output` on server args after parsing, logging a message if it was previously disabled. Ensures consistent streaming behavior. |
| Token Stream Processing `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`, `components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py` | Refactored `_process_token_stream` and `StreamProcessor.process_sglang_stream` to use disjoint token segments from `output_ids` instead of slicing from running offsets. Removes accumulated offset tracking and clarifies streaming semantics. Error handling preserved. |
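The "force after parsing" step might look roughly like the sketch below. It is illustrative only: `StandInServerArgs` and `parse_args` are hypothetical stand-ins so the example runs without SGLang installed, and the actual code in `args.py` may be structured differently.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class StandInServerArgs:
    """Hypothetical stand-in for SGLang's ServerArgs, reduced to the one field discussed here."""

    stream_output: bool = False


def parse_args(argv: List[str]) -> StandInServerArgs:
    parser = argparse.ArgumentParser()
    parser.add_argument("--stream-output", action="store_true")
    namespace = parser.parse_args(argv)

    server_args = StandInServerArgs(stream_output=namespace.stream_output)

    # Override whatever was passed on the command line: the stream handlers
    # assume disjoint segments, so cumulative output would be misinterpreted.
    server_args.stream_output = True
    return server_args


# The flag ends up enabled whether or not the caller asked for it.
assert parse_args([]).stream_output is True
assert parse_args(["--stream-output"]).stream_output is True
```

Setting the flag after parsing, rather than changing the default, means a user cannot accidentally switch the backend to a mode the handlers no longer support.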

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Streams now flow in segments bright,
No more offsets to recalculate each night,
Tokens hop along, disjoint and free,
SGLang streams as they should be! 🎉

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: enforcing stream_output=True in SGLang for optimal streaming, which is the primary objective across all three modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description check | ✅ Passed | The PR description follows the required template structure with all key sections present: Overview (Summary), Details, and related guidance. It clearly explains the changes, rationale, and includes a test plan. |


components/src/dynamo/sglang/args.py

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

…mance (#5510)

This ensures that SGLang returns only new tokens, which avoids the overhead of copying the entire token sequence on every iteration. These copies can become a bottleneck, particularly at long sequence lengths and high concurrency.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
Signed-off-by: davilu <davilu@nvidia.com>

…mance (ai-dynamo#5510)

This ensures that SGLang returns only new tokens, which avoids the overhead of copying the entire token sequence on every iteration. These copies can become a bottleneck, particularly at long sequence lengths and high concurrency.

Signed-off-by: Matej Kosec <mkosec@nvidia.com>
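To see why the copies matter at long sequence lengths, here is a rough back-of-the-envelope sketch. It assumes one new token per streaming step, which is a simplification; the function names are hypothetical and no figures here come from measurements in this PR.

```python
def tokens_copied_cumulative(num_tokens: int) -> int:
    """Step k re-sends all k tokens generated so far: 1 + 2 + ... + n."""
    return num_tokens * (num_tokens + 1) // 2


def tokens_copied_disjoint(num_tokens: int) -> int:
    """Each token is sent exactly once."""
    return num_tokens


# For a single 1,000-token response under the one-token-per-step assumption:
assert tokens_copied_cumulative(1000) == 500500
assert tokens_copied_disjoint(1000) == 1000
```

Under this assumption the cumulative cost grows quadratically with response length and scales linearly with the number of concurrent requests, while the disjoint cost stays linear in both.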

MatejKosec requested review from a team as code owners January 20, 2026 19:59

pull-request-size bot added the size/M label Jan 20, 2026

github-actions bot added the feat, backend::sglang (Relates to the sglang backend), and multimodal labels Jan 20, 2026

MatejKosec requested review from Aphoh and ishandhanani January 20, 2026 20:00

grahamking approved these changes Jan 20, 2026

grahamking reviewed Jan 20, 2026

components/src/dynamo/sglang/args.py (outdated, resolved)

remove unnecessary log message

cd574a8

Signed-off-by: Matej Kosec <mkosec@nvidia.com>

copy-pr-bot bot temporarily deployed to GITLAB January 20, 2026 20:31 Inactive

copy-pr-bot bot temporarily deployed to GITLAB January 20, 2026 20:33 Inactive

ishandhanani approved these changes Jan 20, 2026

MatejKosec merged commit 748fee6 into main Jan 20, 2026
32 of 33 checks passed

MatejKosec deleted the user/mkosec/enforce_sglang_reponse_streaming_flag branch January 20, 2026 22:19

ishandhanani mentioned this pull request Apr 6, 2026

fix(sglang): stop forcing incremental_streaming_output to fix high-concurrency throughput regression #7910

Closed

3 tasks

MatejKosec merged 2 commits into main from user/mkosec/enforce_sglang_reponse_streaming_flag

3 participants