fix(sglang): use incremental_streaming_output instead of deprecated stream_output#7642
Conversation
|
👋 Hi YAMY1234! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughUpdated SGLang ServerArgs streaming configuration in the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
components/src/dynamo/sglang/args.py (1)
372-373: Optional follow-up: align stale handler docstrings with the renamed flag.You updated the config flag here, but related docs/comments still mention
stream_output=True(for example incomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.pyandcomponents/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py). Renaming those toincremental_streaming_outputwould reduce future confusion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/src/dynamo/sglang/args.py` around lines 372 - 373, Update docstrings/comments that still reference the old flag name `stream_output=True` to the new name `incremental_streaming_output` so they match the renamed config in args.py; specifically search for occurrences in request handler docstrings (e.g., in the decode handler function/class in request_handlers/llm/decode_handler.py and the worker handler in request_handlers/multimodal/worker_handler.py) and replace the wording (and any example parameter usage) to use `incremental_streaming_output` while preserving the original explanatory text and examples.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@components/src/dynamo/sglang/args.py`:
- Around line 372-373: Update docstrings/comments that still reference the old
flag name `stream_output=True` to the new name `incremental_streaming_output` so
they match the renamed config in args.py; specifically search for occurrences in
request handler docstrings (e.g., in the decode handler function/class in
request_handlers/llm/decode_handler.py and the worker handler in
request_handlers/multimodal/worker_handler.py) and replace the wording (and any
example parameter usage) to use `incremental_streaming_output` while preserving
the original explanatory text and examples.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4a6fd0a7-008b-4757-9b52-3d856f278181
📒 Files selected for processing (1)
components/src/dynamo/sglang/args.py
|
Please sign your commit to pass DCO check We can also get you added to the Dynamo github team so that this check will pass automatically for you in the future - please reach out on Slack for this |
…tream_output sglang renamed `stream_output` to `incremental_streaming_output` in sglang/sglang#20614. The old attribute assignment silently became a no-op, causing cumulative output_ids to be sent instead of disjoint deltas. This led to triangular-sum inflation of completion_tokens (~10x). Signed-off-by: Yangmin Li <yangminl@nvidia.com>
785e5ac to
c5f81b5
Compare
|
Added, thanks! @rmccorm4 |
|
@rmccorm4 When could we merge this? We urgently need this. Thanks! |
rmccorm4
left a comment
There was a problem hiding this comment.
LGTM after this comment is addressed so we don't break in between versions: #7642 (comment)
|
/ok to test c5f81b5 |
sglang renamed `stream_output` to `incremental_streaming_output` in sglang/sglang#20614 (after v0.5.9). Use hasattr to detect which field exists so the fix works on both old and new sglang versions. Signed-off-by: Yangmin Li <yangminl@nvidia.com>
|
/ok to test 9fed398 |
Overview:
use incremental_streaming_output instead of deprecated stream_output
Details:
sglang renamed
stream_outputtoincremental_streaming_outputin sglang/sglang#20614. The old attribute assignment silently became a no-op, causing cumulative output_ids to be sent instead of disjoint deltas. This led to triangular-sum inflation of completion_tokens (~10x).Before the fix:
After the fix:
Summary by CodeRabbit