
feat: sglang guided decoding support #6620

Open
jellysnack wants to merge 9 commits into ai-dynamo:main from jellysnack:feat/sglang-guided-decoding-support

Conversation

@jellysnack jellysnack commented Feb 26, 2026

Overview:

Add guided decoding support for the SGLang engine and replace skip_tokenizer_init with use_sglang_tokenizer for tokenizer selection.

Details:

SGLang's GrammarManager disables grammar_backend when skip_tokenizer_init=True (ref), which makes guided decoding impossible. To fix this:

  • Removed the skip_tokenizer_init=True override. SGLang still supports both token-based and text-based inputs without it.
  • Replaced all skip_tokenizer_init references with use_sglang_tokenizer throughout handlers and registration logic.
  • Added a _get_guided_decoding_params() helper in BaseWorkerHandler that extracts the JSON schema from the guided_decoding request params and forwards it to SGLang's sampling params as json_schema.
  • Wired guided decoding params into DecodeWorkerHandler and PrefillWorkerHandler.
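
The last two bullets can be sketched as follows. This is a minimal, standalone illustration rather than the PR's code: the helper body mirrors the snippet quoted in the review further down, while `build_sampling_params` and the `temperature` / `max_new_tokens` values are hypothetical and chosen only for the example.

```python
import json
from typing import Any, Dict, Optional


def get_guided_decoding_params(
    guided_decoding: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return {"json_schema": <serialized schema>}, or {} when no schema is given."""
    if isinstance(guided_decoding, dict):
        schema = guided_decoding.get("json")
        if schema is not None:
            return {"json_schema": json.dumps(schema)}
    return {}


def build_sampling_params(request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical merge step: base sampling options plus guided decoding params."""
    # Illustrative defaults, not values taken from the PR.
    sampling_params: Dict[str, Any] = {"temperature": 0.0, "max_new_tokens": 64}
    sampling_params.update(get_guided_decoding_params(request.get("guided_decoding")))
    return sampling_params


schema = {"type": "object", "properties": {"name": {"type": "string"}}}
constrained = build_sampling_params({"guided_decoding": {"json": schema}})
unconstrained = build_sampling_params({})
```

Because the helper returns an empty dict when no schema is supplied, the merge leaves unconstrained requests untouched.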

Where should the reviewer start?

  • components/src/dynamo/sglang/request_handlers/handler_base.py – new _get_guided_decoding_params() method
  • components/src/dynamo/sglang/args.py – removal of skip_tokenizer_init
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py – guided decoding integration and skip_tokenizer_init to use_sglang_tokenizer switch

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • Refactor
    • Simplified internal tokenizer initialization configuration and improved guided decoding parameter handling across request processing pipeline.

Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
@jellysnack jellysnack requested review from a team as code owners February 26, 2026 10:01

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

👋 Hi jellysnack! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the feat, external-contribution, and backend::sglang labels Feb 26, 2026
@coderabbitai
Contributor

coderabbitai bot commented Feb 26, 2026

Walkthrough

This change refactors tokenizer flag handling by replacing skip_tokenizer_init with a new use_sglang_tokenizer flag across sglang handler files. Additionally, a new _get_guided_decoding_params helper method is introduced to extract JSON schema parameters from guided decoding configurations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Tokenizer Flag Migration: components/src/dynamo/sglang/args.py, components/src/dynamo/sglang/register.py | Removed direct assignment of skip_tokenizer_init flag; updated condition checks to use use_sglang_tokenizer instead, with simplified logging messages reflecting the tokenizer choice without state mutation. |
| Handler Core Updates: components/src/dynamo/sglang/request_handlers/handler_base.py | Added new static method _get_guided_decoding_params() to extract JSON schema from guided decoding dictionaries; replaced skip_tokenizer_init with use_sglang_tokenizer flag for tokenizer initialization control in InputParamManager. |
| Handler Implementations: components/src/dynamo/sglang/request_handlers/llm/decode_handler.py, components/src/dynamo/sglang/request_handlers/llm/diffusion_handler.py, components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py | Replaced conditional branching from skip_tokenizer_init to use_sglang_tokenizer for stream processor selection; integrated guided decoding parameters into sampling configurations by merging _get_guided_decoding_params() results. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A tokenizer flag takes flight,
From skip to use—a clearer sight!
Guided decoding joins the show,
Through handlers six, the refactors flow! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main feature addition: guided decoding support for SGLang, which aligns with the core objective of the PR. |
| Description check | ✅ Passed | The description covers all required template sections: Overview explains the feature and problem being solved, Details outlines the specific changes, Where to review identifies key files, and Related Issues section is present (though no issues are referenced). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00%, which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 391-400: The helper _get_guided_decoding_params currently only
reads guided_decoding["json"] and will ignore requests using
guided_decoding["json_schema"]; update _get_guided_decoding_params to accept
both keys (prefer "json_schema" if present, fall back to "json") and return
{"json_schema": json.dumps(value)} when either is supplied so schema constraints
are preserved; ensure the existing type check on guided_decoding
(isinstance(..., dict)) remains and return {} when neither key exists.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db88c95 and 7390085.

📒 Files selected for processing (6)
  • components/src/dynamo/sglang/args.py
  • components/src/dynamo/sglang/register.py
  • components/src/dynamo/sglang/request_handlers/handler_base.py
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
  • components/src/dynamo/sglang/request_handlers/llm/diffusion_handler.py
  • components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py

Comment on lines +391 to +400
    @staticmethod
    def _get_guided_decoding_params(
        guided_decoding: Optional[Dict[str, Any]],
    ) -> Dict[str, Any]:
        """Extract guided decoding params (e.g. json_schema) for SGLang sampling_params."""
        if isinstance(guided_decoding, dict):
            json_schema = guided_decoding.get("json")
            if json_schema is not None:
                return {"json_schema": json.dumps(json_schema)}
        return {}

⚠️ Potential issue | 🟠 Major

Guided decoding key mismatch can drop schema constraints.

At Line 397, only guided_decoding["json"] is read. Requests that provide guided_decoding["json_schema"] will silently skip guided decoding.

💡 Proposed compatibility fix
     def _get_guided_decoding_params(
         guided_decoding: Optional[Dict[str, Any]],
     ) -> Dict[str, Any]:
         """Extract guided decoding params (e.g. json_schema) for SGLang sampling_params."""
         if isinstance(guided_decoding, dict):
-            json_schema = guided_decoding.get("json")
+            json_schema = guided_decoding.get("json_schema")
+            if json_schema is None:
+                json_schema = guided_decoding.get("json")
             if json_schema is not None:
                 return {"json_schema": json.dumps(json_schema)}
         return {}
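
As a standalone sketch of the behavior the proposed fix aims for (assuming, as the review comment does, that callers may send either key), the function below prefers "json_schema" and falls back to "json". The name `extract_json_schema` is hypothetical, and the function is an illustration rather than code from the PR.

```python
import json
from typing import Any, Dict, Optional


def extract_json_schema(guided_decoding: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Prefer the "json_schema" key, fall back to "json", else return {}."""
    if not isinstance(guided_decoding, dict):
        return {}
    schema = guided_decoding.get("json_schema")
    if schema is None:
        schema = guided_decoding.get("json")
    if schema is None:
        return {}
    return {"json_schema": json.dumps(schema)}


object_schema = {"type": "object"}
array_schema = {"type": "array"}

from_json = extract_json_schema({"json": object_schema})
from_both = extract_json_schema({"json": object_schema, "json_schema": array_schema})
from_none = extract_json_schema(None)
```

One caveat with either version: `json.dumps` will double-encode a schema that arrives as an already serialized string. The thread does not say whether that case occurs, so it may be worth checking before merging.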

@ishandhanani
Contributor

Hey @jellysnack - thank you for the PR. Sorry have been overall very busy. This PR is on my TODO list to review this week

@jellysnack
Author

just a gentle ping on this PR

@jellysnack
Author

@ishandhanani Could you take a look when you get a chance? Thanks!

@rmccorm4
Contributor

rmccorm4 commented Apr 7, 2026

/ok to test a18b513

@ayushag-nv
Contributor

@jellysnack Thanks for contributing. Gentle reminder to add a reproducer and output before and after fix.

@dmitry-tokarev-nv
Contributor

sorry for jumping in. Need to refresh this PR to pull a fix for Allure reporting.

@dmitry-tokarev-nv
Contributor

/ok to test ac19772

Labels

backend::sglang, external-contribution, feat, size/M
