feat: sglang guided decoding support#6620
Conversation
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
|
👋 Hi jellysnack! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughThis change refactors tokenizer flag handling by replacing Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/src/dynamo/sglang/request_handlers/handler_base.py`:
- Around line 391-400: The helper _get_guided_decoding_params currently only
reads guided_decoding["json"] and will ignore requests using
guided_decoding["json_schema"]; update _get_guided_decoding_params to accept
both keys (prefer "json_schema" if present, fall back to "json") and return
{"json_schema": json.dumps(value)} when either is supplied so schema constraints
are preserved; ensure the existing type check on guided_decoding
(isinstance(..., dict)) remains and return {} when neither key exists.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/src/dynamo/sglang/args.pycomponents/src/dynamo/sglang/register.pycomponents/src/dynamo/sglang/request_handlers/handler_base.pycomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.pycomponents/src/dynamo/sglang/request_handlers/llm/diffusion_handler.pycomponents/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
| @staticmethod | ||
| def _get_guided_decoding_params( | ||
| guided_decoding: Optional[Dict[str, Any]], | ||
| ) -> Dict[str, Any]: | ||
| """Extract guided decoding params (e.g. json_schema) for SGLang sampling_params.""" | ||
| if isinstance(guided_decoding, dict): | ||
| json_schema = guided_decoding.get("json") | ||
| if json_schema is not None: | ||
| return {"json_schema": json.dumps(json_schema)} | ||
| return {} |
There was a problem hiding this comment.
Guided decoding key mismatch can drop schema constraints.
At Line 397, only guided_decoding["json"] is read. Requests that provide guided_decoding["json_schema"] will silently skip guided decoding.
💡 Proposed compatibility fix
def _get_guided_decoding_params(
guided_decoding: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
"""Extract guided decoding params (e.g. json_schema) for SGLang sampling_params."""
if isinstance(guided_decoding, dict):
- json_schema = guided_decoding.get("json")
+ json_schema = guided_decoding.get("json_schema")
+ if json_schema is None:
+ json_schema = guided_decoding.get("json")
if json_schema is not None:
return {"json_schema": json.dumps(json_schema)}
return {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@components/src/dynamo/sglang/request_handlers/handler_base.py` around lines
391 - 400, The helper _get_guided_decoding_params currently only reads
guided_decoding["json"] and will ignore requests using
guided_decoding["json_schema"]; update _get_guided_decoding_params to accept
both keys (prefer "json_schema" if present, fall back to "json") and return
{"json_schema": json.dumps(value)} when either is supplied so schema constraints
are preserved; ensure the existing type check on guided_decoding
(isinstance(..., dict)) remains and return {} when neither key exists.
|
Hey @jellysnack - thank you for the PR. Sorry have been overall very busy. This PR is on my TODO list to review this week |
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
|
just a gentle ping on this PR |
|
@ishandhanani Could you take a look when you get a chance? Thanks! |
|
/ok to test a18b513 |
|
@jellysnack Thanks for contributing. Gentle reminder to add a reproducer and output before and after fix. |
|
sorry for jumping in. Need to refresh this PR to pull a fix for Allure reporting. |
|
/ok to test ac19772 |
Overview:
Add guided decoding support for the SGLang engine and replace
skip_tokenizer_initwithuse_sglang_tokenizerfor tokenizer selection.Details:
SGLang's
GrammarManagerdisablesgrammar_backendwhenskip_tokenizer_init=True(ref), which makes guided decoding impossible. To fix this:skip_tokenizer_init=Trueoverride. SGLang still supports both token-based and text-based inputs without it.skip_tokenizer_initreferences withuse_sglang_tokenizerthroughout handlers and registration logic._get_guided_decoding_params()helper inBaseWorkerHandlerto extractjson_schemafromguided_decodingrequest params and forward them to SGLang's sampling params.DecodeWorkerHandlerandPrefillWorkerHandler.Where should the reviewer start?
components/src/dynamo/sglang/request_handlers/handler_base.py– new_get_guided_decoding_params()methodcomponents/src/dynamo/sglang/args.py– removal ofskip_tokenizer_initcomponents/src/dynamo/sglang/request_handlers/llm/decode_handler.py– guided decoding integration andskip_tokenizer_inittouse_sglang_tokenizerswitchRelated Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit