
fix: guided decoding params handling in vLLM#4770

Merged
karen-sy merged 6 commits into ai-dynamo:main from vladnosiv:fix-guided-decoding-config-vllm
Dec 9, 2025

Conversation

@vladnosiv
Contributor

@vladnosiv vladnosiv commented Dec 5, 2025

Overview:

In an upstream PR, vLLM changed the type of its structured-output (SO) parameters from GuidedDecodingParams to StructuredOutputsParams.

The current vLLM worker handler code assigned the attribute by the old guided-decoding parameter name, which made it impossible to use guided decoding.
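The rename can be illustrated with simplified stand-in classes (the real types live in `vllm.sampling_params`; the field shapes here are assumptions, not vLLM's exact definitions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredOutputsParams:  # replaces the old GuidedDecodingParams
    json: Optional[dict] = None
    regex: Optional[str] = None


@dataclass
class SamplingParams:
    # vLLM now exposes `structured_outputs`; the old guided-decoding
    # attribute no longer exists, so assigning by the old name breaks
    # guided decoding instead of enabling it.
    structured_outputs: Optional[StructuredOutputsParams] = None


params = SamplingParams()
params.structured_outputs = StructuredOutputsParams(regex=r"\d{3}-\d{4}")
print(params.structured_outputs.regex)  # → \d{3}-\d{4}
```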

Summary by CodeRabbit

  • New Features

    • Added support for guided decoding with structured outputs, enabling constrained LLM outputs that adhere to specific JSON schemas or regex patterns.
  • Tests

    • Added test configurations for guided JSON schema and guided regex pattern constraints.


Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv vladnosiv requested review from a team as code owners December 5, 2025 16:06
@copy-pr-bot

copy-pr-bot bot commented Dec 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Dec 5, 2025

👋 Hi vladnosiv! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: the NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and fix labels Dec 5, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

Support for guided decoding and structured outputs is added to vLLM handlers by importing StructuredOutputsParams and modifying build_sampling_params to convert guided_decoding into structured output parameters. Test configurations for JSON schema and regex-based guided decoding are introduced.

Changes

vLLM Guided Decoding Support — components/src/dynamo/vllm/handlers.py:
Added an import of StructuredOutputsParams from vllm.sampling_params. Enhanced build_sampling_params to extract guided_decoding from sampling_options, convert it to a StructuredOutputsParams instance, assign it to sampling_params.structured_outputs, and apply the backend configuration if provided. Modified the loop logic to skip the guided_decoding key after processing.

Test Configurations — tests/serve/test_vllm.py:
Added two new vLLM test configurations (guided_decoding_json and guided_decoding_regex) to validate guided decoding with JSON schema and regex pattern constraints, respectively. Each configuration includes corresponding request payloads and expected response behavior.
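The conversion described in the walkthrough can be sketched as follows. This is a hypothetical simplification: the option keys (`json`, `regex`) and the stand-in dataclasses are assumptions, not the exact Dynamo or vLLM schemas, and backend handling is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredOutputsParams:  # simplified stand-in for vllm.sampling_params
    json: Optional[dict] = None
    regex: Optional[str] = None


@dataclass
class SamplingParams:  # simplified stand-in for vLLM's SamplingParams
    temperature: float = 1.0
    structured_outputs: Optional[StructuredOutputsParams] = None


def build_sampling_params(sampling_options: dict) -> SamplingParams:
    params = SamplingParams()
    # Convert the request's guided_decoding options into the new
    # structured-outputs form before the generic attribute loop runs.
    guided = sampling_options.get("guided_decoding")
    if guided is not None:
        params.structured_outputs = StructuredOutputsParams(
            json=guided.get("json"),
            regex=guided.get("regex"),
        )
    for key, value in sampling_options.items():
        if key == "guided_decoding":
            continue  # already converted to structured_outputs above
        if hasattr(params, key):
            setattr(params, key, value)
    return params


sp = build_sampling_params(
    {"temperature": 0.0, "guided_decoding": {"regex": r"[a-z]+@[a-z]+\.com"}}
)
print(sp.temperature, sp.structured_outputs.regex)
```

The key point is the `continue`: without it, the generic loop would try to set a `guided_decoding` attribute that no longer exists on vLLM's SamplingParams.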

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Extra attention areas:
    • The guided_decoding extraction and StructuredOutputsParams assignment logic in build_sampling_params requires careful verification to ensure backend handling and parameter mapping are correct
    • Test payloads should be validated to confirm they accurately exercise the new guided decoding pathways and that expected_response definitions are appropriate

Poem

🐰 A decoder blessed with structure new,
JSON schemas, regex patterns too!
Guided whispers shape the output stream,
Structured outputs fulfill the dream! ✨

Pre-merge checks

❌ Failed checks (1 inconclusive)
Description check — ❓ Inconclusive. The description provides an overview of the issue but is incomplete: it lacks details about the specific changes made and which files were modified, and omits the 'Where should the reviewer start?' and 'Related Issues' sections from the template. Resolution: add 'Details', 'Where should the reviewer start?', and 'Related Issues' sections to match the PR description template.
✅ Passed checks (2 passed)
Title check — ✅ Passed. The title clearly and concisely describes the main change: fixing guided decoding params handling in vLLM after an upstream API change.
Docstring coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and acc9661.

📒 Files selected for processing (2)
  • components/src/dynamo/vllm/handlers.py (2 hunks)
  • tests/serve/test_vllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (1)
  • chat_payload (129-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
tests/serve/test_vllm.py (3)

464-473: Same extra_body issue applies here.

This test configuration also passes extra_body to chat_payload, which will fail for the same reason as the JSON config above.


431-475: Test configurations are well-structured for guided decoding validation.

The test designs are appropriate—using temperature=0.0 for deterministic outputs and checking for expected JSON keys/email format. The pre_merge marks ensure these run in CI.

However, ensure the extra_body parameter issue is resolved before merging.
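A plausible shape for the guided-JSON request payload discussed above (field names such as `guided_json` under `extra_body`, the model name, and the schema are illustrative assumptions, not copied from tests/serve/test_vllm.py):

```python
import json

# Hypothetical guided-JSON chat request for an OpenAI-compatible endpoint.
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Return a user record as JSON."}],
    "temperature": 0.0,  # deterministic output so the test can assert on keys
    "extra_body": {
        "guided_json": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }
    },
}
print(json.dumps(payload["extra_body"]["guided_json"]["required"]))  # → ["name"]
```

With temperature pinned to 0.0, the test can assert that the response parses as JSON and contains the required keys, rather than matching exact text.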


438-454: Due to repository access limitations, I cannot complete verification of this review comment. To properly categorize this review, please provide:

  1. The current signature of chat_payload() from tests/utils/payload_builder.py
  2. Confirmation that the test code at lines 438-454 in tests/serve/test_vllm.py actually uses extra_body parameter

Alternatively, if you can restore repository access, I can complete a full verification.

components/src/dynamo/vllm/handlers.py (2)

98-104: LGTM on the skip logic.

Skipping guided_decoding in the generic loop is correct since it's already been converted to structured_outputs. The inline comment clearly explains the reasoning.


15-15: LGTM on the import addition.

The StructuredOutputsParams import from vllm.sampling_params is correct and aligns with vLLM's documented API for structured outputs. This class is used to configure structured-output generation for JSON, regex, choice, grammar, and other structural patterns.

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@rmccorm4
Contributor

rmccorm4 commented Dec 8, 2025

/ok to test 5bbfc81

@karen-sy karen-sy enabled auto-merge (squash) December 8, 2025 18:14
@vladnosiv
Contributor Author

@karen-sy thanks for the quick review!
I'm not sure, but it seems the full build was launched on the commit before the last main merge, so the full build didn't run on this PR and it won't merge automatically.

@rmccorm4
Contributor

rmccorm4 commented Dec 9, 2025

/ok to test 93cf7f1

@karen-sy karen-sy merged commit 3e4b480 into ai-dynamo:main Dec 9, 2025
29 of 30 checks passed
esoba pushed a commit to esoba/dynamo that referenced this pull request Dec 9, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
karen-sy added a commit that referenced this pull request Dec 9, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
smatta-star pushed a commit to smatta-star/dynamo that referenced this pull request Dec 19, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>

Labels

external-contribution (Pull request is from an external contributor), fix, size/M

3 participants