
fix: guided decoding params handling in vLLM#4770

Merged
karen-sy merged 6 commits into ai-dynamo:main from vladnosiv:fix-guided-decoding-config-vllm
Dec 9, 2025

Conversation

@vladnosiv
Contributor

@vladnosiv vladnosiv commented Dec 5, 2025

Overview:

In an upstream PR, vLLM changed the type of its structured-output (SO) parameters from GuidedDecodingParams to StructuredOutputsParams.

The current vLLM worker handler code assigned the attribute by the old guided-decoding parameter name, which made it impossible to use guided decoding.
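The rename can be illustrated with simplified stand-in classes (the real types live in `vllm.sampling_params`; the field shapes here are assumptions, not vLLM's exact definitions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredOutputsParams:  # replaces the old GuidedDecodingParams
    json: Optional[dict] = None
    regex: Optional[str] = None


@dataclass
class SamplingParams:
    # vLLM now exposes `structured_outputs`; the old guided-decoding
    # attribute no longer exists, so assigning by the old name breaks
    # guided decoding instead of enabling it.
    structured_outputs: Optional[StructuredOutputsParams] = None


params = SamplingParams()
params.structured_outputs = StructuredOutputsParams(regex=r"\d{3}-\d{4}")
print(params.structured_outputs.regex)  # → \d{3}-\d{4}
```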

Summary by CodeRabbit

  • New Features

    • Added support for guided decoding with structured outputs, enabling constrained LLM outputs that adhere to specific JSON schemas or regex patterns.
  • Tests

    • Added test configurations for guided JSON schema and guided regex pattern constraints.


Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv vladnosiv requested review from a team as code owners December 5, 2025 16:06
@copy-pr-bot

copy-pr-bot bot commented Dec 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Contributor

github-actions bot commented Dec 5, 2025

👋 Hi vladnosiv! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: the NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and fix labels Dec 5, 2025
@coderabbitai
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

Support for guided decoding and structured outputs is added to vLLM handlers by importing StructuredOutputsParams and modifying build_sampling_params to convert guided_decoding into structured output parameters. Test configurations for JSON schema and regex-based guided decoding are introduced.

Changes

vLLM Guided Decoding Support — components/src/dynamo/vllm/handlers.py:
Added an import of StructuredOutputsParams from vllm.sampling_params. Enhanced build_sampling_params to extract guided_decoding from sampling_options, convert it to a StructuredOutputsParams instance, assign it to sampling_params.structured_outputs, and apply the backend configuration if provided. Modified the loop logic to skip the guided_decoding key after processing.

Test Configurations — tests/serve/test_vllm.py:
Added two new vLLM test configurations (guided_decoding_json and guided_decoding_regex) to validate guided decoding with JSON schema and regex pattern constraints, respectively. Each configuration includes corresponding request payloads and expected response behavior.
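The conversion described in the walkthrough can be sketched as follows. This is a hypothetical simplification: the option keys (`json`, `regex`) and the stand-in dataclasses are assumptions, not the exact Dynamo or vLLM schemas, and backend handling is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredOutputsParams:  # simplified stand-in for vllm.sampling_params
    json: Optional[dict] = None
    regex: Optional[str] = None


@dataclass
class SamplingParams:  # simplified stand-in for vLLM's SamplingParams
    temperature: float = 1.0
    structured_outputs: Optional[StructuredOutputsParams] = None


def build_sampling_params(sampling_options: dict) -> SamplingParams:
    params = SamplingParams()
    # Convert the request's guided_decoding options into the new
    # structured-outputs form before the generic attribute loop runs.
    guided = sampling_options.get("guided_decoding")
    if guided is not None:
        params.structured_outputs = StructuredOutputsParams(
            json=guided.get("json"),
            regex=guided.get("regex"),
        )
    for key, value in sampling_options.items():
        if key == "guided_decoding":
            continue  # already converted to structured_outputs above
        if hasattr(params, key):
            setattr(params, key, value)
    return params


sp = build_sampling_params(
    {"temperature": 0.0, "guided_decoding": {"regex": r"[a-z]+@[a-z]+\.com"}}
)
print(sp.temperature, sp.structured_outputs.regex)
```

The key point is the `continue`: without it, the generic loop would try to set a `guided_decoding` attribute that no longer exists on vLLM's SamplingParams.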

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Extra attention areas:
    • The guided_decoding extraction and StructuredOutputsParams assignment logic in build_sampling_params requires careful verification to ensure backend handling and parameter mapping are correct
    • Test payloads should be validated to confirm they accurately exercise the new guided decoding pathways and that expected_response definitions are appropriate

Poem

🐰 A decoder blessed with structure new,
JSON schemas, regex patterns too!
Guided whispers shape the output stream,
Structured outputs fulfill the dream! ✨

Pre-merge checks

❌ Failed checks (1 inconclusive)
Description check — ❓ Inconclusive. The description provides an overview of the issue but is incomplete: it lacks details about the specific changes made and which files were modified, and omits the 'Where should the reviewer start?' and 'Related Issues' sections from the template. Resolution: add 'Details', 'Where should the reviewer start?', and 'Related Issues' sections to match the PR description template.
✅ Passed checks (2 passed)
Title check — ✅ Passed. The title clearly and concisely describes the main change: fixing guided decoding params handling in vLLM after an upstream API change.
Docstring coverage — ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and acc9661.

📒 Files selected for processing (2)
  • components/src/dynamo/vllm/handlers.py (2 hunks)
  • tests/serve/test_vllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (1)
  • chat_payload (129-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
tests/serve/test_vllm.py (3)

464-473: Same extra_body issue applies here.

This test configuration also passes extra_body to chat_payload, which will fail for the same reason as the JSON config above.


431-475: Test configurations are well-structured for guided decoding validation.

The test designs are appropriate—using temperature=0.0 for deterministic outputs and checking for expected JSON keys/email format. The pre_merge marks ensure these run in CI.

However, ensure the extra_body parameter issue is resolved before merging.
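A plausible shape for the guided-JSON request payload discussed above (field names such as `guided_json` under `extra_body`, the model name, and the schema are illustrative assumptions, not copied from tests/serve/test_vllm.py):

```python
import json

# Hypothetical guided-JSON chat request for an OpenAI-compatible endpoint.
payload = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Return a user record as JSON."}],
    "temperature": 0.0,  # deterministic output so the test can assert on keys
    "extra_body": {
        "guided_json": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        }
    },
}
print(json.dumps(payload["extra_body"]["guided_json"]["required"]))  # → ["name"]
```

With temperature pinned to 0.0, the test can assert that the response parses as JSON and contains the required keys, rather than matching exact text.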


438-454: Due to repository access limitations, I cannot complete verification of this review comment. To properly categorize this review, please provide:

  1. The current signature of chat_payload() from tests/utils/payload_builder.py
  2. Confirmation that the test code at lines 438-454 in tests/serve/test_vllm.py actually uses extra_body parameter

Alternatively, if you can restore repository access, I can complete a full verification.

components/src/dynamo/vllm/handlers.py (2)

98-104: LGTM on the skip logic.

Skipping guided_decoding in the generic loop is correct since it's already been converted to structured_outputs. The inline comment clearly explains the reasoning.


15-15: LGTM on the import addition.

The StructuredOutputsParams import from vllm.sampling_params is correct and aligns with vLLM's documented API for structured outputs. This class is used to configure structured-output generation for JSON, regex, choice, grammar, and other structural patterns.

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@rmccorm4
Contributor

rmccorm4 commented Dec 8, 2025

/ok to test 5bbfc81

@karen-sy karen-sy enabled auto-merge (squash) December 8, 2025 18:14
@vladnosiv
Contributor Author

@karen-sy thanks for the quick review!
I'm not sure, but it seems the full build was launched on the commit before the last main merge, so the full build didn't run on this PR and it won't merge automatically.

@rmccorm4
Contributor

rmccorm4 commented Dec 9, 2025

/ok to test 93cf7f1

@karen-sy karen-sy merged commit 3e4b480 into ai-dynamo:main Dec 9, 2025
29 of 30 checks passed
esoba pushed a commit to esoba/dynamo that referenced this pull request Dec 9, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
karen-sy added a commit that referenced this pull request Dec 9, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>
smatta-star pushed a commit to smatta-star/dynamo that referenced this pull request Dec 19, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Co-authored-by: Karen Chung <karenc@nvidia.com>

Labels

external-contribution (Pull request is from an external contributor), fix, size/M

3 participants