fix: guided decoding params handling in vLLM#4770
Conversation
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
👋 Hi vladnosiv! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughSupport for guided decoding and structured outputs is added to vLLM handlers by importing Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/src/dynamo/vllm/handlers.py(2 hunks)tests/serve/test_vllm.py(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects. Attempting to use attribute access (prompt.prompt_token_ids) will result in an error.
Applied to files:
components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-06-08T08:28:20.100Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 1409
File: examples/router_standalone/router.py:113-118
Timestamp: 2025-06-08T08:28:20.100Z
Learning: In vLLM, TokensPrompt objects support dictionary-style access (e.g., prompt["prompt_token_ids"]) rather than attribute access (e.g., prompt.prompt_token_ids). The dictionary-style access is the correct way to extract prompt_token_ids from TokensPrompt objects.
Applied to files:
components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (1)
chat_payload(129-156)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
tests/serve/test_vllm.py (3)
464-473: Sameextra_bodyissue applies here.This test configuration also passes
extra_bodytochat_payload, which will fail for the same reason as the JSON config above.
431-475: Test configurations are well-structured for guided decoding validation.The test designs are appropriate—using
temperature=0.0for deterministic outputs and checking for expected JSON keys/email format. Thepre_mergemarks ensure these run in CI.However, ensure the
extra_bodyparameter issue is resolved before merging.
438-454: Due to repository access limitations, I cannot complete verification of this review comment. To properly categorize this review, please provide:
- The current signature of
chat_payload()fromtests/utils/payload_builder.py- Confirmation that the test code at lines 438-454 in
tests/serve/test_vllm.pyactually usesextra_bodyparameterAlternatively, if you can restore repository access, I can complete a full verification.
components/src/dynamo/vllm/handlers.py (2)
98-104: LGTM on the skip logic.Skipping
guided_decodingin the generic loop is correct since it's already been converted tostructured_outputs. The inline comment clearly explains the reasoning.
15-15: LGTM on the import addition.The
StructuredOutputsParamsimport fromvllm.sampling_paramsis correct and aligns with vLLM's documented API for structured outputs. This class is used to configure structured-output generation for JSON, regex, choice, grammar, and other structural patterns.
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
/ok to test 5bbfc81 |
|
@karen-sy thanks for the quick review! |
|
/ok to test 93cf7f1 |
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Karen Chung <karenc@nvidia.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Karen Chung <karenc@nvidia.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Karen Chung <karenc@nvidia.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com> Co-authored-by: Karen Chung <karenc@nvidia.com>
Overview:
After that PR vLLM changed the type of SO params from GuidedDecodingParams to StructuredOutputsParams.
The current handler code for the vLLM worker assigned an attribute based on guided decoding params name, which excluded the possibility of using guided decoding.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.