feat: Add guided decoding passthrough to vLLM #827

Open
ybgao-nvidia wants to merge 19 commits into main from ybgao/aug3-guided-decoding

Conversation

@ybgao-nvidia
Contributor

@ybgao-nvidia ybgao-nvidia commented Aug 3, 2025

What does this PR do?

This PR adds an options passthrough to the vLLM generation policy to enable guided decoding.

Issues

This PR resolves #603.

Usage

This PR adds a backend-agnostic guided decoding config class (nemo_rl.models.generation.interfaces.GuidedDecodingConfig). The class does not depend on vLLM, so it can be reused if a new generation backend is added in the future.

regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
phone_outputs = policy.generate(data, guided_decoding_config=regex_config)

where policy is an instance of any subclass of GenerationInterface, including VllmGeneration.
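As a minimal sketch of how a second mode might be configured and how outputs could be checked against the constraint, the snippet below uses a stand-in TypedDict that mirrors the fields named in this PR (mode, json, regex, choice, grammar). The stand-in class and the conforms helper are hypothetical and exist only for illustration; the real class lives in nemo_rl.models.generation.interfaces and may differ in its details.

```python
import re
from typing import Any, TypedDict


class GuidedDecodingConfig(TypedDict, total=False):
    """Hypothetical stand-in for the config class described in this PR."""

    mode: str  # "regex", "choice", "json", or "grammar"
    regex: str
    choice: list[str]
    json: dict[str, Any]
    grammar: str


def conforms(text: str, config: GuidedDecodingConfig) -> bool:
    """Checks whether a generated string satisfies a regex or choice constraint."""
    if config["mode"] == "regex":
        return re.fullmatch(config["regex"], text) is not None
    if config["mode"] == "choice":
        return text in config["choice"]
    raise ValueError(f"conforms() only handles regex and choice, got {config['mode']!r}")


regex_config = GuidedDecodingConfig(mode="regex", regex=r"\d{3}-\d{3}-\d{4}")
choice_config = GuidedDecodingConfig(mode="choice", choice=["yes", "no"])

print(conforms("555-123-4567", regex_config))  # True
print(conforms("maybe", choice_config))  # False
```

A check like this assumes the decoded text contains only the constrained span; if generation stops early at a token limit, a full match may fail even though decoding was guided.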

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

  • New Features

    • Added optional guided decoding support for text generation, enabling output constraints via regex patterns, JSON schemas, predefined choices, and grammar rules. This feature is backward compatible and disabled by default.
  • Tests

    • Added unit tests validating guided decoding functionality with regex and choice-based constraints.

Signed-off-by: Yubo Gao <yubog@nvidia.com>
Signed-off-by: Yubo Gao <yubog@nvidia.com>
@ybgao-nvidia ybgao-nvidia marked this pull request as ready for review August 4, 2025 18:21
Signed-off-by: Yubo Gao <yubog@nvidia.com>
Signed-off-by: Yubo Gao <yubog@nvidia.com>
wangshangsam
wangshangsam previously approved these changes Aug 7, 2025
Contributor

@wangshangsam wangshangsam left a comment

A few nits, but overall LGTM!
@SahilJain314 wanna take another look (in case I missed anything)?

Co-authored-by: Shang Wang <samshang.wang@mail.utoronto.ca>
Signed-off-by: Yubo Gao <yubog@nvidia.com>
Co-authored-by: Shang Wang <samshang.wang@mail.utoronto.ca>
Signed-off-by: Yubo Gao <yubog@nvidia.com>
wangshangsam
wangshangsam previously approved these changes Aug 11, 2025
Contributor

@SahilJain314 SahilJain314 left a comment

lgtm

@SahilJain314
Contributor

@parthchadha can you take a quick look as well before merge?

@snowmanwwg
Contributor

@ybgao-nvidia I don't need to review code :) you can remove me from the list of reviewers. Thank you!

@ybgao-nvidia ybgao-nvidia added the CI:L0 Run doctests and unit tests label Oct 28, 2025
Signed-off-by: Yubo Gao <yubog@nvidia.com>
@ybgao-nvidia ybgao-nvidia added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Oct 28, 2025
@terrykong terrykong removed the r0.4.0 label Oct 28, 2025
@ybgao-nvidia ybgao-nvidia added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Oct 29, 2025
Signed-off-by: Yubo Gao <yubog@nvidia.com>
@ybgao-nvidia ybgao-nvidia added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Oct 30, 2025
@coderabbitai
Contributor

coderabbitai bot commented Oct 30, 2025

📝 Walkthrough

This PR adds guided decoding support to NeMo-RL's vLLM generation pipeline by introducing an optional GuidedDecodingConfig parameter that threads through rollout entry points, generation interfaces, and vLLM workers, enabling structured output modes like regex matching, JSON schema validation, and predefined choice constraints.

Changes

Cohort / File(s): Summary

  • Rollout Parameter Threading (nemo_rl/experience/rollouts.py): Added guided_decoding_config: Optional[GuidedDecodingConfig] = None parameter to six public methods (generate_responses, generate_responses_async, run_multi_turn_rollout, async_generate_response_for_sample_turn, run_sample_multi_turn_rollout, run_async_multi_turn_rollout) and threaded parameter through internal call chains.
  • Generation Interface Definitions (nemo_rl/models/generation/interfaces.py): Added new GuidedDecodingConfig TypedDict with fields: mode (str), json (optional), regex (optional), choice (optional), grammar (optional). Extended GenerationConfig with guided_decoding: NotRequired[GuidedDecodingConfig] field. Updated abstract method GenerationInterface.generate() signature to include guided_decoding_config: Optional[GuidedDecodingConfig] parameter.
  • vLLM Implementation (nemo_rl/models/generation/vllm/vllm_generation.py): Added guided_decoding_config parameter to four public methods (generate, generate_async, generate_text, generate_text_async). Extended _async_generate_base to accept and forward **kwargs. Updated worker invocations to propagate guided decoding configuration through common_kwargs.
  • vLLM Worker (nemo_rl/models/generation/vllm/vllm_worker.py): Implemented _get_vllm_guided_decoding_params() helper to translate GuidedDecodingConfig into vLLM's GuidedDecodingParams (supports modes: json, regex, choice, grammar, json_object). Updated generate() and generate_text() method signatures to accept guided_decoding_config and integrated conversion logic. Extended _build_sampling_params() to accept and apply guided_decoding_params to SamplingParams.
  • vLLM Async Worker (nemo_rl/models/generation/vllm/vllm_worker_async.py): Added guided_decoding_config parameter to generate_async() and guided_decoding_params parameter to generate_text_async(). Integrated guided decoding propagation through async per-sample generation paths via _get_vllm_guided_decoding_params() conversion.
  • Policy Layer (nemo_rl/models/policy/lm_policy.py): Added guided_decoding_config: Optional[GuidedDecodingConfig] = None parameter to generate() method with guard assertion requiring parameter to be None, indicating guided decoding is not supported for this backend.
  • Unit Test (tests/unit/models/generation/test_vllm_generation.py): Added test_vllm_guided_decoding() test exercising two guided decoding configurations (regex phone-number pattern and predefined-choice mode) and validating output conformance to constraints.
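The translation step described for the vLLM worker can be sketched roughly as below. This is not the PR's _get_vllm_guided_decoding_params() implementation: the function name guided_decoding_kwargs is hypothetical, the treatment of json_object as a flag with no payload is an assumption, and the sketch returns a plain dict of keyword arguments rather than constructing vLLM's GuidedDecodingParams, so that it runs without vLLM installed.

```python
from typing import Any, Mapping, Optional

_SUPPORTED_MODES = ("json", "regex", "choice", "grammar", "json_object")


def guided_decoding_kwargs(
    config: Optional[Mapping[str, Any]],
) -> Optional[dict[str, Any]]:
    """Translates a guided decoding config into keyword arguments.

    The returned dict is meant to be passed on to the backend's guided
    decoding parameter object (hypothetical helper, not the PR's code).
    """
    if config is None:
        return None  # Guided decoding is disabled, which is the default.

    mode = config.get("mode")
    if mode not in _SUPPORTED_MODES:
        raise ValueError(f"Unsupported guided decoding mode: {mode!r}")

    if mode == "json_object":
        # Assumption: this mode carries no payload, only a flag.
        return {"json_object": True}

    # Fail with a descriptive error instead of a bare KeyError.
    if config.get(mode) is None:
        raise ValueError(f"Mode {mode!r} requires the {mode!r} field to be set")
    return {mode: config[mode]}


print(guided_decoding_kwargs({"mode": "choice", "choice": ["yes", "no"]}))
```

Validating the payload field up front is one way to address the request elsewhere in this review for descriptive ValueError messages rather than KeyError exceptions.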

Sequence Diagram

sequenceDiagram
    participant Rollout as Rollout Layer
    participant GenInterface as Generation Interface
    participant VllmGen as VllmGeneration
    participant VllmWorker as VllmWorker
    participant vLLM as vLLM Library

    Rollout->>GenInterface: generate_responses(data, guided_decoding_config)
    GenInterface->>VllmGen: generate(data, guided_decoding_config)
    VllmGen->>VllmWorker: generate(data, guided_decoding_config)
    activate VllmWorker
    VllmWorker->>VllmWorker: _get_vllm_guided_decoding_params(guided_decoding_config)
    VllmWorker->>VllmWorker: _build_sampling_params(..., guided_decoding_params)
    deactivate VllmWorker
    VllmWorker->>vLLM: generate_completion(sampling_params with guided_decoding)
    vLLM-->>VllmWorker: structured output (matches constraints)
    VllmWorker-->>VllmGen: BatchedDataDict
    VllmGen-->>GenInterface: BatchedDataDict
    GenInterface-->>Rollout: BatchedDataDict

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

  • _get_vllm_guided_decoding_params() conversion logic in nemo_rl/models/generation/vllm/vllm_worker.py — Verify mode-to-vLLM parameter mapping is complete and handles all supported modes (json, regex, choice, grammar, json_object); ensure ValueError is raised appropriately for unsupported modes.
  • Abstract method contract change in nemo_rl/models/generation/interfaces.py: the GenerationInterface.generate() signature now requires a guided_decoding_config parameter; verify all subclass implementations are properly updated (check for any implementations outside the main files in this diff).
  • Parameter threading consistency — Trace guided_decoding propagation across async vs. sync paths (generate vs. generate_async, generate_text vs. generate_text_async) to ensure no divergence in parameter passing.
  • Guard assertion in lm_policy.py — Confirm the assertion message and behavior are appropriate for blocking guided decoding on non-vLLM backends.
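The contract change and the guard assertion flagged above can be illustrated with a simplified sketch. The classes here are hypothetical stand-ins with reduced signatures, not the code in interfaces.py or lm_policy.py, and the assertion message is invented for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class GenerationInterface(ABC):
    """Hypothetical, simplified stand-in for the abstract interface."""

    @abstractmethod
    def generate(
        self,
        data: dict[str, Any],
        greedy: bool = False,
        guided_decoding_config: Optional[dict[str, Any]] = None,
    ) -> dict[str, Any]:
        """Generates outputs for a batch of prompts."""


class UnguidedPolicy(GenerationInterface):
    """Hypothetical backend that does not support guided decoding."""

    def generate(self, data, greedy=False, guided_decoding_config=None):
        # Guard: reject the option instead of silently ignoring it.
        assert guided_decoding_config is None, (
            "Guided decoding is not supported by this backend"
        )
        return {"texts": [f"echo: {prompt}" for prompt in data["prompts"]]}


policy = UnguidedPolicy()
print(policy.generate({"prompts": ["hi"]}))  # {'texts': ['echo: hi']}
```

One point to weigh when confirming the guard's behavior: Python removes assert statements when run with -O, so raising an explicit exception would keep the check active in optimized runs.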

Possibly related PRs

  • feat: add async RL support #1098 — Async GRPO code calls rollout functions like run_async_multi_turn_rollout() which now accept guided decoding config; may need coordination for integrated testing.
  • chore: use pydantic for yaml test validation #1382 — Also modifies nemo_rl/models/generation/interfaces.py to update GenerationConfig fields; potential merge conflict or cross-feature interaction at the type level.

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 78.79% which is insufficient. The required threshold is 80.00%. Resolution: You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Test Results For Major Changes (⚠️ Warning): This PR introduces a major feature (guided decoding support for vLLM) and includes a comprehensive unit test that exercises regex and choice-based guided decoding modes with assertions about output shapes and constraint adherence. However, the PR description indicates that pre-check checklist items are unchecked, and there is no explicit documentation of test execution results or confirmation that the tests pass. While the test code exists and appears well-designed, the lack of documented test results in the PR description means the requirement for major changes to include test result information has not been satisfied. Additionally, there is an outstanding review comment requesting improved validation for guided decoding configuration fields. Resolution: Update the PR description to explicitly document that tests have been executed and pass. Include test output or a reference to CI/workflow results demonstrating that test_vllm_guided_decoding passes successfully. Additionally, address the outstanding review comment regarding field validation for guided decoding modes to ensure proper error handling with descriptive ValueError messages instead of bare KeyError exceptions. Mark all pre-check checklist items as complete once these steps are finished.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title "feat: Add guided decoding passthrough to vLLM" accurately and specifically describes the primary change in this pull request. It clearly conveys that the feature adds guided decoding support through the vLLM generation pipeline without using vague terms or noise. The title is concise at 7 words and effectively communicates the main objective to someone scanning the project history.
  • Linked Issues Check (✅ Passed): The pull request fully satisfies the requirements from linked issue #603. The implementation adds support for guided decoding parameters (json, regex, choice, grammar modes) through a new GuidedDecodingConfig interface class [interfaces.py], properly threads this configuration through vLLM generation entry points [vllm_generation.py], implements the core translation logic via _get_vllm_guided_decoding_params() helper that passes guided decoding to vLLM's SamplingParams exactly as specified in the issue [vllm_worker.py], extends async paths for completeness [vllm_worker_async.py], and validates the implementation with a new test exercising regex and choice-based guided decoding [test_vllm_generation.py]. All coding requirements for enabling structured output and tool calling through guided decoding have been met.
  • Out of Scope Changes Check (✅ Passed): All changes in this pull request are directly related to implementing guided decoding support for vLLM. The new interface definitions [interfaces.py] and backend-specific implementations [vllm_generation.py, vllm_worker.py, vllm_worker_async.py] are core to the feature. The threading through rollouts [rollouts.py] and consistent interface updates across all backends including the guard assertion in lm_policy.py are supporting changes that align with the PR objective to provide end-to-end guided decoding capability. The test addition validates the implementation. No extraneous changes unrelated to guided decoding support were introduced.
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemo_rl/models/generation/interfaces.py (1)

143-157: Document the new guided_decoding config key.

Per NeMo-RL config guidelines, every new TypedDict key must document its purpose, valid values, and recommended default. GenerationConfig now exposes guided_decoding, but the class docstring still omits it, so downstream users won’t know how to populate it. Please describe the field (e.g., that it accepts a GuidedDecodingConfig and defaults to None) alongside the other keys. Based on learnings

nemo_rl/experience/rollouts.py (1)

599-608: Critical: Missing parameter forwarding breaks guided decoding.

The function accepts guided_decoding_config but doesn't forward it to generate_responses_async, breaking guided decoding for async single-sample rollouts.

Apply this diff:

     updated_batch, generated_ids, gen_metrics = await generate_responses_async(
         policy_generation,
         generation_input_data,
         dummy_batch,
         tokenizer,
         input_lengths=input_lengths,
         include_logprobs=True,
         greedy=greedy,
+        guided_decoding_config=guided_decoding_config,
     )
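The failure mode behind this comment, and behind the Ruff ARG001 warning for the same function, can be reproduced in a small sketch. The functions below are hypothetical stand-ins with reduced signatures, not the code in rollouts.py; the stand-in for generate_responses_async simply reports which constraint, if any, reached it.

```python
import asyncio
from typing import Any, Optional


async def generate_responses_async(
    prompt: str, guided_decoding_config: Optional[dict[str, Any]] = None
) -> str:
    """Hypothetical stand-in that reports whether a constraint arrived."""
    mode = guided_decoding_config["mode"] if guided_decoding_config else "none"
    return f"{prompt} [guided={mode}]"


async def sample_turn_buggy(prompt, guided_decoding_config=None):
    # Accepts the argument but never passes it on, so it is silently ignored.
    return await generate_responses_async(prompt)


async def sample_turn_fixed(prompt, guided_decoding_config=None):
    # Forwards the argument, mirroring the one-line fix in the diff above.
    return await generate_responses_async(
        prompt, guided_decoding_config=guided_decoding_config
    )


config = {"mode": "regex", "regex": r"\d+"}
print(asyncio.run(sample_turn_buggy("hi", config)))  # hi [guided=none]
print(asyncio.run(sample_turn_fixed("hi", config)))  # hi [guided=regex]
```

Because the buggy variant raises no error, a test that only checks output shapes would not catch it; asserting that outputs satisfy the constraint would.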
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd2e645 and 339755d.

📒 Files selected for processing (7)
  • nemo_rl/experience/rollouts.py (17 hunks)
  • nemo_rl/models/generation/interfaces.py (4 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (11 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (9 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (6 hunks)
  • nemo_rl/models/policy/lm_policy.py (2 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults

Applied to files:

  • nemo_rl/models/policy/lm_policy.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code

Applied to files:

  • nemo_rl/models/generation/interfaces.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired

Applied to files:

  • nemo_rl/models/generation/interfaces.py
🧬 Code graph analysis (7)
nemo_rl/models/policy/lm_policy.py (2)
nemo_rl/models/generation/interfaces.py (3)
  • GuidedDecodingConfig (118-139)
  • GenerationDatumSpec (159-190)
  • GenerationOutputSpec (193-237)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
tests/unit/models/generation/test_vllm_generation.py (4)
nemo_rl/models/generation/interfaces.py (2)
  • GuidedDecodingConfig (118-139)
  • generate (251-257)
tests/unit/environments/test_retriever.py (2)
  • cluster (97-114)
  • tokenizer (84-93)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
  • generate (428-480)
  • shutdown (775-782)
nemo_rl/models/generation/vllm/vllm_worker.py (2)
  • generate (457-588)
  • shutdown (792-812)
nemo_rl/experience/rollouts.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • GuidedDecodingConfig (118-139)
nemo_rl/models/generation/interfaces.py (1)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
nemo_rl/models/generation/vllm/vllm_worker_async.py (2)
nemo_rl/models/generation/interfaces.py (1)
  • GuidedDecodingConfig (118-139)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
  • _get_vllm_guided_decoding_params (345-368)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • GuidedDecodingConfig (118-139)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
nemo_rl/models/generation/interfaces.py (2)
  • GuidedDecodingConfig (118-139)
  • GenerationDatumSpec (159-190)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
  • generate_async (509-732)
🪛 Ruff (0.14.2)
nemo_rl/experience/rollouts.py

554-554: Unused function argument: guided_decoding_config

(ARG001)

nemo_rl/models/generation/vllm/vllm_worker.py

366-368: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Check if PR branch is up to date
  • GitHub Check: Lint check
  • GitHub Check: Check submodule fast-forward / Check submodule fast-forward
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (11)
nemo_rl/experience/rollouts.py (5)

58-73: LGTM: Parameter correctly threaded through to generation interface.

The guided_decoding_config parameter is properly forwarded to policy_generation.generate().


125-155: LGTM: Parameter correctly threaded through async generation.

The guided_decoding_config parameter is properly forwarded to policy_generation.generate_async().


340-430: LGTM: Docstring updated and parameter correctly forwarded.

The docstring now documents the guided_decoding_config parameter (line 352), and the parameter is correctly forwarded to generate_responses at line 429.


625-688: LGTM: Docstring updated and parameter correctly forwarded.

The docstring documents the guided_decoding_config parameter (line 641), and the parameter is correctly forwarded to async_generate_response_for_sample_turn at line 687.


796-849: LGTM: Docstring updated and parameter correctly forwarded.

The docstring documents the guided_decoding_config parameter (line 811), and the parameter is correctly forwarded to run_sample_multi_turn_rollout at line 848.

nemo_rl/models/generation/vllm/vllm_generation.py (6)

19-41: LGTM: Proper use of TYPE_CHECKING for conditional imports.

The TYPE_CHECKING import pattern correctly avoids runtime dependency on vLLM's GuidedDecodingParams while enabling type hints.


428-457: LGTM: Parameter correctly threaded to workers.

The guided_decoding_config parameter is properly forwarded to worker methods via common_kwargs.


482-514: LGTM: Parameter correctly threaded to workers.

The guided_decoding_params parameter is properly forwarded to worker methods via common_kwargs.


534-578: LGTM: Flexible parameter passing via kwargs.

Using **kwargs in the base method appropriately supports different parameter names (guided_decoding_config vs guided_decoding_params) required by different callers.


664-692: LGTM: Parameter correctly forwarded to base method.

The guided_decoding_params parameter is properly forwarded to _async_generate_base.


694-722: LGTM: Parameter correctly forwarded to base method.

The guided_decoding_config parameter is properly forwarded to _async_generate_base.
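The **kwargs forwarding noted for _async_generate_base can be sketched as follows. The sketch is synchronous for brevity, and run_on_worker plus the two worker functions are hypothetical names chosen for illustration; only the two keyword names (guided_decoding_config and guided_decoding_params) come from this review.

```python
from typing import Any, Callable


def run_on_worker(method: Callable[..., str], prompt: str, **kwargs: Any) -> str:
    """Forwards arbitrary keyword arguments to the chosen worker method."""
    return method(prompt, **kwargs)


def worker_generate(prompt: str, guided_decoding_config: Any = None) -> str:
    """Hypothetical worker method that takes the backend-agnostic config."""
    return f"generate({prompt}, config={guided_decoding_config})"


def worker_generate_text(prompt: str, guided_decoding_params: Any = None) -> str:
    """Hypothetical worker method that takes already-translated params."""
    return f"generate_text({prompt}, params={guided_decoding_params})"


print(run_on_worker(worker_generate, "hi", guided_decoding_config="cfg"))
print(run_on_worker(worker_generate_text, "hi", guided_decoding_params="prm"))
```

The trade-off with this pattern is that a keyword passed to the wrong method is only detected when the worker method is actually called, where it surfaces as a TypeError.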

Collaborator

@terrykong terrykong left a comment

small comment

@parthchadha to review

vllm_config["max_new_tokens"] = 16
vllm_config["vllm_cfg"]["async_engine"] = False
vllm_config = configure_generation_config(vllm_config, tokenizer)
vllm_policy = VllmGeneration(cluster, vllm_config)
Collaborator

@terrykong terrykong Oct 30, 2025

Should we also test that the generation logprobs match our expectations: logprob=0 (probability 1 in the linear domain) for the guided tokens?
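The expectation in this comment can be worked through with a small, self-contained sketch. It assumes the reported logprobs are computed after disallowed tokens are masked out, which would need to be verified against the backend; masked_log_softmax is a hypothetical helper, not part of NeMo-RL or vLLM.

```python
import math


def masked_log_softmax(logits: list[float], allowed: set[int]) -> list[float]:
    """Log-probabilities after disallowed token ids are masked to -inf."""
    if not allowed:
        raise ValueError("At least one token id must be allowed")
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # Subtract the max for numerical stability.
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in masked))
    return [x - log_norm for x in masked]


logits = [2.0, -1.0, 0.5]

# Only token 1 is allowed, so it takes all of the probability mass.
forced = masked_log_softmax(logits, {1})
print(forced[1])  # 0.0

# Two tokens are allowed, so each logprob is strictly negative.
shared = masked_log_softmax(logits, {0, 2})
print(shared[0] < 0.0 and shared[2] < 0.0)  # True
```

Under that assumption, logprob=0 would hold only at positions where the constraint leaves a single allowed token, such as the hyphens in the phone-number regex; digit positions still allow ten tokens and would generally have negative logprobs.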

Signed-off-by: Yubo Gao <yubog@nvidia.com>
Signed-off-by: root <root@pool0-01584.cm.cluster>
Signed-off-by: Yubo Gao <yubog@nvidia.com>
Labels

CI:L0 Run doctests and unit tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for guided decoding in vLLM

7 participants