feat: Add guided decoding passthrough to vLLM #827

ybgao-nvidia wants to merge 19 commits into main from
Conversation
Signed-off-by: Yubo Gao <yubog@nvidia.com>
wangshangsam
left a comment
A few nits, but overall LGTM!
@SahilJain314 wanna take another look (in case I missed anything)?
Co-authored-by: Shang Wang <samshang.wang@mail.utoronto.ca> Signed-off-by: Yubo Gao <yubog@nvidia.com>
@parthchadha can you take a quick look as well before merge?

@ybgao-nvidia I don't need to review code :) you can remove me from the list of reviewers. Thank you!
📝 Walkthrough

This PR adds guided decoding support to NeMo-RL's vLLM generation pipeline by introducing an optional guided decoding configuration.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Rollout as Rollout Layer
    participant GenInterface as Generation Interface
    participant VllmGen as VllmGeneration
    participant VllmWorker as VllmWorker
    participant vLLM as vLLM Library
    Rollout->>GenInterface: generate_responses(data, guided_decoding_config)
    GenInterface->>VllmGen: generate(data, guided_decoding_config)
    VllmGen->>VllmWorker: generate(data, guided_decoding_config)
    activate VllmWorker
    VllmWorker->>VllmWorker: _get_vllm_guided_decoding_params(guided_decoding_config)
    VllmWorker->>VllmWorker: _build_sampling_params(..., guided_decoding_params)
    deactivate VllmWorker
    VllmWorker->>vLLM: generate_completion(sampling_params with guided_decoding)
    vLLM-->>VllmWorker: structured output (matches constraints)
    VllmWorker-->>VllmGen: BatchedDataDict
    VllmGen-->>GenInterface: BatchedDataDict
    GenInterface-->>Rollout: BatchedDataDict
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings), ✅ Passed checks (4 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_rl/models/generation/interfaces.py (1)

143-157: Document the new `guided_decoding` config key.

Per NeMo-RL config guidelines, every new `TypedDict` key must document its purpose, valid values, and recommended default. `GenerationConfig` now exposes `guided_decoding`, but the class docstring still omits it, so downstream users won't know how to populate it. Please describe the field (e.g., that it accepts a `GuidedDecodingConfig` and defaults to `None`) alongside the other keys. Based on learnings.

nemo_rl/experience/rollouts.py (1)
599-608: Critical: Missing parameter forwarding breaks guided decoding.

The function accepts `guided_decoding_config` but doesn't forward it to `generate_responses_async`, breaking guided decoding for async single-sample rollouts. Apply this diff:

```diff
 updated_batch, generated_ids, gen_metrics = await generate_responses_async(
     policy_generation,
     generation_input_data,
     dummy_batch,
     tokenizer,
     input_lengths=input_lengths,
     include_logprobs=True,
     greedy=greedy,
+    guided_decoding_config=guided_decoding_config,
 )
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- nemo_rl/experience/rollouts.py (17 hunks)
- nemo_rl/models/generation/interfaces.py (4 hunks)
- nemo_rl/models/generation/vllm/vllm_generation.py (11 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (9 hunks)
- nemo_rl/models/generation/vllm/vllm_worker_async.py (6 hunks)
- nemo_rl/models/policy/lm_policy.py (2 hunks)
- tests/unit/models/generation/test_vllm_generation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:

- nemo_rl/models/policy/lm_policy.py
- tests/unit/models/generation/test_vllm_generation.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:

- nemo_rl/models/policy/lm_policy.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/generation/vllm/vllm_generation.py
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
nemo_rl/models/policy/lm_policy.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
Applied to files:
nemo_rl/models/generation/interfaces.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired
Applied to files:
nemo_rl/models/generation/interfaces.py
🧬 Code graph analysis (7)
nemo_rl/models/policy/lm_policy.py (2)
nemo_rl/models/generation/interfaces.py (3)
- GuidedDecodingConfig (118-139)
- GenerationDatumSpec (159-190)
- GenerationOutputSpec (193-237)

nemo_rl/distributed/batched_data_dict.py (1)
- BatchedDataDict (75-860)
tests/unit/models/generation/test_vllm_generation.py (4)
nemo_rl/models/generation/interfaces.py (2)
- GuidedDecodingConfig (118-139)
- generate (251-257)

tests/unit/environments/test_retriever.py (2)
- cluster (97-114)
- tokenizer (84-93)

nemo_rl/models/generation/vllm/vllm_generation.py (2)
- generate (428-480)
- shutdown (775-782)

nemo_rl/models/generation/vllm/vllm_worker.py (2)
- generate (457-588)
- shutdown (792-812)
nemo_rl/experience/rollouts.py (1)
nemo_rl/models/generation/interfaces.py (1)
- GuidedDecodingConfig (118-139)
nemo_rl/models/generation/interfaces.py (1)
nemo_rl/distributed/batched_data_dict.py (1)
- BatchedDataDict (75-860)
nemo_rl/models/generation/vllm/vllm_worker_async.py (2)
nemo_rl/models/generation/interfaces.py (1)
GuidedDecodingConfig(118-139)nemo_rl/models/generation/vllm/vllm_worker.py (1)
_get_vllm_guided_decoding_params(345-368)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
- GuidedDecodingConfig (118-139)
nemo_rl/models/generation/vllm/vllm_generation.py (2)
nemo_rl/models/generation/interfaces.py (2)
- GuidedDecodingConfig (118-139)
- GenerationDatumSpec (159-190)

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
- generate_async (509-732)
🪛 Ruff (0.14.2)
nemo_rl/experience/rollouts.py
554-554: Unused function argument: guided_decoding_config
(ARG001)
nemo_rl/models/generation/vllm/vllm_worker.py
366-368: Avoid specifying long messages outside the exception class
(TRY003)
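The TRY003 hint above discourages long messages assembled at the raise site. One common fix (a sketch, not the actual NeMo-RL code; the exception name is hypothetical) moves the message into a dedicated exception class:

```python
# Sketch of the TRY003-compliant idiom: the message lives in the exception
# class, so the raise site stays short. Names here are hypothetical.
class UnsupportedGuidedDecodingError(ValueError):
    """Raised when a guided decoding config sets no supported constraint."""

    def __init__(self) -> None:
        super().__init__(
            "guided decoding config must set exactly one of: json, regex, choice"
        )


def check(constraints: dict) -> None:
    """Validate that at least one constraint was supplied."""
    if not constraints:
        raise UnsupportedGuidedDecodingError  # short raise site, no inline message
```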
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Check if PR branch is up to date
- GitHub Check: Lint check
- GitHub Check: Check submodule fast-forward / Check submodule fast-forward
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (11)
nemo_rl/experience/rollouts.py (5)
58-73: LGTM: Parameter correctly threaded through to generation interface.

The `guided_decoding_config` parameter is properly forwarded to `policy_generation.generate()`.

125-155: LGTM: Parameter correctly threaded through async generation.

The `guided_decoding_config` parameter is properly forwarded to `policy_generation.generate_async()`.

340-430: LGTM: Docstring updated and parameter correctly forwarded.

The docstring now documents the `guided_decoding_config` parameter (line 352), and the parameter is correctly forwarded to `generate_responses` at line 429.

625-688: LGTM: Docstring updated and parameter correctly forwarded.

The docstring documents the `guided_decoding_config` parameter (line 641), and the parameter is correctly forwarded to `async_generate_response_for_sample_turn` at line 687.

796-849: LGTM: Docstring updated and parameter correctly forwarded.

The docstring documents the `guided_decoding_config` parameter (line 811), and the parameter is correctly forwarded to `run_sample_multi_turn_rollout` at line 848.

nemo_rl/models/generation/vllm/vllm_generation.py (6)
19-41: LGTM: Proper use of TYPE_CHECKING for conditional imports.

The TYPE_CHECKING import pattern correctly avoids a runtime dependency on vLLM's `GuidedDecodingParams` while enabling type hints.

428-457: LGTM: Parameter correctly threaded to workers.

The `guided_decoding_config` parameter is properly forwarded to worker methods via `common_kwargs`.

482-514: LGTM: Parameter correctly threaded to workers.

The `guided_decoding_params` parameter is properly forwarded to worker methods via `common_kwargs`.

534-578: LGTM: Flexible parameter passing via kwargs.

Using `**kwargs` in the base method appropriately supports the different parameter names (`guided_decoding_config` vs `guided_decoding_params`) required by different callers.

664-692: LGTM: Parameter correctly forwarded to base method.

The `guided_decoding_params` parameter is properly forwarded to `_async_generate_base`.

694-722: LGTM: Parameter correctly forwarded to base method.

The `guided_decoding_config` parameter is properly forwarded to `_async_generate_base`.
terrykong
left a comment
small comment
@parthchadha to review
```python
vllm_config["max_new_tokens"] = 16
vllm_config["vllm_cfg"]["async_engine"] = False
vllm_config = configure_generation_config(vllm_config, tokenizer)
vllm_policy = VllmGeneration(cluster, vllm_config)
```
should we also test that the generation log probs also match our expectations: logprob=0 (1 in the linear domain) for the guided tokens?
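The property asked about here can be expressed as a small helper (a sketch; how a test would extract per-token logprobs and which positions the grammar forces is an assumption about the generation output format):

```python
# Sketch of the check terrykong suggests: when guided decoding leaves only
# one legal next token, that token's log-probability should be ~0
# (probability ~1 in the linear domain). Tolerance is an assumption.
import math


def assert_forced_tokens_have_zero_logprob(
    logprobs: list[float], forced_positions: list[int], atol: float = 1e-5
) -> None:
    """Assert that every grammar-forced token was sampled with probability ~1."""
    for pos in forced_positions:
        lp = logprobs[pos]
        assert math.isclose(lp, 0.0, abs_tol=atol), (
            f"token at {pos}: logprob={lp}, expected ~0 (prob ~1)"
        )
        assert math.isclose(math.exp(lp), 1.0, abs_tol=atol)
```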
Signed-off-by: root <root@pool0-01584.cm.cluster>
What does this PR do?

This PR adds an options passthrough to the vLLM generation policy to enable guided decoding.
Issues
This PR resolves #603.
Usage
This PR adds a backend-agnostic guided decoding config class, `nemo_rl.models.generation.interfaces.GuidedDecodingConfig` (i.e. it does not depend on vLLM, should a new generation backend be added in the future), where `policy` is any subclass of `GenerationInterface`, which includes `VllmGeneration`.

Before your PR is "Ready for review"
Pre checks:
Summary by CodeRabbit
New Features
Tests