
cp: use pydantic for yaml test validation (#1382) into r0.4.0 #1459

Merged
terrykong merged 2 commits into r0.4.0 from chtruong/cp-1382-r0.4.0
Nov 2, 2025

Conversation

Contributor

@chtruong814 chtruong814 commented Oct 31, 2025

What does this PR do ?

cp: use pydantic for yaml test validation (#1382) into r0.4.0

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • Configuration Updates

    • Enhanced configuration validation with improved error messaging for invalid configurations.
    • Updated generation and model configurations to support optional parameters and stricter type validation.
    • Refined recipe configurations for various training and evaluation scenarios.
  • Evaluation Support

    • Expanded evaluation configuration options for multiple benchmark datasets and custom math evaluation.
  • Internal Improvements

    • Optimized padding token handling across model generation pipeline.
    • Refined distributed training configuration options.
    • Updated tooling for configuration management and validation.

@chtruong814 chtruong814 requested review from a team as code owners October 31, 2025 23:12
@github-actions github-actions bot added documentation Improvements or additions to documentation CI Relating to CI labels Oct 31, 2025
@terrykong terrykong force-pushed the chtruong/cp-1382-r0.4.0 branch from ceb25fc to b7ef96f Compare November 1, 2025 06:22
@github-actions

github-actions bot commented Nov 1, 2025

ℹ️ File Consistency Check

Check based on commit: b7ef96f (PR #1459 from chtruong/cp-1382-r0.4.0)

✅ DTensor Policy Worker Synchronization Check

Both DTensor policy worker files were modified in this PR:

  • nemo_rl/models/policy/dtensor_policy_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py

Please ensure that the changes are consistent between both files where applicable.


This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@github-actions

github-actions bot commented Nov 1, 2025

✅ Submodule Fast-Forward Check Results

Check based on commit: b7ef96f (PR #1459 from chtruong/cp-1382-r0.4.0)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of r0.4.0 branch (fast-forward)

All submodule changes look good! ✨

@terrykong terrykong force-pushed the chtruong/cp-1382-r0.4.0 branch from b7ef96f to 0a2fde0 Compare November 1, 2025 06:26
@github-actions github-actions bot removed the CI Relating to CI label Nov 1, 2025
@coderabbitai
Contributor

coderabbitai bot commented Nov 1, 2025

📝 Walkthrough

Walkthrough

This PR refactors configuration schemas across the codebase, introducing Disabled variants for policy configs, expanding eval dataset types, updating generation code to use internal _pad_token_id keys, converting config validation to Pydantic, and adding distillation support to CLI tooling and pre-commit hooks.

Changes

  • Pre-commit and CLI tooling (.pre-commit-config.yaml, tools/config_cli.py): Added distillation minimize-check entries to pre-commit hooks; wired the distillation algorithm to the new base config path in the CLI workflow.
  • Distillation configurations (examples/configs/distillation_math.yaml, examples/configs/distillation_math_megatron.yaml): Changed defer_fp32_logits from null to False in the Megatron config.
  • Base algorithm configs (examples/configs/{dpo.yaml, grpo_math_1B.yaml, sft.yaml, sft_openmathinstruct2_megatron.yaml, vlm_grpo_3B.yaml, vlm_grpo_3B_megatron.yaml}): Added defer_fp32_logits, env_vars, use_custom_fsdp, and hf_config_overrides fields across Megatron and DDP configurations.
  • GRPO and distillation recipes (examples/configs/recipes/llm/{distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml, grpo-llama3*.yaml, grpo-qwen2.5-*.yaml, grpo-llama3.2-1b-*.yaml}): Removed stop_token_ids from generation settings and save_period from checkpointing.
  • Data and environment typing (nemo_rl/data/__init__.py, nemo_rl/environments/math_environment.py): Introduced new EvalDataConfig TypedDict variants (MMLU, MMLUPro, AIME, GPQA, Math, LocalMath); removed MathDataConfig; updated optional-field typing for prompt_file, system_prompt_file, stop_strings, verifier_type.
  • Eval configuration (nemo_rl/evals/eval.py): Updated imports and TypedDict signatures to use EvalDataConfigType; introduced a _PassThroughMathConfig wrapper for backward compatibility.
  • Generation interfaces and typing (nemo_rl/models/generation/interfaces.py, docs/design-docs/generation.md): Introduced OptionalResourcesConfig; updated GenerationConfig to allow None for top_k, stop_token_ids, stop_strings, model_name; added an internal _pad_token_id field.
  • Generation initialization (nemo_rl/models/generation/__init__.py): Added a warning guard for _pad_token_id overrides before setting the internal padding key.
  • vLLM generation implementations (nemo_rl/models/generation/vllm/{vllm_generation.py, vllm_worker.py, vllm_worker_async.py}): Migrated from pad_token_id to _pad_token_id; made model_name required; switched top_k from safe-get to direct access.
  • Policy configuration types (nemo_rl/models/policy/__init__.py): Introduced Disabled variants (DTensorConfigDisabled, SequencePackingConfigDisabled, MegatronConfigDisabled, DynamicBatchingConfigDisabled); updated enabled flags to Literal[True]; added defer_fp32_logits and sequence_length_round; broadened optional-field typing; made optimizer/scheduler required in MegatronConfig.
  • Policy implementations (nemo_rl/models/policy/{lm_policy.py, megatron_policy_worker.py}): Updated pad_value_dict to use _pad_token_id; added an assertion requiring defer_fp32_logits: True when logprob_chunk_size is set; broadened the model_save_format guard logic.
  • Rollouts and loss functions (nemo_rl/experience/rollouts.py, nemo_rl/algorithms/loss_functions.py): Switched the padding-token source from the policy config to the tokenizer; updated the ratio_clip_c type to float | None.
  • Checkpoint utilities (nemo_rl/utils/checkpoint.py): Updated the CheckpointingConfig.model_save_format type to str | None.
  • Test updates (tests/unit/models/generation/{test_vllm_generation.py, test_vllm_large_model.py}, tests/unit/test_config_validation.py, tests/unit/test_recipes_and_test_suites.py): Migrated pad_token_id to _pad_token_id in async tests; refactored config validation to use a Pydantic TypeAdapter; introduced dynamic MasterConfig variant selection; removed the recipe merge test.
  • Project configuration (pyproject.toml): Renamed the Ruff section from [tool.ruff.per-file-ignores] to [tool.ruff.lint.per-file-ignores].

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Policy Config
    participant Generation Init
    participant vLLM Worker
    participant Output Processor

    User->>Policy Config: set generation.top_k (now int|None)
    Policy Config->>Generation Init: load config
    rect rgb(200, 220, 255)
        Note over Generation Init: Check if _pad_token_id exists<br/>Warn if override needed<br/>Set internal _pad_token_id
    end
    Generation Init->>vLLM Worker: pass config with _pad_token_id
    rect rgb(220, 200, 255)
        Note over vLLM Worker: Use _pad_token_id<br/>(not public pad_token_id)
    end
    vLLM Worker->>Output Processor: batch results with _pad_token_id
    Output Processor->>User: padded sequences

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Policy configuration refactoring (nemo_rl/models/policy/__init__.py): extensive TypedDict additions with Disabled variants and Literal-based enabled flags require careful verification of all field transitions and default behaviors.
  • Pad token key migration (vLLM generation stack, tests, policy): verify all references to pad_token_id vs _pad_token_id are consistent across padding, batching, and output handling code paths.
  • Config validation refactoring (tests/unit/test_config_validation.py): complete rewrite to Pydantic-based approach with dynamic MasterConfig variant selection; test file discovery and mapping logic require thorough review.
  • Data config typing changes (nemo_rl/data/__init__.py): removed MathDataConfig, introduced 6 new eval config TypedDicts with Literal constraints; verify all downstream code handles new EvalDataConfigType union correctly.
  • Megatron config assertions (nemo_rl/models/policy/megatron_policy_worker.py): new dependency requirement between defer_fp32_logits and logprob_chunk_size should be validated for backward compatibility impact.

Possibly related PRs

Suggested labels

cherry-pick, Run CICD, r0.4.0, CI:L1

Suggested reviewers

  • yuki-97
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 76.47%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes ⚠️ Warning: This PR contains major changes: a significant refactoring of the YAML config validation infrastructure to use Pydantic, extensive type-signature updates across multiple modules, new disabled configuration variants, and removal of existing test functions. The PR description, however, contains mostly placeholder content with an unchecked testing checklist item and no test results demonstrating that regressions have been avoided. For a cherry-pick of a substantial change like the move to Pydantic-based validation, the description should include test results showing the migration works and that existing configs continue to validate; if the refactoring affects performance or convergence, before-and-after metrics with configuration context should be provided as well.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title "cp: use pydantic for yaml test validation (#1382) into r0.4.0" accurately references a significant, real change: tests/unit/test_config_validation.py was substantially refactored to replace custom TypedDict validation with Pydantic-based validation using TypeAdapter. The changeset extends well beyond test validation, though, including type-system updates, configuration changes, generation-pipeline and padding-token modifications, and policy configuration restructuring, so the title captures one important aspect of the PR rather than its full scope.


Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the chtruong/cp-1382-r0.4.0 branch from 0a2fde0 to 7c7f221 Compare November 1, 2025 22:23
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Nov 1, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
nemo_rl/models/generation/__init__.py (1)

30-36: Add stacklevel to warning for better debugging.

The guarded pattern and warning are good, but the warning should include stacklevel=2 to point to the caller's line rather than the warning line itself.

Apply this diff:

     if "_pad_token_id" in config:
         warnings.warn(
             "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
             "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
             UserWarning,
+            stacklevel=2,
         )
     config["_pad_token_id"] = tokenizer.pad_token_id
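Outside the NeMo-RL code, the effect of stacklevel=2 is easy to demonstrate with a toy helper (names here are hypothetical): the recorded warning location becomes the caller's line rather than the line inside the helper.

```python
import warnings


def apply_pad_token(config: dict, pad_token_id: int) -> dict:
    """Toy version of the guarded override discussed in this review."""
    if "_pad_token_id" in config:
        # stacklevel=2 makes Python attribute the warning to the *caller's*
        # line, not this line inside the helper.
        warnings.warn(
            "'_pad_token_id' will be overridden with tokenizer.pad_token_id",
            UserWarning,
            stacklevel=2,
        )
    config["_pad_token_id"] = pad_token_id
    return config


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = apply_pad_token({"_pad_token_id": 99}, pad_token_id=0)

# caught[0].lineno now points at the apply_pad_token(...) call site above.
```

That is exactly what Ruff's B028 rule checks for, which is why the lint output below flags line 31.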
tests/unit/test_config_validation.py (1)

113-128: Drop the unused config_type.

config_type is assigned in every branch but never read, so it just trips lint (F841). Please remove the variable (and the assignments) to keep the test file clean.

-    master_config_class = None
-    config_type = None
+    master_config_class = None
@@
-        master_config_class = EvalMasterConfig
-        config_type = "eval"
+        master_config_class = EvalMasterConfig
@@
-        master_config_class = DistillationMasterConfig
-        config_type = "distillation"
+        master_config_class = DistillationMasterConfig
@@
-        master_config_class = DPOMasterConfig
-        config_type = "dpo"
+        master_config_class = DPOMasterConfig
@@
-        master_config_class = SFTMasterConfig
-        config_type = "sft"
+        master_config_class = SFTMasterConfig
@@
-        master_config_class = GRPOMasterConfig
-        config_type = "grpo"
+        master_config_class = GRPOMasterConfig
@@
-        master_config_class = RMMasterConfig
-        config_type = "rm"
+        master_config_class = RMMasterConfig
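One way to avoid the unused variable altogether is a prefix-to-class mapping, which also replaces the if/elif chain. The classes below are stand-ins; the real MasterConfig variants live in the modules listed under Code graph analysis:

```python
from typing import TypedDict


class SFTMasterConfig(TypedDict):
    sft: dict


class GRPOMasterConfig(TypedDict):
    grpo: dict


class DPOMasterConfig(TypedDict):
    dpo: dict


# Filename prefix -> MasterConfig variant; extend as new algorithms land.
_MASTER_CONFIGS = {
    "sft": SFTMasterConfig,
    "grpo": GRPOMasterConfig,
    "dpo": DPOMasterConfig,
}


def select_master_config(filename: str) -> type:
    """Pick the MasterConfig variant for a config file by its name prefix."""
    for prefix, config_cls in _MASTER_CONFIGS.items():
        if filename.startswith(prefix):
            return config_cls
    raise ValueError(f"no MasterConfig variant matches {filename!r}")
```

With the mapping, adding a new algorithm is a one-line change, and there is no parallel string to keep in sync with the selected class.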
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e941b4e and 7c7f221.

📒 Files selected for processing (42)
  • .pre-commit-config.yaml (1 hunks)
  • docs/design-docs/generation.md (1 hunks)
  • examples/configs/distillation_math.yaml (1 hunks)
  • examples/configs/distillation_math_megatron.yaml (1 hunks)
  • examples/configs/dpo.yaml (2 hunks)
  • examples/configs/grpo_math_1B.yaml (2 hunks)
  • examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/sft.yaml (3 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (1 hunks)
  • nemo_rl/algorithms/loss_functions.py (1 hunks)
  • nemo_rl/data/__init__.py (2 hunks)
  • nemo_rl/environments/math_environment.py (2 hunks)
  • nemo_rl/evals/eval.py (2 hunks)
  • nemo_rl/experience/rollouts.py (1 hunks)
  • nemo_rl/models/generation/__init__.py (2 hunks)
  • nemo_rl/models/generation/interfaces.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (4 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
  • nemo_rl/models/policy/__init__.py (5 hunks)
  • nemo_rl/models/policy/lm_policy.py (2 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • nemo_rl/utils/checkpoint.py (2 hunks)
  • pyproject.toml (1 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (1 hunks)
  • tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
  • tests/unit/test_config_validation.py (1 hunks)
  • tests/unit/test_recipes_and_test_suites.py (1 hunks)
  • tools/config_cli.py (1 hunks)
💤 Files with no reviewable changes (12)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
  • examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
🧰 Additional context used
📓 Path-based instructions (4)
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

  • docs/design-docs/generation.md
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/distillation_math.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/distillation_math_megatron.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/dpo.yaml
  • examples/configs/sft.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/models/generation/__init__.py
  • tests/unit/test_config_validation.py
  • tools/config_cli.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/environments/math_environment.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/evals/eval.py
  • tests/unit/test_recipes_and_test_suites.py
  • nemo_rl/models/policy/__init__.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/generation/__init__.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/evals/eval.py
  • nemo_rl/models/policy/__init__.py
🧠 Learnings (15)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Applied to files:

  • .pre-commit-config.yaml
  • tools/config_cli.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml

Applied to files:

  • .pre-commit-config.yaml
  • tools/config_cli.py
  • tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Applied to files:

  • .pre-commit-config.yaml
  • tools/config_cli.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • .pre-commit-config.yaml
  • tools/config_cli.py
  • tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation

Applied to files:

  • .pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Applied to files:

  • .pre-commit-config.yaml
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/sft_openmathinstruct2_megatron.yaml
📚 Learning: 2025-09-10T05:29:34.349Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:98-105
Timestamp: 2025-09-10T05:29:34.349Z
Learning: In the _maybe_correct_merged_tokens function in nemo_rl/models/generation/vllm/vllm_worker_async.py, the loop condition `len(candidate_token_ids) < len(actual_token_ids) - 1` is intentionally designed to prevent accessing the final token in actual_token_ids, likely to handle specific tokenization edge cases in the vLLM HTTP server integration.

Applied to files:

  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-10T05:34:35.406Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:346-359
Timestamp: 2025-09-10T05:34:35.406Z
Learning: In nemo_rl/models/generation/vllm/vllm_worker_async.py, the HTTP server intentionally uses different path structures: `/v1/chat/completions` is under the `/v1` prefix while `/tokenize` is at the root level without the `/v1` prefix. This is the intended design.

Applied to files:

  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code

Applied to files:

  • tests/unit/test_config_validation.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/evals/eval.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.

Applied to files:

  • tools/config_cli.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.

Applied to files:

  • nemo_rl/utils/checkpoint.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired

Applied to files:

  • nemo_rl/environments/math_environment.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

  • nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (10)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/models/generation/test_vllm_generation.py (2)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
nemo_rl/models/generation/__init__.py (2)
tests/unit/models/generation/test_vllm_generation.py (1)
  • tokenizer (238-241)
tests/unit/models/generation/test_vllm_large_model.py (1)
  • tokenizer (82-85)
tests/unit/test_config_validation.py (4)
nemo_rl/evals/eval.py (1)
  • MasterConfig (57-63)
nemo_rl/algorithms/distillation.py (1)
  • MasterConfig (110-121)
nemo_rl/algorithms/grpo.py (1)
  • MasterConfig (161-169)
tools/config_cli.py (1)
  • load_config_with_inheritance (100-141)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
tests/unit/models/generation/test_vllm_large_model.py (2)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
nemo_rl/data/__init__.py (1)
tests/unit/data/test_data_processor.py (1)
  • system_prompt_file (191-195)
nemo_rl/evals/eval.py (3)
nemo_rl/environments/math_environment.py (1)
  • MathEnvConfig (42-46)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/__init__.py (1)
  • TokenizerConfig (129-133)
nemo_rl/models/policy/__init__.py (2)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • freeze_moe_router (251-263)
🪛 Ruff (0.14.2)
nemo_rl/models/generation/__init__.py

31-31: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)
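The B028 finding above can be illustrated with a minimal sketch. The helper below is hypothetical (not the actual code in `nemo_rl/models/generation/__init__.py`); it shows why `stacklevel=2` matters: without it, the deprecation warning is attributed to the library frame instead of the caller's line.

```python
import warnings


def configure_generation(cfg: dict) -> dict:
    """Hypothetical helper: drop a deprecated config key with a proper warning."""
    if "pad_token_id" in cfg:
        warnings.warn(
            "pad_token_id is deprecated; the tokenizer is now the source of truth",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the calling frame, not this one
        )
        cfg = {k: v for k, v in cfg.items() if k != "pad_token_id"}
    return cfg
```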

tests/unit/test_config_validation.py

49-49: Avoid specifying long messages outside the exception class

(TRY003)


104-104: Avoid specifying long messages outside the exception class

(TRY003)


127-127: Local variable config_type is assigned to but never used

Remove assignment to unused variable config_type

(F841)


129-131: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build-container / main
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (24)
pyproject.toml (1)

249-249: Configuration change is compatible with Ruff 0.9.9.

Ruff 0.9.9 supports the [tool.ruff.lint.per-file-ignores] configuration format, so the restructuring at line 249 is valid and will not cause configuration parsing issues.

tests/unit/test_recipes_and_test_suites.py (2)

39-39: No issues found with the "rm" algorithm mapping addition.

The entry "rm": "examples/configs/rm.yaml" at line 39 is correct. The referenced config file exists (6.7K), and supporting recipe files are present (examples/run_rm.py, examples/run_grpo_rm.py, examples/configs/grpo_rm_1B.yaml). The addition properly follows the established pattern in ALGO_MAPPING_TO_BASE_YAML and integrates cleanly with the existing codebase.


33-41: Test removal was intentional and replaced with comprehensive Pydantic validation.

The removed test function test_all_recipes_can_merge_configs_with_base_config has been replaced by the new Pydantic-based validation in test_config_validation.py. The new approach:

  1. Validates all config files using Pydantic's TypeAdapter against typed MasterConfig classes
  2. Handles config merging through load_config_with_inheritance before validation
  3. Provides comprehensive validation of the entire config structure, not just merge capability
  4. Maintains allowed additional keys via ALLOWED_ADDITIONAL_CONFIG_KEYS for edge cases

The removal is justified and part of a systematic shift to Pydantic-based validation.

nemo_rl/models/policy/megatron_policy_worker.py (1)

640-647: No config issues found—assertion will not cause runtime failures.

The only recipe with active logprob_chunk_size: 2048 (grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml) already sets defer_fp32_logits: true at line 31. All other configs have logprob_chunk_size: null, so they bypass the assertion. The assertion is compatible with all existing configurations.

examples/configs/grpo_math_1B.yaml (1)

66-66: LGTM: Explicit defaults improve config clarity.

These changes make configuration defaults explicit:

  • hf_config_overrides: {} clarifies an empty dict rather than null
  • defer_fp32_logits: False provides an explicit boolean default

This aligns with the coding guideline that exemplar configs should document defaults.

Also applies to: 106-106

examples/configs/vlm_grpo_3B.yaml (1)

96-96: LGTM: Explicit boolean default.

Setting defer_fp32_logits: False explicitly aligns with the pattern across other Megatron configs in this PR.

examples/configs/vlm_grpo_3B_megatron.yaml (1)

135-135: LGTM: Consistent with config standardization.

The explicit defer_fp32_logits: False default maintains consistency across all Megatron-based configurations.

tests/unit/models/generation/test_vllm_generation.py (1)

440-440: LGTM: Internal padding key convention.

The switch to _pad_token_id (with underscore prefix) indicates this is now an internal configuration key. The fallback to tokenizer.pad_token_id provides good defensive handling.

This aligns with similar changes in vllm_worker_async.py and lm_policy.py.

examples/configs/sft.yaml (1)

38-38: LGTM: Explicit defaults for DTensor and Megatron configs.

These additions make configuration defaults explicit:

  • env_vars: {} under both dtensor_cfg and megatron_cfg clarifies environment variable configuration
  • defer_fp32_logits: False provides an explicit Megatron default

This improves config clarity and aligns with the guideline that YAML is the single source of truth for defaults.

Also applies to: 77-77, 96-96

examples/configs/distillation_math.yaml (1)

106-106: LGTM: Explicit Megatron default.

The change to defer_fp32_logits: False maintains consistency with other Megatron configurations across the codebase.

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)

531-531: LGTM: Consistent internal padding key usage.

Both changes adopt the _pad_token_id internal key convention:

  • Line 531: For padding verification
  • Line 639: For tensor initialization with padding

This is consistent with the same refactoring in test_vllm_generation.py (line 440) and lm_policy.py (line 585).

Also applies to: 639-639

nemo_rl/models/policy/lm_policy.py (2)

585-585: LGTM: Internal padding key for generation output.

Using _pad_token_id from the generation config maintains consistency with the internal key convention adopted across vllm_worker_async.py and test files.


735-740: Verify broadened checkpoint validation logic.

The condition changed from checking a specific value to checking if model_save_format is not None:

# Before (implied): checked if model_save_format == "safetensors" (or similar)
# After: checks if model_save_format is not None
if (
    checkpointing_cfg is not None
    and checkpointing_cfg.get("model_save_format", None) is not None
):
    raise ValueError(
        "model_save_format must be None or omitted if using DTensorPolicyWorker (_v2=False)."
    )

This now rejects any non-None model_save_format for DTensorPolicyWorker (v1). Ensure this is the intended behavior and won't break existing workflows that might pass other format values.

docs/design-docs/generation.md (1)

19-19: LGTM! Documentation aligns with code.

The type annotation update correctly reflects that top_k is optional in the generation configuration.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: LGTM! Consistent with internal padding token refactoring.

The change to use "_pad_token_id" aligns with the broader refactoring across the vLLM generation path. The fallback to tokenizer.pad_token_id ensures robustness.

nemo_rl/models/generation/vllm/vllm_worker.py (2)

539-539: LGTM! Internal padding key usage.

The change to use self.cfg["_pad_token_id"] for padding validation is consistent with the broader refactoring to use internal padding token keys across the vLLM generation path.


573-573: LGTM! Internal padding key usage.

Using self.cfg["_pad_token_id"] for constructing padded output tensors aligns with the internal padding token key refactoring throughout the codebase.

tools/config_cli.py (1)

49-55: LGTM! Distillation support added correctly.

The addition follows the existing pattern for GRPO and correctly maps the distillation algorithm to its base configuration file. This enables the minimize workflow for distillation recipes.

nemo_rl/experience/rollouts.py (1)

163-163: LGTM! Simplified padding token source.

The change to directly use tokenizer.pad_token_id simplifies the padding logic by treating the tokenizer as the single source of truth for padding tokens in async generation paths.

nemo_rl/environments/math_environment.py (2)

18-18: LGTM! Correct import for TypedDict optionality.

Adding NotRequired enables proper expression of optional configuration fields in the TypedDict.


44-46: LGTM! Proper TypedDict optionality pattern.

The change from Optional[...] to NotRequired[... | None] correctly expresses configuration optionality using TypedDict semantics, allowing fields to be omitted while also supporting explicit None values. As per coding guidelines.

nemo_rl/models/generation/vllm/vllm_generation.py (3)

120-123: LGTM! Good defensive check with clear explanation.

The explicit check for model_name with a helpful comment correctly handles the case where this field is NotRequired in the base GenerationConfig but required by VllmGenerationWorker.


453-453: LGTM! Consistent internal padding key usage.

Both changes correctly use "_pad_token_id" for padding values when combining worker results, aligning with the internal padding token key refactoring across the vLLM generation path.

Also applies to: 504-504


92-92: Verify that top_k is always present in config.

The change from .get("top_k") to ["top_k"] now requires top_k to be present in the config (though it can be None). Please verify that configure_generation_config always populates this key before VllmGeneration initialization.

Comment on lines +42 to +43
# Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
ratio_clip_c: float | None
⚠️ Potential issue | 🟠 Major

Use NotRequired[float] instead of float | None for optional configuration.

The coding guidelines specify: "Express configuration optionality via TypedDict using typing.NotRequired". The current implementation uses float | None, which means the key must be present in the config (but can be None), whereas NotRequired[float] indicates the key may be absent entirely.

As per coding guidelines.

Apply this diff to align with the coding guidelines:

-    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
-    ratio_clip_c: float | None
+    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). Omit or set to None to disable.
+    ratio_clip_c: NotRequired[float | None]

Additionally, update line 111 to handle the key's potential absence:

self.ratio_clip_c = cfg.get("ratio_clip_c", None)  # set to None to disable dual-clipping
🤖 Prompt for AI Agents
In nemo_rl/algorithms/loss_functions.py around lines 42-43, change the TypedDict
declaration for ratio_clip_c from a required field typed as "float | None" to an
optional field using "NotRequired[float]" (add/import NotRequired from typing)
so the config key may be omitted; then at line 111 update the assignment to pull
the value with cfg.get("ratio_clip_c", None) so missing keys default to None
(disable dual-clipping).

Comment on lines +125 to 129
top_k: int | None
model_name: NotRequired[str] # Not Required b/c GRPO writes this
stop_token_ids: list[int] | None
stop_strings: list[str] | None
colocated: NotRequired[ColocationConfig]
⚠️ Potential issue | 🔴 Critical

Restore optionality for stop_strings.

Switching stop_strings from NotRequired to a required key (even if None is allowed) will start failing every existing YAML config that omits this field once the new Pydantic validation runs. This PR is supposed to be a cherry-pick, so we cannot introduce a breaking config change here. Please keep stop_strings optional (e.g., stop_strings: NotRequired[list[str] | None]) so legacy configs continue to load, as per coding guidelines.

-    stop_strings: list[str] | None
+    stop_strings: NotRequired[list[str] | None]
Suggested change:

     top_k: int | None
     model_name: NotRequired[str]  # Not Required b/c GRPO writes this
     stop_token_ids: list[int] | None
-    stop_strings: list[str] | None
+    stop_strings: NotRequired[list[str] | None]
     colocated: NotRequired[ColocationConfig]
🤖 Prompt for AI Agents
In nemo_rl/models/generation/interfaces.py around lines 125 to 129, stop_strings
was changed from an optional key to a required key which will break existing
YAML configs that omit it; revert it to an optional field by declaring it as
NotRequired[list[str] | None] so the key may be absent while still allowing None
or a list when present, ensuring backward compatibility with legacy configs and
matching the existing pattern used for colocated/model_name.

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 2, 2025
@terrykong terrykong enabled auto-merge (squash) November 2, 2025 03:43
@terrykong terrykong merged commit bed7217 into r0.4.0 Nov 2, 2025
40 of 41 checks passed
@terrykong terrykong deleted the chtruong/cp-1382-r0.4.0 branch November 2, 2025 09:05
Labels

CI:L1 Run doctests, unit tests, and functional tests documentation Improvements or additions to documentation
