cp: use pydantic for yaml test validation (#1382) into r0.4.0 (#1459)
ℹ️ File Consistency Check (based on commit b7ef96f, PR #1459)

✅ DTensor Policy Worker Synchronization Check: both DTensor policy worker files were modified in this PR. Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
📝 Walkthrough

This PR refactors configuration schemas across the codebase, introducing Disabled variants for policy configs, expanding eval dataset types, and updating generation code to use the internal `_pad_token_id` key.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Policy Config
    participant Generation Init
    participant vLLM Worker
    participant Output Processor
    User->>Policy Config: set generation.top_k (now int|None)
    Policy Config->>Generation Init: load config
    rect rgb(200, 220, 255)
        Note over Generation Init: Check if _pad_token_id exists<br/>Warn if override needed<br/>Set internal _pad_token_id
    end
    Generation Init->>vLLM Worker: pass config with _pad_token_id
    rect rgb(220, 200, 255)
        Note over vLLM Worker: Use _pad_token_id<br/>(not public pad_token_id)
    end
    vLLM Worker->>Output Processor: batch results with _pad_token_id
    Output Processor->>User: padded sequences
```
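The flow in the diagram above can be sketched as a toy pipeline. The function names (`init_generation`, `pad_batch`) are illustrative stand-ins, not the actual NeMo-RL API:

```python
def init_generation(config: dict, tokenizer_pad_id: int) -> dict:
    """Mirror of the init step: write the tokenizer's pad id to the internal key."""
    config["_pad_token_id"] = tokenizer_pad_id  # internal key, not user-facing
    return config


def pad_batch(seqs: list[list[int]], config: dict) -> list[list[int]]:
    """Mirror of the output-processor step: right-pad every sequence with the internal pad id."""
    pad = config["_pad_token_id"]
    width = max(len(s) for s in seqs)
    return [s + [pad] * (width - len(s)) for s in seqs]


cfg = init_generation({"top_k": None}, tokenizer_pad_id=0)
print(pad_batch([[5, 6], [7]], cfg))  # [[5, 6], [7, 0]]
```

Because `_pad_token_id` is always written at init time from the tokenizer, downstream workers can read it with direct indexing rather than a `.get()` fallback.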
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Signed-off-by: Terry Kong <terryk@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (2)
nemo_rl/models/generation/__init__.py (1)
**30-36: Add `stacklevel` to warning for better debugging.**

The guarded pattern and warning are good, but the warning should include `stacklevel=2` to point to the caller's line rather than the warning line itself. Apply this diff:

```diff
 if "_pad_token_id" in config:
     warnings.warn(
         "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
         "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
         UserWarning,
+        stacklevel=2,
     )
 config["_pad_token_id"] = tokenizer.pad_token_id
```

tests/unit/test_config_validation.py (1)
**113-128: Drop the unused `config_type`.**

`config_type` is assigned in every branch but never read, so it just trips lint (F841). Please remove the variable (and the assignments) to keep the test file clean.

```diff
-    master_config_class = None
-    config_type = None
+    master_config_class = None
@@
-        master_config_class = EvalMasterConfig
-        config_type = "eval"
+        master_config_class = EvalMasterConfig
@@
-        master_config_class = DistillationMasterConfig
-        config_type = "distillation"
+        master_config_class = DistillationMasterConfig
@@
-        master_config_class = DPOMasterConfig
-        config_type = "dpo"
+        master_config_class = DPOMasterConfig
@@
-        master_config_class = SFTMasterConfig
-        config_type = "sft"
+        master_config_class = SFTMasterConfig
@@
-        master_config_class = GRPOMasterConfig
-        config_type = "grpo"
+        master_config_class = GRPOMasterConfig
@@
-        master_config_class = RMMasterConfig
-        config_type = "rm"
+        master_config_class = RMMasterConfig
```
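As a minimal, self-contained sketch of the `stacklevel` suggestion above (the `configure` function is a hypothetical stand-in, not the actual NeMo-RL helper): with `stacklevel=2`, Python attributes the warning to the caller's line, which is what ends up in logs and test output.

```python
import warnings


def configure(config: dict, pad_token_id: int) -> dict:
    # Guarded override: warn if the user set the internal key themselves.
    if "_pad_token_id" in config:
        warnings.warn(
            "'_pad_token_id' is internal and will be overridden with tokenizer.pad_token_id.",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller, not this line
        )
    config["_pad_token_id"] = pad_token_id
    return config


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cfg = configure({"_pad_token_id": 99}, pad_token_id=2)

print(cfg["_pad_token_id"])  # 2: the tokenizer value wins
print(len(caught))           # 1: the override was reported
```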
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (42)
- `.pre-commit-config.yaml` (1 hunks)
- `docs/design-docs/generation.md` (1 hunks)
- `examples/configs/distillation_math.yaml` (1 hunks)
- `examples/configs/distillation_math_megatron.yaml` (1 hunks)
- `examples/configs/dpo.yaml` (2 hunks)
- `examples/configs/grpo_math_1B.yaml` (2 hunks)
- `examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml` (0 hunks)
- `examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml` (0 hunks)
- `examples/configs/sft.yaml` (3 hunks)
- `examples/configs/sft_openmathinstruct2_megatron.yaml` (1 hunks)
- `examples/configs/vlm_grpo_3B.yaml` (1 hunks)
- `examples/configs/vlm_grpo_3B_megatron.yaml` (1 hunks)
- `nemo_rl/algorithms/loss_functions.py` (1 hunks)
- `nemo_rl/data/__init__.py` (2 hunks)
- `nemo_rl/environments/math_environment.py` (2 hunks)
- `nemo_rl/evals/eval.py` (2 hunks)
- `nemo_rl/experience/rollouts.py` (1 hunks)
- `nemo_rl/models/generation/__init__.py` (2 hunks)
- `nemo_rl/models/generation/interfaces.py` (2 hunks)
- `nemo_rl/models/generation/vllm/vllm_generation.py` (4 hunks)
- `nemo_rl/models/generation/vllm/vllm_worker.py` (2 hunks)
- `nemo_rl/models/generation/vllm/vllm_worker_async.py` (2 hunks)
- `nemo_rl/models/policy/__init__.py` (5 hunks)
- `nemo_rl/models/policy/lm_policy.py` (2 hunks)
- `nemo_rl/models/policy/megatron_policy_worker.py` (3 hunks)
- `nemo_rl/utils/checkpoint.py` (2 hunks)
- `pyproject.toml` (1 hunks)
- `tests/unit/models/generation/test_vllm_generation.py` (1 hunks)
- `tests/unit/models/generation/test_vllm_large_model.py` (1 hunks)
- `tests/unit/test_config_validation.py` (1 hunks)
- `tests/unit/test_recipes_and_test_suites.py` (1 hunks)
- `tools/config_cli.py` (1 hunks)
💤 Files with no reviewable changes (12)
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
- examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
- examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v3.yaml
- examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
🧰 Additional context used
📓 Path-based instructions (4)
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section
Files:
docs/design-docs/generation.md
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
- examples/configs/grpo_math_1B.yaml
- examples/configs/vlm_grpo_3B.yaml
- examples/configs/distillation_math.yaml
- examples/configs/sft_openmathinstruct2_megatron.yaml
- examples/configs/distillation_math_megatron.yaml
- examples/configs/vlm_grpo_3B_megatron.yaml
- examples/configs/dpo.yaml
- examples/configs/sft.yaml
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/policy/lm_policy.py
- tests/unit/models/generation/test_vllm_generation.py
- nemo_rl/models/generation/__init__.py
- tests/unit/test_config_validation.py
- tools/config_cli.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/environments/math_environment.py
- tests/unit/models/generation/test_vllm_large_model.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/data/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/evals/eval.py
- tests/unit/test_recipes_and_test_suites.py
- nemo_rl/models/policy/__init__.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/algorithms/loss_functions.py
- nemo_rl/experience/rollouts.py
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/models/generation/__init__.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/megatron_policy_worker.py
- nemo_rl/utils/checkpoint.py
- nemo_rl/environments/math_environment.py
- nemo_rl/models/generation/vllm/vllm_worker_async.py
- nemo_rl/data/__init__.py
- nemo_rl/models/generation/vllm/vllm_generation.py
- nemo_rl/models/generation/interfaces.py
- nemo_rl/evals/eval.py
- nemo_rl/models/policy/__init__.py
🧠 Learnings (15)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Applied to files:
.pre-commit-config.yaml, tools/config_cli.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml
Applied to files:
.pre-commit-config.yaml, tools/config_cli.py, tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Applied to files:
.pre-commit-config.yaml, tools/config_cli.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
.pre-commit-config.yaml, tools/config_cli.py, tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
.pre-commit-config.yaml
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
examples/configs/sft_openmathinstruct2_megatron.yaml
📚 Learning: 2025-09-10T05:29:34.349Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:98-105
Timestamp: 2025-09-10T05:29:34.349Z
Learning: In the _maybe_correct_merged_tokens function in nemo_rl/models/generation/vllm/vllm_worker_async.py, the loop condition `len(candidate_token_ids) < len(actual_token_ids) - 1` is intentionally designed to prevent accessing the final token in actual_token_ids, likely to handle specific tokenization edge cases in the vLLM HTTP server integration.
Applied to files:
tests/unit/models/generation/test_vllm_generation.py, nemo_rl/models/generation/vllm/vllm_worker.py, tests/unit/models/generation/test_vllm_large_model.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-10T05:34:35.406Z
Learnt from: bxyu-nvidia
Repo: NVIDIA-NeMo/RL PR: 1110
File: nemo_rl/models/generation/vllm/vllm_worker_async.py:346-359
Timestamp: 2025-09-10T05:34:35.406Z
Learning: In nemo_rl/models/generation/vllm/vllm_worker_async.py, the HTTP server intentionally uses different path structures: `/v1/chat/completions` is under the `/v1` prefix while `/tokenize` is at the root level without the `/v1` prefix. This is the intended design.
Applied to files:
tests/unit/models/generation/test_vllm_generation.py, nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
tests/unit/models/generation/test_vllm_generation.py, nemo_rl/models/generation/vllm/vllm_worker.py, tests/unit/models/generation/test_vllm_large_model.py, nemo_rl/models/generation/vllm/vllm_worker_async.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
Applied to files:
tests/unit/test_config_validation.py, nemo_rl/utils/checkpoint.py, nemo_rl/data/__init__.py, nemo_rl/models/generation/interfaces.py, nemo_rl/evals/eval.py, nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.
Applied to files:
tools/config_cli.py
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
Repo: NVIDIA-NeMo/RL PR: 1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
nemo_rl/utils/checkpoint.py, nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Express configuration optionality via TypedDict using typing.NotRequired
Applied to files:
nemo_rl/environments/math_environment.py, nemo_rl/data/__init__.py, nemo_rl/models/generation/interfaces.py, nemo_rl/models/policy/__init__.py
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).
Applied to files:
nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (10)
nemo_rl/experience/rollouts.py (2)
- tests/unit/environments/test_code_environment.py (1): `tokenizer` (85-94)
- tests/unit/environments/test_retriever.py (1): `tokenizer` (84-93)
tests/unit/models/generation/test_vllm_generation.py (2)
- tests/unit/environments/test_code_environment.py (1): `tokenizer` (85-94)
- tests/unit/environments/test_retriever.py (1): `tokenizer` (84-93)
nemo_rl/models/generation/__init__.py (2)
- tests/unit/models/generation/test_vllm_generation.py (1): `tokenizer` (238-241)
- tests/unit/models/generation/test_vllm_large_model.py (1): `tokenizer` (82-85)
tests/unit/test_config_validation.py (4)
- nemo_rl/evals/eval.py (1): `MasterConfig` (57-63)
- nemo_rl/algorithms/distillation.py (1): `MasterConfig` (110-121)
- nemo_rl/algorithms/grpo.py (1): `MasterConfig` (161-169)
- tools/config_cli.py (1): `load_config_with_inheritance` (100-141)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
- nemo_rl/models/generation/interfaces.py (1): `verify_right_padding` (23-99)
tests/unit/models/generation/test_vllm_large_model.py (2)
- tests/unit/environments/test_code_environment.py (1): `tokenizer` (85-94)
- tests/unit/environments/test_retriever.py (1): `tokenizer` (84-93)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
- nemo_rl/models/generation/interfaces.py (1): `verify_right_padding` (23-99)
nemo_rl/data/__init__.py (1)
- tests/unit/data/test_data_processor.py (1): `system_prompt_file` (191-195)
nemo_rl/evals/eval.py (3)
- nemo_rl/environments/math_environment.py (1): `MathEnvConfig` (42-46)
- nemo_rl/models/generation/interfaces.py (1): `GenerationConfig` (118-131)
- nemo_rl/models/policy/__init__.py (1): `TokenizerConfig` (129-133)
nemo_rl/models/policy/__init__.py (2)
- nemo_rl/models/generation/interfaces.py (1): `GenerationConfig` (118-131)
- nemo_rl/models/policy/megatron_policy_worker.py (1): `freeze_moe_router` (251-263)
🪛 Ruff (0.14.2)
nemo_rl/models/generation/__init__.py
31-31: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
tests/unit/test_config_validation.py
49-49: Avoid specifying long messages outside the exception class
(TRY003)
104-104: Avoid specifying long messages outside the exception class
(TRY003)
127-127: Local variable config_type is assigned to but never used
Remove assignment to unused variable config_type
(F841)
129-131: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: build-container / main
- GitHub Check: sphinx-build / Build docs
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (24)
pyproject.toml (1)
**249-249: Configuration change is compatible with Ruff 0.9.9.**

Ruff 0.9.9 supports the `[tool.ruff.lint.per-file-ignores]` configuration format, so the restructuring at line 249 is valid and will not cause configuration parsing issues.

tests/unit/test_recipes_and_test_suites.py (2)
**39-39: No issues found with the "rm" algorithm mapping addition.**

The entry `"rm": "examples/configs/rm.yaml"` at line 39 is correct. The referenced config file exists (6.7K), and supporting recipe files are present (`examples/run_rm.py`, `examples/run_grpo_rm.py`, `examples/configs/grpo_rm_1B.yaml`). The addition properly follows the established pattern in `ALGO_MAPPING_TO_BASE_YAML` and integrates cleanly with the existing codebase.
**33-41: Test removal was intentional and replaced with comprehensive Pydantic validation.**

The removed test function `test_all_recipes_can_merge_configs_with_base_config` has been replaced by the new Pydantic-based validation in `test_config_validation.py`. The new approach:

- Validates all config files using Pydantic's `TypeAdapter` against typed MasterConfig classes
- Handles config merging through `load_config_with_inheritance` before validation
- Provides comprehensive validation of the entire config structure, not just merge capability
- Maintains allowed additional keys via `ALLOWED_ADDITIONAL_CONFIG_KEYS` for edge cases

The removal is justified and part of a systematic shift to Pydantic-based validation.
nemo_rl/models/policy/megatron_policy_worker.py (1)
**640-647: No config issues found; assertion will not cause runtime failures.**

The only recipe with an active `logprob_chunk_size: 2048` (grpo-math-qwen3-30ba3b-megatron-tp4-32k.yaml) already sets `defer_fp32_logits: true` at line 31. All other configs have `logprob_chunk_size: null`, so they bypass the assertion. The assertion is compatible with all existing configurations.

examples/configs/grpo_math_1B.yaml (1)
**66-66: LGTM: Explicit defaults improve config clarity.**

These changes make configuration defaults explicit:

- `hf_config_overrides: {}` clarifies an empty dict rather than null
- `defer_fp32_logits: False` provides an explicit boolean default

This aligns with the coding guideline that exemplar configs should document defaults.
Also applies to: 106-106
examples/configs/vlm_grpo_3B.yaml (1)
**96-96: LGTM: Explicit boolean default.**

Setting `defer_fp32_logits: False` explicitly aligns with the pattern across other Megatron configs in this PR.

examples/configs/vlm_grpo_3B_megatron.yaml (1)
**135-135: LGTM: Consistent with config standardization.**

The explicit `defer_fp32_logits: False` default maintains consistency across all Megatron-based configurations.

tests/unit/models/generation/test_vllm_generation.py (1)
**440-440: LGTM: Internal padding key convention.**

The switch to `_pad_token_id` (with underscore prefix) indicates this is now an internal configuration key. The fallback to `tokenizer.pad_token_id` provides good defensive handling. This aligns with similar changes in `vllm_worker_async.py` and `lm_policy.py`.

examples/configs/sft.yaml (1)
**38-38: LGTM: Explicit defaults for DTensor and Megatron configs.**

These additions make configuration defaults explicit:

- `env_vars: {}` under both `dtensor_cfg` and `megatron_cfg` clarifies environment variable configuration
- `defer_fp32_logits: False` provides an explicit Megatron default

This improves config clarity and aligns with the guideline that YAML is the single source of truth for defaults.
Also applies to: 77-77, 96-96
examples/configs/distillation_math.yaml (1)
**106-106: LGTM: Explicit Megatron default.**

The change to `defer_fp32_logits: False` maintains consistency with other Megatron configurations across the codebase.

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
**531-531: LGTM: Consistent internal padding key usage.**

Both changes adopt the `_pad_token_id` internal key convention:
- Line 531: For padding verification
- Line 639: For tensor initialization with padding
This is consistent with the same refactoring in `test_vllm_generation.py` (line 440) and `lm_policy.py` (line 585).

Also applies to: 639-639
nemo_rl/models/policy/lm_policy.py (2)
**585-585: LGTM: Internal padding key for generation output.**

Using `_pad_token_id` from the generation config maintains consistency with the internal key convention adopted across `vllm_worker_async.py` and test files.

**735-740: Verify broadened checkpoint validation logic.**

The condition changed from checking a specific value to checking if `model_save_format` is not None:

```python
# Before (implied): checked if model_save_format == "safetensors" (or similar)
# After: checks if model_save_format is not None
if (
    checkpointing_cfg is not None
    and checkpointing_cfg.get("model_save_format", None) is not None
):
    raise ValueError(
        "model_save_format must be None or omitted if using DTensorPolicyWorker (_v2=False)."
    )
```

This now rejects any non-None `model_save_format` for DTensorPolicyWorker (v1). Ensure this is the intended behavior and won't break existing workflows that might pass other format values.
**19-19: LGTM! Documentation aligns with code.**

The type annotation update correctly reflects that `top_k` is optional in the generation configuration.
**171-171: LGTM! Consistent with internal padding token refactoring.**

The change to use `"_pad_token_id"` aligns with the broader refactoring across the vLLM generation path. The fallback to `tokenizer.pad_token_id` ensures robustness.

nemo_rl/models/generation/vllm/vllm_worker.py (2)
**539-539: LGTM! Internal padding key usage.**

The change to use `self.cfg["_pad_token_id"]` for padding validation is consistent with the broader refactoring to use internal padding token keys across the vLLM generation path.

**573-573: LGTM! Internal padding key usage.**

Using `self.cfg["_pad_token_id"]` for constructing padded output tensors aligns with the internal padding token key refactoring throughout the codebase.

tools/config_cli.py (1)
**49-55: LGTM! Distillation support added correctly.**

The addition follows the existing pattern for GRPO and correctly maps the distillation algorithm to its base configuration file. This enables the minimize workflow for distillation recipes.
nemo_rl/experience/rollouts.py (1)
**163-163: LGTM! Simplified padding token source.**

The change to directly use `tokenizer.pad_token_id` simplifies the padding logic by treating the tokenizer as the single source of truth for padding tokens in async generation paths.
**18-18: LGTM! Correct import for TypedDict optionality.**

Adding `NotRequired` enables proper expression of optional configuration fields in the TypedDict.

**44-46: LGTM! Proper TypedDict optionality pattern.**

The change from `Optional[...]` to `NotRequired[... | None]` correctly expresses configuration optionality using TypedDict semantics, allowing fields to be omitted while also supporting explicit None values. As per coding guidelines.
**120-123: LGTM! Good defensive check with clear explanation.**

The explicit check for `model_name` with a helpful comment correctly handles the case where this field is `NotRequired` in the base `GenerationConfig` but required by `VllmGenerationWorker`.

**453-453: LGTM! Consistent internal padding key usage.**

Both changes correctly use `"_pad_token_id"` for padding values when combining worker results, aligning with the internal padding token key refactoring across the vLLM generation path.

Also applies to: 504-504

**92-92: Verify that top_k is always present in config.**

The change from `.get("top_k")` to `["top_k"]` now requires `top_k` to be present in the config (though it can be `None`). Please verify that `configure_generation_config` always populates this key before `VllmGeneration` initialization.
```python
# Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
ratio_clip_c: float | None
```
Use NotRequired[float] instead of float | None for optional configuration.
The coding guidelines specify: "Express configuration optionality via TypedDict using typing.NotRequired". The current implementation uses float | None, which means the key must be present in the config (but can be None), whereas NotRequired[float] indicates the key may be absent entirely.
As per coding guidelines.
Apply this diff to align with the coding guidelines:
```diff
-    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
-    ratio_clip_c: float | None
+    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). Omit or set to None to disable.
+    ratio_clip_c: NotRequired[float | None]
```

Additionally, update line 111 to handle the key's potential absence:

```python
self.ratio_clip_c = cfg.get("ratio_clip_c", None)  # set to None to disable dual-clipping
```

🤖 Prompt for AI Agents
self.ratio_clip_c = cfg.get("ratio_clip_c", None) # set to None to disable dual-clipping🤖 Prompt for AI Agents
In nemo_rl/algorithms/loss_functions.py around lines 42-43, change the TypedDict
declaration for ratio_clip_c from a required field typed as "float | None" to an
optional field using "NotRequired[float]" (add/import NotRequired from typing)
so the config key may be omitted; then at line 111 update the assignment to pull
the value with cfg.get("ratio_clip_c", None) so missing keys default to None
(disable dual-clipping).
```python
top_k: int | None
model_name: NotRequired[str]  # Not Required b/c GRPO writes this
stop_token_ids: list[int] | None
stop_strings: list[str] | None
colocated: NotRequired[ColocationConfig]
```
Restore optionality for stop_strings.
Switching stop_strings from NotRequired to a required key (even if None is allowed) will start failing every existing YAML config that omits this field once the new Pydantic validation runs. This PR is supposed to be a cherry-pick, so we cannot introduce a breaking config change here. Please keep stop_strings optional (e.g., stop_strings: NotRequired[list[str] | None]) so legacy configs continue to load. As per coding guidelines.
```diff
-    stop_strings: list[str] | None
+    stop_strings: NotRequired[list[str] | None]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
top_k: int | None
model_name: NotRequired[str]  # Not Required b/c GRPO writes this
stop_token_ids: list[int] | None
stop_strings: NotRequired[list[str] | None]
colocated: NotRequired[ColocationConfig]
```
🤖 Prompt for AI Agents
In nemo_rl/models/generation/interfaces.py around lines 125 to 129, stop_strings
was changed from an optional key to a required key which will break existing
YAML configs that omit it; revert it to an optional field by declaring it as
NotRequired[list[str] | None] so the key may be absent while still allowing None
or a list when present, ensuring backward compatibility with legacy configs and
matching the existing pattern used for colocated/model_name.
Signed-off-by: Terry Kong <terryk@nvidia.com>
What does this PR do?

cp: `use pydantic for yaml test validation` (#1382) into `r0.4.0`

Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use thisBefore your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
Release Notes
Configuration Updates
Evaluation Support
Internal Improvements