fix: Fixes to make Megatron backend match dtensor #1389
Conversation
Signed-off-by: ashors1 <ashors@nvidia.com>
📝 Walkthrough

The PR removes the …

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Training Config
    participant PolicyWorker as MegatronPolicyWorker
    participant Model as Model Config
    Config->>PolicyWorker: Initialize with policy settings
    PolicyWorker->>Model: Enable calculate_per_token_loss
    PolicyWorker->>Model: Enable perform_initialization
    alt calculate_per_token_loss is True
        PolicyWorker->>Model: Override average_in_collective = False
        Note over PolicyWorker,Model: Avoid MCore assertion errors
    else calculate_per_token_loss is False
        Note over PolicyWorker,Model: Use default average_in_collective
    end
    PolicyWorker->>PolicyWorker: check_tensor_parallel_attributes()
    Note over PolicyWorker: Inspect TP status, partition dims,<br/>shapes across model parameters
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unit/models/policy/test_megatron_worker.py (1)

2436-2436: Consider adding an explicit `strict=True` to the `zip()` call.

While the test already asserts equal length on lines 2432-2434, adding `strict=True` to the `zip()` call would provide additional runtime enforcement and follows Python 3.10+ best practices. Since the coding guidelines target Python 3.12+, this parameter is available. Apply this diff:

```diff
- for i, (gn, ref_gn) in enumerate(zip(grad_norm, reference_grad_norm)):
+ for i, (gn, ref_gn) in enumerate(zip(grad_norm, reference_grad_norm, strict=True)):
```

Based on static analysis hint from Ruff.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- examples/configs/distillation_math.yaml (0 hunks)
- examples/configs/distillation_math_megatron.yaml (0 hunks)
- examples/configs/dpo.yaml (0 hunks)
- examples/configs/grpo_math_1B.yaml (0 hunks)
- examples/configs/grpo_math_1B_megatron.yaml (0 hunks)
- examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml (1 hunk)
- examples/configs/rm.yaml (0 hunks)
- examples/configs/sft.yaml (0 hunks)
- examples/configs/sft_openmathinstruct2_megatron.yaml (0 hunks)
- examples/configs/vlm_grpo_3B.yaml (0 hunks)
- examples/configs/vlm_grpo_3B_megatron.yaml (0 hunks)
- nemo_rl/models/policy/__init__.py (0 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
- tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh (1 hunk)
- tests/test_suites/nightly.txt (1 hunk)
- tests/unit/models/generation/test_vllm_generation.py (0 hunks)
- tests/unit/models/policy/test_megatron_worker.py (2 hunks)
- tools/refit_verifier.py (0 hunks)
💤 Files with no reviewable changes (13)
- examples/configs/sft_openmathinstruct2_megatron.yaml
- examples/configs/dpo.yaml
- examples/configs/distillation_math.yaml
- examples/configs/vlm_grpo_3B_megatron.yaml
- examples/configs/vlm_grpo_3B.yaml
- examples/configs/grpo_math_1B.yaml
- tools/refit_verifier.py
- examples/configs/grpo_math_1B_megatron.yaml
- tests/unit/models/generation/test_vllm_generation.py
- examples/configs/sft.yaml
- examples/configs/distillation_math_megatron.yaml
- examples/configs/rm.yaml
- nemo_rl/models/policy/__init__.py
🧰 Additional context used
📓 Path-based instructions (10)
tests/test_suites/nightly.txt
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Files:
tests/test_suites/nightly.txt
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt
Files:
tests/test_suites/nightly.txt
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
nemo_rl/models/policy/megatron_policy_worker.py
tests/unit/models/policy/test_megatron_worker.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/models/policy/megatron_policy_worker.py
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes/<domain>/
Files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Useuv runto execute Python scripts in shell/driver scripts instead of activating virtualenvs and callingpythondirectly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
tests/test_suites/llm/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
🧠 Learnings (10)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/nightly.txt
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/** : Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/nightly.txt
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/llm/*.sh : LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
tests/test_suites/nightly.txt
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
📚 Learning: 2025-09-17T01:52:21.399Z
Learnt from: ffrujeri
PR: NVIDIA-NeMo/RL#1023
File: nemo_rl/utils/checkpoint.py:58-65
Timestamp: 2025-09-17T01:52:21.399Z
Learning: model_state_dict_keys is not intended to be part of the nemo-rl CheckpointingConfig TypedDict - it's handled at the automodel implementation layer, not as a general checkpointing configuration parameter.
Applied to files:
nemo_rl/models/policy/megatron_policy_worker.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Applied to files:
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
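The `NUM_RUNS` formula in the learning above is ceiling division; a quick sketch of the arithmetic (in Python for illustration, mirroring the shell expression):

```python
def num_runs(max_steps: int, steps_per_run: int) -> int:
    # Mirrors the shell arithmetic $(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN )):
    # the number of runs of steps_per_run needed to cover max_steps.
    return (max_steps + steps_per_run - 1) // steps_per_run


assert num_runs(100, 40) == 3  # 40 + 40 + 20: a partial final run still counts
assert num_runs(80, 40) == 2   # exact multiple: no extra run
```

This is why the variables look "unused" to ShellCheck inside an individual script: the launch tooling consumes them externally.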
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/**/*.{sh} : For new model support, add a matching driver shell script under tests/test_suites/<domain>/ that sources common.env and invokes 'uv run ... --config <yaml>'
Applied to files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
PR: NVIDIA-NeMo/RL#1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
🧬 Code graph analysis (1)
tests/unit/models/policy/test_megatron_worker.py (7)

- tests/unit/conftest.py (1): tiny_llama_model_path (500-524)
- nemo_rl/distributed/batched_data_dict.py (2): to (825-832), BatchedDataDict (75-860)
- nemo_rl/distributed/virtual_cluster.py (2): RayVirtualCluster (186-505), shutdown (477-496)
- nemo_rl/models/policy/lm_policy.py (4): Policy (59-809), prepare_for_training (652-655), train (465-558), shutdown (749-756)
- nemo_rl/algorithms/loss_functions.py (1): NLLLoss (379-455)
- nemo_rl/models/policy/megatron_policy_worker.py (3): prepare_for_training (2003-2026), train (899-1149), shutdown (2229-2234)
- nemo_rl/distributed/worker_groups.py (2): run_all_workers_single_data (728-772), shutdown (930-1004)
🪛 Ruff (0.14.2)
tests/unit/models/policy/test_megatron_worker.py
2436-2436: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh
[warning] 9-9: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 19-19: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 32-32: Double quote array expansions to avoid re-splitting elements.
(SC2068)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (7)
examples/configs/recipes/llm/sft-qwen2.5-math-7b-megatron.yaml (1)

36-42: Data paths verified and correct.

Both examples/prompts/math.txt and the openmathinstruct2 dataset are available in the repository. The prompt file exists at the referenced path, and the dataset is implemented as OpenMathInstruct2Dataset with full integration in the data loading framework. No changes needed.

tests/test_suites/nightly.txt (1)

70-71: Nightly test entry correctly appended.

The new test path is properly added to the SFT/Megatron section with a descriptive comment. The path format (relative to tests/test_suites/) and placement follow established patterns in the file.

tests/test_suites/llm/sft-qwen2.5-math-7b-megatron.sh (1)

1-43: Verification complete; all dependencies satisfied.

All nine required variables and functions are properly defined in tests/test_suites/llm/common.env: PROJECT_ROOT, CONFIG_PATH, LOG_DIR, CKPT_DIR, EXP_NAME, RUN_LOG, JSON_METRICS, exit_if_max_steps_reached, and they are correctly initialized and exported. The script has no missing dependencies.

nemo_rl/models/policy/megatron_policy_worker.py (2)
707-709: Good addition of explanatory comment.

The comment explains why average_in_collective=False is required when calculate_per_token_loss=True, which addresses the past review feedback requesting clarification of this setting. The relationship between these two configuration options is now documented. Based on past review comments requesting descriptive explanations for this configuration.
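A minimal sketch of the kind of guard being reviewed here (the option names follow the review text, but this is an illustration, not the actual MegatronPolicyWorker code):

```python
def apply_loss_averaging_override(model_cfg: dict) -> dict:
    """Force average_in_collective off when per-token loss is enabled.

    Hypothetical sketch: Megatron-Core asserts against averaging gradients
    in the all-reduce collective while calculate_per_token_loss is True,
    so the two settings cannot be combined.
    """
    if model_cfg.get("calculate_per_token_loss"):
        model_cfg["average_in_collective"] = False
    return model_cfg


cfg = apply_loss_averaging_override(
    {"calculate_per_token_loss": True, "average_in_collective": True}
)
assert cfg["average_in_collective"] is False
```

Putting the override next to an explanatory comment, as the review notes, keeps the coupling between the two options discoverable.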
2250-2292: LGTM! Well-structured introspection utility.

The new check_tensor_parallel_attributes method provides a clear way to inspect tensor parallel attributes on model parameters. The implementation correctly:

- Iterates through all model parameters
- Captures relevant TP attributes (tensor_model_parallel, partition_dim, partition_stride)
- Returns a well-structured dictionary with comprehensive information
- Is properly tested in the test suite (see test_megatron_gradient_norm_consistency_across_parallelism)

This is a useful diagnostic tool for validating tensor parallel setup.
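The shape of such an introspection helper can be sketched as below. The attribute names follow Megatron-Core conventions mentioned in the review (tensor_model_parallel, partition_dim, partition_stride); the actual method in megatron_policy_worker.py may differ in signature and detail:

```python
from types import SimpleNamespace


def check_tensor_parallel_attributes(named_parameters) -> dict:
    """Collect tensor-parallel metadata for every named parameter.

    Returns a dict mapping parameter name to its shape and TP attributes,
    using safe defaults for parameters that are not TP-sharded.
    """
    report = {}
    for name, param in named_parameters:
        report[name] = {
            "shape": tuple(param.shape),
            "tensor_model_parallel": getattr(param, "tensor_model_parallel", False),
            "partition_dim": getattr(param, "partition_dim", -1),
            "partition_stride": getattr(param, "partition_stride", 1),
        }
    return report


# Stand-in for a column-parallel weight (sharded along dim 0).
fake_param = SimpleNamespace(
    shape=(512, 1024), tensor_model_parallel=True, partition_dim=0, partition_stride=1
)
report = check_tensor_parallel_attributes([("linear.weight", fake_param)])
assert report["linear.weight"]["partition_dim"] == 0
```

A report like this makes it cheap for a test to assert that TP > 1 actually marked the expected parameters as sharded.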
tests/unit/models/policy/test_megatron_worker.py (2)

19-21: LGTM! Imports support new test functionality.

The numpy import is used for numerical comparisons in the gradient norm consistency test (line 2450), and the ray import is used to retrieve results from distributed workers (line 2378). Both are necessary for the new test function.

2271-2465: Excellent comprehensive test for gradient norm consistency!

This test thoroughly validates that:

- Gradient norms are consistent across different parallelization strategies (DP, TP)
- Losses are consistent across configurations
- Tensor parallel attributes are correctly set on model parameters when TP > 1

The test design is sound:

- Uses reproducible test data (fixed seed)
- Tests three meaningful configurations: DP1TP1 (baseline), DP2, and TP2
- Uses float32 for stable gradient comparisons
- Validates tensor parallel attributes via the new check_tensor_parallel_attributes method
- Provides clear diagnostic output for debugging

The test directly validates the changes made in megatron_policy_worker.py related to per-token loss handling and tensor parallel setup.
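The consistency check described above boils down to a tolerance-based comparison of per-step grad norms against the DP1/TP1 baseline. An illustrative sketch (values are made up; the real test pulls norms from Megatron workers via ray):

```python
import numpy as np

baseline_grad_norms = np.array([0.481, 0.479, 0.475])  # DP1/TP1 reference run
dp2_grad_norms = np.array([0.481, 0.479, 0.475])       # DP2 run
tp2_grad_norms = np.array([0.48101, 0.479, 0.475])     # TP2 run, small fp noise

# Even in float32, reductions happen in a different order across parallel
# layouts, so the comparison uses a tolerance rather than exact equality.
for name, norms in [("DP2", dp2_grad_norms), ("TP2", tp2_grad_norms)]:
    assert np.allclose(norms, baseline_grad_norms, rtol=1e-3), (
        f"{name} grad norms diverged from baseline"
    )
```

The same pattern applies to the loss comparison across configurations.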
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information