cp: feat: add Megatron support for on-policy distillation (#1324) into r0.4.0 (#1398)
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
Signed-off-by: alexchiu <alexq@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough

This PR adds Megatron on-policy distillation support by introducing epoch-based training tracking in the distillation algorithm, implementing top-k logits computation for the Megatron backend with context parallelism support, and providing comprehensive configuration examples and test coverage for Megatron-specific distillation workflows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Distillation Loop
    participant DistillationSaveState
    participant MegatronWorker
    User->>Distillation Loop: Start (max_num_epochs=10, max_num_steps=N)
    loop for each epoch (0 to max_num_epochs-1)
        loop for each step in epoch
            Distillation Loop->>MegatronWorker: get_topk_logits()
            rect rgb(200, 220, 255)
                Note over MegatronWorker: compute local top-k via distributed_vocab_topk
            end
            alt context_parallel > 1 and sequence_packing
                MegatronWorker->>MegatronWorker: allgather across CP groups
            end
            MegatronWorker-->>Distillation Loop: topk_logits, topk_indices
            Distillation Loop->>Distillation Loop: train step
            Distillation Loop->>DistillationSaveState: update(current_epoch, current_step, total_steps)
            alt total_steps >= max_num_steps
                Distillation Loop->>Distillation Loop: break (training complete)
            end
        end
    end
    Distillation Loop-->>User: Training complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: The PR modifies core data structures (DistillationConfig, DistillationSaveState) affecting training loop state management; implements a substantial new feature (get_topk_logits for Megatron with distributed computation, context parallelism, and sequence packing logic); spans 13 heterogeneous files including core logic, configs, and extensive test coverage; requires understanding epoch-based tracking, distributed tensor operations, and validation of test suite metrics.
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (17)
nemo_rl/algorithms/distillation.py (4)
78-80: Config and save-state shape look good; add brief key doc.

New max_num_epochs and multi-counter save state are consistent. Add a short doc comment on recommended defaults/usage per coding guidelines for TypedDict keys.
Also applies to: 88-107
253-255: Setup logging is helpful

The added progress prints (dataloader/val/cluster) aid operability without impacting behavior.
Consider gating verbose prints behind a logger verbosity level in future.
Also applies to: 270-273, 278-278, 295-297, 360-362
300-302: Clarify Megatron non‑colocated constraint in error

Good assertion. Consider appending “Use backend=vllm for non-colocated inference” to reduce config round‑trips.
578-580: Step header denominator can mislead late in training

Printing the denominator as min(len(dataloader), max_steps) is coarse. Prefer the remaining steps in this epoch.
```diff
-            print(
-                f"\n{'=' * 25} Step {current_step + 1}/{min(len(dataloader), max_steps)} {'=' * 25}",
-                flush=True,
-            )
+            steps_left = max_steps - total_steps
+            steps_this_epoch = min(len(dataloader), steps_left)
+            print(
+                f"\n{'=' * 25} Step {current_step + 1}/{steps_this_epoch} {'=' * 25}",
+                flush=True,
+            )
```

tests/unit/test_recipes_and_test_suites.py (1)
156-156: Threshold bump acknowledged; consider centralizing the limit.

Raising the cap to 1040 is fine. To avoid future rename churn, consider extracting 1040 to a module constant (e.g., MAX_NIGHTLY_GPU_HOURS) and referencing it in the assertion/message; the test name can remain as-is. The new Megatron distillation driver is properly listed in nightly and present on disk.

nemo_rl/models/policy/megatron_policy_worker.py (1)
1409-1700: Address lint nits: add return type, remove unused variables, shorten exception message

Verification confirms the CP > 1 test with sequence packing (test_megatron_context_parallel_topk_agreement) exists and validates the slice math through comprehensive assertions. The implementation is sound. Apply these minor cleanup improvements:
Add return type annotation for clarity:

```diff
-    def get_topk_logits(
+    def get_topk_logits(
         self,
         *,
         data: BatchedDataDict[GenerationDatumSpec],
         k: int,
         micro_batch_size: Optional[int] = None,
-    ):
+    ) -> BatchedDataDict:
```
Remove unused variables (F841 Ruff warnings):
- pp_size (line 1469): read from config but never used
- input_ids_unpacked (line 1477): unpacked but never referenced
- cu_seqlens (line 1480): unpacked but only cu_seqlens_padded is used

Shorten exception message (TRY003):
```diff
-        raise RuntimeError(
-            "Context Parallelism (CP>1) requires sequence packing to be enabled."
-        )
+        raise RuntimeError("CP>1 requires sequence packing enabled.")
```

tests/unit/algorithms/test_distillation.py (3)
552-554: Match interface: add return type annotation for DummyPolicy.offload_after_refit.

Annotate to align with interfaces.PolicyWorker.offload_after_refit(self) -> None.
```diff
-    def offload_after_refit(self):
-        return None
+    def offload_after_refit(self) -> None:
+        return None
```
91-96: Avoid fragile MagicMock __iter__ overrides.

Assigning a plain function to __iter__ can bypass method binding. Use return_value/side_effect on the mock instead.
```diff
-    def train_iter(self):
-        return iter([mock_batch] * 10)
-    train_dataloader.__iter__ = train_iter
+    train_dataloader.__iter__.side_effect = lambda: iter([mock_batch] * 10)
     train_dataloader.__len__ = MagicMock(return_value=10)
-    def val_iter(self):
-        return iter([mock_batch] * 10)
-    val_dataloader.__iter__ = val_iter
+    val_dataloader.__iter__.side_effect = lambda: iter([mock_batch] * 10)
     val_dataloader.__len__ = MagicMock(return_value=10)
```

Also applies to: 99-104
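As a standalone illustration of the suggested side_effect pattern (pure unittest.mock; none of this code is from the PR):

```python
from unittest.mock import MagicMock

batches = ["batch"] * 3
dataloader = MagicMock()
# side_effect runs on every __iter__ call, so each `for` loop (or list())
# over the mock gets a fresh iterator instead of an exhausted one.
dataloader.__iter__.side_effect = lambda: iter(batches)
dataloader.__len__.return_value = len(batches)

print(list(dataloader))  # first pass over the mock
print(list(dataloader))  # second pass still yields all batches
```

Because the magic-method mocks are configured rather than replaced, method binding stays intact and the mock keeps recording calls.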
123-131: Add an epoch-termination unit test.

You added distillation.max_num_epochs, but no test asserts epoch-based early exit. Recommend a focused test where max_num_epochs=1 and max_num_steps is large to verify the loop stops at the epoch boundary. I can draft it if helpful.
Also applies to: 512-519, 620-627
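A rough sketch of the termination logic such a test would pin down; `run_epochs` here is a hypothetical stand-in for the real training loop, whose entry point and fixtures live in the PR:

```python
def run_epochs(dataloader_len: int, max_num_epochs: int, max_num_steps: int):
    """Stand-in for the loop's epoch/step termination logic.

    Returns (last_epoch_index, total_steps_taken).
    """
    total_steps = 0
    for epoch in range(max_num_epochs):
        for _ in range(dataloader_len):
            total_steps += 1
            if total_steps >= max_num_steps:
                return epoch, total_steps  # step bound hit mid-epoch
    return max_num_epochs - 1, total_steps  # epoch bound exhausted first


# Epoch bound wins: one epoch over a 10-batch dataloader, huge step cap.
assert run_epochs(10, max_num_epochs=1, max_num_steps=10_000) == (0, 10)
# Step bound wins: a cap of 15 stops partway through the second epoch.
assert run_epochs(10, max_num_epochs=5, max_num_steps=15) == (1, 15)
```

A real unit test would drive the actual distillation entry point with mocked policies and assert on the persisted save-state counters instead.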
tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1)
2-4: Export config vars sourced by helpers (silence SC2034).

If common.env consumes these via env, export them; else add a brief comment.
```diff
 source $SCRIPT_DIR/common.env

 # ===== BEGIN CONFIG =====
-NUM_NODES=1
-STEPS_PER_RUN=10
-MAX_STEPS=10
-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=60
+export NUM_NODES=1
+export STEPS_PER_RUN=10
+export MAX_STEPS=10
+export NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
+export NUM_MINUTES=60
```

Also applies to: 6-11
tests/unit/models/policy/test_megatron_worker.py (4)
316-319: Narrow or annotate broad exception handlers (BLE001).

Catching Exception masks real failures. Either catch specific errors (e.g., ray exceptions, CUDA OOM) or annotate to acknowledge intent in tests.
```diff
-        except Exception as e:
+        except Exception as e:  # noqa: BLE001 - broad to convert infra/env failures into skips in tests
             print(f"Error during training setup: {e}")
             pytest.skip(f"Training setup failed: {e}")
@@
-        except Exception as e:
+        except Exception as e:  # noqa: BLE001
             print(f"Error during generation setup: {e}")
             pytest.skip(f"Generation setup failed: {e}")
@@
-        except Exception as e:
+        except Exception as e:  # noqa: BLE001
             print(f"Error during logprob setup: {e}")
             pytest.skip(f"Logprob setup failed: {e}")
@@
-        except Exception as e:
+        except Exception as e:  # noqa: BLE001
             print(f"Error during topk setup: {e}")
             pytest.skip(f"Topk setup failed: {e}")
```

Also applies to: 507-509, 657-660, 1413-1416
1460-1462: Silence unused cluster var (RUF059).

Prefix it with an underscore.
```diff
-    policy, cluster, data = topk_setup
+    policy, _cluster, data = topk_setup
```
1516-1521: Guard CP top‑k test on CUDA availability.

Prevents failures on CPU-only CI.
```diff
 def test_megatron_context_parallel_topk_agreement(tiny_qwen2_model_path):
     """Test that CP and non-CP models produce identical top-k logits with sequence packing enabled."""
-    num_gpus = 2
+    if not torch.cuda.is_available():
+        pytest.skip("CUDA is required for context-parallel top-k test")
+    num_gpus = 2
```
1861-1873: Fix stray tuple; use msg for clarity.

The current code builds an unused tuple. Pass msg= to assert_close.
```diff
-    (
-        torch.testing.assert_close(
-            logprobs_no_cp, logprobs_no_cp_no_packing, rtol=1e-3, atol=1e-3
-        ),
-        (
-            "Logprobs should match between non-CP and non-CP models with sequence packing"
-        ),
-    )
+    torch.testing.assert_close(
+        logprobs_no_cp,
+        logprobs_no_cp_no_packing,
+        rtol=1e-3,
+        atol=1e-3,
+        msg="Logprobs should match between packing and non-packing (non-CP)",
+    )
```

examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml (3)
12-24: Align policy parallelism with filename “tp2pp2cp2”.

The file name implies TP=2, PP=2, CP=2, but policy.megatron_cfg doesn’t set these explicitly. Recommend pinning them to avoid surprises from defaults.
```diff
 policy:
@@
   megatron_cfg:
     enabled: true
+    tensor_model_parallel_size: 2
+    pipeline_model_parallel_size: 2
+    context_parallel_size: 2
```
25-37: Teacher parallelism naming mismatch; verify intent.

The recipe name suggests “tp2pp2cp2”, but the teacher sets tp=4, cp=1 and omits PP. If intentional (e.g., different sharding for the teacher), add a brief comment; otherwise align to tp2/pp2/cp2.
```diff
 teacher:
   model_name: Qwen/Qwen3-32B
@@
   megatron_cfg:
     enabled: true
-    tensor_model_parallel_size: 4
-    context_parallel_size: 1
+    tensor_model_parallel_size: 2
+    pipeline_model_parallel_size: 2
+    context_parallel_size: 2
```
12-17: Ensure policy.model_name is set to Qwen3‑1.7B‑Base.

The filename encodes “to‑1.7b‑base”, but policy.model_name isn’t set here. If it isn’t provided by defaults, set it explicitly to avoid composing the wrong model.
I can draft a quick OmegaConf loader script to print the composed policy/teacher names.
Also applies to: 25-27
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- README.md (0 hunks)
- examples/configs/distillation_math.yaml (2 hunks)
- examples/configs/distillation_math_megatron.yaml (1 hunks)
- examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml (1 hunks)
- nemo_rl/algorithms/distillation.py (26 hunks)
- nemo_rl/models/policy/megatron_policy_worker.py (2 hunks)
- tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
- tests/functional/distillation_megatron.sh (1 hunks)
- tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1 hunks)
- tests/test_suites/nightly.txt (1 hunks)
- tests/unit/algorithms/test_distillation.py (4 hunks)
- tests/unit/models/policy/test_megatron_worker.py (1 hunks)
- tests/unit/test_recipes_and_test_suites.py (2 hunks)
💤 Files with no reviewable changes (1)
- README.md
🧰 Additional context used
📓 Path-based instructions (11)
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
- examples/configs/distillation_math.yaml
- examples/configs/distillation_math_megatron.yaml
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
- tests/functional/L1_Functional_Tests_GPU.sh
- tests/functional/distillation_megatron.sh
- tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
tests/test_suites/nightly.txt
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Files:
tests/test_suites/nightly.txt
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites// and list nightly tests in tests/test_suites/nightly.txt
Files:
- tests/test_suites/nightly.txt
- tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/algorithms/distillation.py
- tests/unit/test_recipes_and_test_suites.py
- tests/unit/models/policy/test_megatron_worker.py
- tests/unit/algorithms/test_distillation.py
- nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/algorithms/distillation.py
- nemo_rl/models/policy/megatron_policy_worker.py
tests/test_suites/llm/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Files:
tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/llm/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
LLM recipe YAML filenames must follow: --ng-[-modifiers][-long][.vN].yaml
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes//
Files:
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
🧠 Learnings (2)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/nightly.txt
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.
Applied to files:
nemo_rl/algorithms/distillation.py
🧬 Code graph analysis (5)
nemo_rl/algorithms/distillation.py (6)
nemo_rl/utils/logger.py (1)
- log_batched_dict_as_jsonl (893-917)

nemo_rl/models/policy/megatron_policy_worker.py (5)

- offload_after_refit (2121-2143)
- prepare_for_lp_inference (2046-2049)
- get_topk_logits (1409-1699)
- prepare_for_training (2051-2074)
- train (873-1126)

nemo_rl/models/policy/dtensor_policy_worker.py (5)

- offload_after_refit (1880-1903)
- prepare_for_lp_inference (1831-1838)
- get_topk_logits (1327-1622)
- prepare_for_training (1841-1862)
- train (529-895)

nemo_rl/models/policy/interfaces.py (4)

- offload_after_refit (153-154)
- get_topk_logits (84-95)
- prepare_for_training (125-126)
- train (98-115)

nemo_rl/models/policy/lm_policy.py (6)

- offload_after_refit (740-743)
- Policy (59-806)
- prepare_for_lp_inference (638-642)
- get_topk_logits (380-444)
- prepare_for_training (633-636)
- train (446-539)

nemo_rl/distributed/batched_data_dict.py (2)

- BatchedDataDict (75-860)
- size (814-823)
tests/unit/test_recipes_and_test_suites.py (1)
tests/unit/conftest.py (1)
tracker(221-252)
tests/unit/models/policy/test_megatron_worker.py (7)
nemo_rl/distributed/virtual_cluster.py (2)
- RayVirtualCluster (177-435)
- shutdown (407-426)

nemo_rl/algorithms/utils.py (1)

- get_tokenizer (157-288)

nemo_rl/models/generation/__init__.py (1)

- configure_generation_config (24-45)

nemo_rl/models/policy/lm_policy.py (3)

- Policy (59-806)
- shutdown (780-787)
- get_topk_logits (380-444)

nemo_rl/distributed/batched_data_dict.py (2)

- to (825-832)
- BatchedDataDict (75-860)

nemo_rl/models/policy/megatron_policy_worker.py (2)

- shutdown (2286-2288)
- get_topk_logits (1409-1699)

tests/unit/conftest.py (1)

- tiny_qwen2_model_path (514-538)
tests/unit/algorithms/test_distillation.py (5)
nemo_rl/models/policy/megatron_policy_worker.py (1)
- offload_after_refit (2121-2143)

nemo_rl/models/policy/dtensor_policy_worker.py (1)

- offload_after_refit (1880-1903)

nemo_rl/models/policy/interfaces.py (1)

- offload_after_refit (153-154)

nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)

- offload_after_refit (1841-1864)

nemo_rl/models/policy/lm_policy.py (1)

- offload_after_refit (740-743)
nemo_rl/models/policy/megatron_policy_worker.py (2)
nemo_rl/distributed/model_utils.py (2)
- allgather_cp_sharded_tensor (706-709)
- distributed_vocab_topk (829-896)

nemo_rl/distributed/batched_data_dict.py (4)

- BatchedDataDict (75-860)
- to (825-832)
- size (814-823)
- from_batches (102-172)
🪛 Ruff (0.14.1)
tests/unit/models/policy/test_megatron_worker.py
1413-1413: Do not catch blind exception: Exception
(BLE001)
1461-1461: Unpacked variable cluster is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
nemo_rl/models/policy/megatron_policy_worker.py
1469-1469: Local variable pp_size is assigned to but never used
Remove assignment to unused variable pp_size
(F841)
1477-1477: Unpacked variable input_ids_unpacked is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1480-1480: Unpacked variable cu_seqlens is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
1601-1603: Avoid specifying long messages outside the exception class
(TRY003)
🪛 Shellcheck (0.11.0)
tests/functional/distillation_megatron.sh
[error] 52-52: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
nemo_rl/algorithms/distillation.py (5)
193-214: DTensor SP + packing guard

The assertion guarding SP+packing with DTensor is appropriate; thanks for the explicit message and link.
377-384: Megatron train_iters alignment across teacher/student

Setting train_iters=min(max_steps, max_epochs*len(dataloader)) for both teacher and student is correct and avoids scheduler mismatch.
If resuming from checkpoint with a changed dataset length, confirm train_iters remains consistent with the original run to preserve scheduler behavior.
Also applies to: 429-436
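The alignment described above reduces to a one-liner (the helper name and values here are illustrative; the real computation sits inside the distillation setup):

```python
def compute_train_iters(max_steps: int, max_epochs: int, dataloader_len: int) -> int:
    # Teacher and student schedulers must see the same horizon, capped by
    # whichever limit (total steps, or epochs * batches-per-epoch) binds first.
    return min(max_steps, max_epochs * dataloader_len)


assert compute_train_iters(1000, 10, 50) == 500  # epoch bound binds
assert compute_train_iters(200, 10, 50) == 200   # step bound binds
```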
525-544: Epoch/step counters wired correctly

current_epoch/current_step/total_steps pulled from save-state align with the new training loop.
708-711: Training loop/CKPT/metrics flow
- is_last_step logic and periodic validation look correct.
- Save-state increments (+1) before persisting avoids off‑by‑one on resume.
- Metrics logged at step total_steps+1 are consistent with counters.
Also applies to: 714-736, 779-788, 802-808, 893-895, 909-912
924-937: Validation messaging improvements

Helpful user feedback on skipped validation and timing; no behavior change.
Also applies to: 1017-1033
nemo_rl/models/policy/megatron_policy_worker.py (1)
103-107: New utilities imported for TP/CP top‑k

Appropriate dependencies for distributed top‑k and CP allgather.
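For intuition on what a distributed vocab top-k has to accomplish (this is our standalone sketch, not the PR's distributed_vocab_topk): each tensor-parallel rank holds a vocabulary shard, takes a local top-k, and the gathered shard candidates are re-reduced with a second top-k.

```python
import torch


def merge_sharded_topk(shard_logits, shard_offsets, k):
    """Merge per-vocab-shard top-k results into a global top-k.

    shard_logits: list of [seq, vocab_shard] tensors, one per (simulated) rank.
    shard_offsets: start index of each shard in the global vocabulary.
    """
    cand_vals, cand_idx = [], []
    for logits, offset in zip(shard_logits, shard_offsets):
        vals, idx = torch.topk(logits, k=min(k, logits.shape[-1]), dim=-1)
        cand_vals.append(vals)
        cand_idx.append(idx + offset)  # shift local ids to global vocab ids
    # Concatenate candidates from all shards (what an allgather provides),
    # then take the global top-k over the candidate pool.
    all_vals = torch.cat(cand_vals, dim=-1)
    all_idx = torch.cat(cand_idx, dim=-1)
    top_vals, pos = torch.topk(all_vals, k=k, dim=-1)
    top_idx = torch.gather(all_idx, -1, pos)
    return top_vals, top_idx
```

Because each row's global top-k is always contained in the union of its per-shard top-k candidates, this two-stage reduction is exact, not approximate.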
tests/functional/L1_Functional_Tests_GPU.sh (1)
33-33: Include Megatron distillation functional test

Good addition; consistent with uv-run usage across the suite.
tests/test_suites/nightly.txt (1)
95-96: Verification confirms nightly entry is correct

All checks passed:
- Script file exists and is executable
- Entry is present in nightly.txt
- Path format matches existing entries
tests/functional/distillation_megatron.sh (1)
20-27: Harden script: quote args/paths and guard cd (SC2068/SC2164).

Prevents re-splitting and failing silently on directory changes.
```diff
-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"

-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1

 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
-    $PROJECT_ROOT/examples/run_distillation_math.py \
-    --config $PROJECT_ROOT/examples/configs/distillation_math_megatron.yaml \
+    "$PROJECT_ROOT/examples/run_distillation_math.py" \
+    --config "$PROJECT_ROOT/examples/configs/distillation_math_megatron.yaml" \
     policy.model_name=Qwen/Qwen3-0.6B-Base \
     teacher.model_name=Qwen/Qwen3-0.6B \
     cluster.gpus_per_node=2 \
@@
-    logger.log_dir=$LOG_DIR \
+    logger.log_dir="$LOG_DIR" \
@@
     checkpointing.checkpoint_dir=/tmp/distillation_checkpoints \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

-uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS
+uv run tests/json_dump_tb_logs.py "$LOG_DIR" --output_path "$JSON_METRICS"

-uv run tests/check_metrics.py $JSON_METRICS \
+uv run tests/check_metrics.py "$JSON_METRICS" \
     'data["train/loss"]["3"] < 1.0'
```

Also applies to: 23-53, 55-58
⛔ Skipped due to learnings
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1)
16-31: Quote arguments and guard cd (SC2068, SC2164).

Prevents word-splitting and failure-on-cd issues.
```diff
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1

 uv run examples/run_distillation_math.py \
-    --config $CONFIG_PATH \
-    distillation.max_num_steps=$MAX_STEPS \
+    --config "$CONFIG_PATH" \
+    distillation.max_num_steps="$MAX_STEPS" \
     distillation.val_period=20 \
-    logger.log_dir=$LOG_DIR \
+    logger.log_dir="$LOG_DIR" \
     logger.wandb_enabled=True \
     logger.wandb.project=nemo-rl-distillation \
-    logger.wandb.name=$EXP_NAME \
+    logger.wandb.name="$EXP_NAME" \
     logger.monitor_gpus=True \
     logger.tensorboard_enabled=True \
     checkpointing.enabled=True \
-    checkpointing.checkpoint_dir=$CKPT_DIR \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    checkpointing.checkpoint_dir="$CKPT_DIR" \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

-uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS
+uv run tests/json_dump_tb_logs.py "$LOG_DIR" --output_path "$JSON_METRICS"

-if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
-  uv run tests/check_metrics.py $JSON_METRICS \
+if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' "$JSON_METRICS") -ge "$MAX_STEPS" ]]; then
+  uv run tests/check_metrics.py "$JSON_METRICS" \
     'data["train/loss"]["1"] < 1.5' \
     'data["train/loss"]["10"] < 0.5' \
     'max(data["ray/node.0.gpu.0.mem_gb"]) < 75' \
     'mean(data["timing/train/total_step_time"], -6, -1) < 500'
 fi
```

Also applies to: 33-42
⛔ Skipped due to learnings
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

examples/configs/distillation_math.yaml (1)
7-7: Documented defaults for new config keys are complete per guidelines.

The `max_num_epochs: 10` parameter and the entire `megatron_cfg` structure include explicit defaults. This satisfies the requirement that exemplar configs under `examples/configs/*.yaml` document recommended defaults for new config keys.

Also applies to: 84-150
examples/configs/distillation_math_megatron.yaml (2)
38-104: MEGATRON_BASE configuration inherits and properly enables the Megatron backend.

The `megatron_cfg` block is well-defined with all necessary settings for Megatron distributed training, context parallelism, and MoE routing. The `enabled: true` flag (line 39) correctly activates the Megatron backend, in contrast to the base config's `enabled: false` (distillation_math.yaml:85).
1-1: Disregard this review comment; the config is correct for Megatron.

The Megatron code path does not use `policy.optimizer` and `policy.scheduler`. When `megatron_cfg.enabled=true`, the training code instantiates `ConfigContainer` with the optimizer and scheduler pulled from `megatron_cfg` (lines 656–693 of megatron_policy_worker.py), which are explicitly defined in distillation_math_megatron.yaml (lines 54–96). The missing `policy.optimizer` and `policy.scheduler` are not used and therefore not required. The configuration is correct and functional.

Likely an incorrect or invalid review comment.
```diff
 print("▶ Preparing for teacher logprob inference...", flush=True)
 with timer.time("teacher_logprob_inference_prep"):
     teacher_policy.prepare_for_lp_inference()

-print("▶ Computing teacher logprobs...")
+print("▶ Computing teacher logprobs...", flush=True)
 with timer.time("teacher_logprob_inference"):
     teacher_topk = teacher_policy.get_topk_logits(
         train_data, k=master_config["distillation"]["topk_logits_k"]
     )
 train_data["teacher_topk_logits"] = teacher_topk["topk_logits"]
 train_data["teacher_topk_indices"] = teacher_topk["topk_indices"]
```
Fix log message: it’s top‑k logits, not logprobs
These prints say “logprob” but the code calls get_topk_logits. Adjust to avoid confusion.
```diff
-            print("▶ Preparing for teacher logprob inference...", flush=True)
+            print("▶ Preparing for teacher top‑k inference...", flush=True)
@@
-            print("▶ Computing teacher logprobs...", flush=True)
+            print("▶ Computing teacher top‑k logits...", flush=True)
```

Also applies to: 690-695
🤖 Prompt for AI Agents
In nemo_rl/algorithms/distillation.py around lines 686-697 (and similarly
690-695), the printed messages incorrectly refer to "logprob" while the code is
computing top-k logits via get_topk_logits; update the print statements to
accurately say "top-k logits" (e.g., "Preparing for teacher top-k logits
inference..." and "Computing teacher top-k logits...") so logs match the
operation and avoid confusion.
@zpqiu to take a look at test failure
…to `r0.4.0` (#1398)

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
Signed-off-by: alexchiu <alexq@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: alexchiu <alexq@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
beep boop [🤖]: Hi @zpqiu 👋,
Summary by CodeRabbit
New Features
Documentation
Tests