
cp: feat: add Megatron support for on-policy distillation (1324) into r0.4.0#1398

Merged
terrykong merged 3 commits into r0.4.0 from cherry-pick-1324-r0.4.0 on Oct 29, 2025
Conversation

@chtruong814 (Contributor) commented Oct 21, 2025

beep boop [🤖]: Hi @zpqiu 👋,

we've cherry-picked #1324 into r0.4.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features

    • Added epoch-based training progress tracking to distillation workflows alongside step counters.
    • Enabled top-k logit computation support for Megatron backend configurations.
    • Introduced new distillation configuration examples with Megatron parallelism support.
  • Documentation

    • Removed outdated backend compatibility constraints from documentation.
  • Tests

    • Added functional and unit tests for Megatron distillation and top-k logits validation.

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
Signed-off-by: alexchiu <alexq@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested review from a team as code owners October 21, 2025 04:29
@chtruong814 chtruong814 requested a review from zpqiu October 21, 2025 04:29
@terrykong terrykong enabled auto-merge (squash) October 21, 2025 04:34
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 21, 2025
coderabbitai bot (Contributor) commented Oct 21, 2025

📝 Walkthrough

This PR adds Megatron on-policy distillation support by introducing epoch-based training tracking in the distillation algorithm, implementing top-k logits computation for the Megatron backend with context parallelism support, and providing comprehensive configuration examples and test coverage for Megatron-specific distillation workflows.
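For intuition, the two-stage top-k described above can be sketched in plain Python. This is an illustration only: the real distributed_vocab_topk operates on sharded torch tensors across tensor-parallel ranks, and its actual signature is not shown here; the function and variable names below are hypothetical.

```python
import heapq

def shard_topk(shard, k, offset):
    # Each vocab shard proposes its k best (value, global_index) candidates.
    return heapq.nlargest(k, ((v, offset + i) for i, v in enumerate(shard)))

def distributed_vocab_topk_sketch(shards, k):
    # Stage 1: local top-k per shard. Stage 2: reduce candidates to a global top-k.
    # Correctness rests on the fact that the global top-k is always contained
    # in the union of the per-shard local top-ks.
    candidates, offset = [], 0
    for shard in shards:
        candidates.extend(shard_topk(shard, k, offset))
        offset += len(shard)
    best = heapq.nlargest(k, candidates)
    return [v for v, _ in best], [i for _, i in best]
```

In a distributed setting each shard lives on a different rank and stage 2 becomes an all-gather of candidates followed by a final local top-k, but the candidate-union argument for correctness is the same.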

Changes

  • Documentation & Configuration (README.md, examples/configs/distillation_math.yaml): Removed on-policy distillation backend restrictions; expanded distillation_math.yaml with max_num_epochs and a detailed Megatron configuration block (parallelism, memory, optimization, routing, data-parallel settings).
  • New Megatron Distillation Configs (examples/configs/distillation_math_megatron.yaml, examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml): Added complete Megatron-based distillation configurations with checkpointing, policy settings, generation backend (vLLM), logging, teacher model overrides, and cluster resource allocation.
  • Core Algorithm & Policy (nemo_rl/algorithms/distillation.py, nemo_rl/models/policy/megatron_policy_worker.py): Introduced the max_num_epochs field and epoch/step tracking in DistillationConfig and DistillationSaveState; implemented get_topk_logits for the Megatron backend with distributed top-k computation and context-parallelism support; added imports for allgather_cp_sharded_tensor and distributed_vocab_topk.
  • Functional Test Infrastructure (tests/functional/L1_Functional_Tests_GPU.sh, tests/functional/distillation_megatron.sh, tests/test_suites/nightly.txt): Added the distillation_megatron.sh script to GPU functional tests; added a nightly test-suite entry for the Megatron distillation run.
  • Distillation Test Suite (tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh): Added a test-suite script for Qwen3 32B-to-1.7B distillation with Megatron parallelism, including configuration, TensorBoard log conversion, and metric validation.
  • Unit Tests (tests/unit/algorithms/test_distillation.py, tests/unit/models/policy/test_megatron_worker.py, tests/unit/test_recipes_and_test_suites.py): Added max_num_epochs to test configurations and an offload_after_refit method to DummyPolicy; introduced a topk_setup fixture and top-k logits validation tests for the Megatron backend, including context-parallelism agreement checks; updated the nightly GPU-hours threshold from 1030 to 1040.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Distillation Loop
    participant DistillationSaveState
    participant MegatronWorker
    
    User->>Distillation Loop: Start (max_num_epochs=10, max_num_steps=N)
    
    loop for each epoch (0 to max_num_epochs-1)
        loop for each step in epoch
            Distillation Loop->>MegatronWorker: get_topk_logits()
            
            rect rgb(200, 220, 255)
            Note over MegatronWorker: compute local top-k via distributed_vocab_topk
            end
            
            alt context_parallel > 1 and sequence_packing
                MegatronWorker->>MegatronWorker: allgather across CP groups
            end
            
            MegatronWorker-->>Distillation Loop: topk_logits, topk_indices
            
            Distillation Loop->>Distillation Loop: train step
            Distillation Loop->>DistillationSaveState: update(current_epoch, current_step, total_steps)
            
            alt total_steps >= max_num_steps
                Distillation Loop->>Distillation Loop: break (training complete)
            end
        end
    end
    
    Distillation Loop-->>User: Training complete
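The epoch/step bookkeeping in the diagram amounts to a nested loop with a global step budget. A minimal sketch follows; the names are illustrative, not the actual distillation API, and the real loop also handles validation, checkpointing, and resume.

```python
def run_loop(dataloader, max_num_epochs, max_num_steps, train_step, save_state):
    # Hypothetical sketch of the diagram's control flow: iterate epochs,
    # update the save state after every step, and stop early once the
    # global step budget is exhausted.
    total_steps = save_state.get("total_steps", 0)
    for epoch in range(save_state.get("current_epoch", 0), max_num_epochs):
        for step, batch in enumerate(dataloader):
            train_step(batch)
            total_steps += 1
            save_state.update(current_epoch=epoch, current_step=step + 1,
                              total_steps=total_steps)
            if total_steps >= max_num_steps:
                return save_state  # training complete
    return save_state
```

Note that both limits are active at once: whichever of max_num_epochs or max_num_steps is reached first terminates training.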

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: The PR modifies core data structures (DistillationConfig, DistillationSaveState) affecting training loop state management; implements a substantial new feature (get_topk_logits for Megatron with distributed computation, context parallelism, and sequence packing logic); spans 13 heterogeneous files including core logic, configs, and extensive test coverage; requires understanding epoch-based tracking, distributed tensor operations, and validation of test suite metrics.

Suggested labels

r0.4.0, CI:L1

Suggested reviewers

  • zpqiu
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 60.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (3 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The PR title "cp: feat: add Megatron support for on-policy distillation (1324) into r0.4.0" clearly and specifically describes the main change in the changeset. Although the "cp:" prefix and "into r0.4.0" suffix are unconventional formatting that conveys cherry-pick metadata, the core message directly aligns with the substantial changes: implementing Megatron backend support for on-policy distillation across configuration files, the core distillation algorithm, policy workers, and comprehensive test coverage. The title is specific and concrete enough that a teammate scanning the history would immediately understand the primary change.
  • Test Results For Major Changes ✅ Passed: The PR description includes alignment experiments showing a convergence comparison between the DTensor baseline and multiple Megatron configurations (with varying parallelism strategies), demonstrating that the major changes do not introduce numeric regressions. The author has also confirmed in the pre-checks that unit and functional tests were run locally, and multiple new tests have been added (functional tests, unit tests for top-k logits, and nightly test entries). The PR includes a usage example and clearly documents the new Megatron support feature.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (17)
nemo_rl/algorithms/distillation.py (4)

78-80: Config and save-state shape look good; add brief key doc.

New max_num_epochs and multi-counter save state are consistent. Add a short doc comment on recommended defaults/usage per coding guidelines for TypedDict keys.

Also applies to: 88-107


253-255: Setup logging is helpful

The added progress prints (dataloader/val/cluster) aid operability without impacting behavior.

Consider gating verbose prints behind a logger verbosity level in future.

Also applies to: 270-273, 278-278, 295-297, 360-362


300-302: Clarify Megatron non‑colocated constraint in error

Good assertion. Consider appending “Use backend=vllm for non-colocated inference” to reduce config round‑trips.


578-580: Step header denominator can mislead late in training

Printing denominator as min(len(dataloader), max_steps) is coarse. Prefer remaining steps in this epoch.

-            print(
-                f"\n{'=' * 25} Step {current_step + 1}/{min(len(dataloader), max_steps)} {'=' * 25}",
-                flush=True,
-            )
+            steps_left = max_steps - total_steps
+            steps_this_epoch = min(len(dataloader), steps_left)
+            print(
+                f"\n{'=' * 25} Step {current_step + 1}/{steps_this_epoch} {'=' * 25}",
+                flush=True,
+            )
tests/unit/test_recipes_and_test_suites.py (1)

156-156: Threshold bump acknowledged; consider centralizing the limit.

Raising the cap to 1040 is fine. To avoid future rename churn, consider extracting 1040 to a module constant (e.g., MAX_NIGHTLY_GPU_HOURS) and reference it in the assertion/message; the test name can remain as-is. The new Megatron distillation driver is properly listed in nightly and present on disk.

nemo_rl/models/policy/megatron_policy_worker.py (1)

1409-1700: Address lint nits: add return type, remove unused variables, shorten exception message

Verification confirms the CP > 1 test with sequence packing (test_megatron_context_parallel_topk_agreement) exists and validates the slice math through comprehensive assertions. The implementation is sound. Apply these minor cleanup improvements:

  • Add return type annotation for clarity:
-    def get_topk_logits(
+    def get_topk_logits(
         self,
         *,
         data: BatchedDataDict[GenerationDatumSpec],
         k: int,
         micro_batch_size: Optional[int] = None,
-    ):
+    ) -> BatchedDataDict:
  • Remove unused variables (F841 Ruff warnings):

    • pp_size (line 1469): read from config but never used
    • input_ids_unpacked (line 1477): unpacked but never referenced
    • cu_seqlens (line 1480): unpacked but only cu_seqlens_padded is used
  • Shorten exception message (TRY003):

-                        raise RuntimeError(
-                            "Context Parallelism (CP>1) requires sequence packing to be enabled."
-                        )
+                        raise RuntimeError("CP>1 requires sequence packing enabled.")
tests/unit/algorithms/test_distillation.py (3)

552-554: Match interface: add return type annotation for DummyPolicy.offload_after_refit.

Annotate to align with interfaces.PolicyWorker.offload_after_refit(self) -> None.

-        def offload_after_refit(self):
-            return None
+        def offload_after_refit(self) -> None:
+            return None

91-96: Avoid fragile MagicMock.__iter__ overrides.

Assigning a plain function to __iter__ can bypass method binding. Use return_value/side_effect on the mock instead.

-    def train_iter(self):
-        return iter([mock_batch] * 10)
-    train_dataloader.__iter__ = train_iter
+    train_dataloader.__iter__.side_effect = lambda: iter([mock_batch] * 10)
     train_dataloader.__len__ = MagicMock(return_value=10)

-    def val_iter(self):
-        return iter([mock_batch] * 10)
-    val_dataloader.__iter__ = val_iter
+    val_dataloader.__iter__.side_effect = lambda: iter([mock_batch] * 10)
     val_dataloader.__len__ = MagicMock(return_value=10)

Also applies to: 99-104


123-131: Add an epoch-termination unit test.

You added distillation.max_num_epochs, but no test asserts epoch-based early exit. Recommend a focused test where max_num_epochs=1 and max_num_steps is large, verifying the loop stops at the epoch boundary. I can draft it if helpful.

Also applies to: 512-519, 620-627
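A minimal, self-contained sketch of the termination property such a test would pin down (run_epochs is a hypothetical stand-in, not the actual distillation loop or test harness):

```python
def run_epochs(dataloader_len, max_num_epochs, max_num_steps):
    # Stand-in for the distillation loop's termination logic: the loop
    # stops at whichever limit (epochs or steps) is reached first.
    total_steps = 0
    for _ in range(max_num_epochs):
        for _ in range(dataloader_len):
            total_steps += 1
            if total_steps >= max_num_steps:
                return total_steps
    return total_steps

def test_epoch_boundary_termination():
    # With max_num_epochs=1 and an effectively unbounded step budget,
    # the loop must stop at the epoch boundary.
    assert run_epochs(dataloader_len=10, max_num_epochs=1, max_num_steps=10_000) == 10
```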

tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1)

2-4: Export config vars sourced by helpers (silence SC2034).

If common.env consumes these via env, export them; else add a brief comment.

 source $SCRIPT_DIR/common.env
 # ===== BEGIN CONFIG =====
-NUM_NODES=1
-STEPS_PER_RUN=10
-MAX_STEPS=10
-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=60
+export NUM_NODES=1
+export STEPS_PER_RUN=10
+export MAX_STEPS=10
+export NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
+export NUM_MINUTES=60

Also applies to: 6-11

tests/unit/models/policy/test_megatron_worker.py (4)

316-319: Narrow or annotate broad exception handlers (BLE001).

Catching Exception masks real failures. Either catch specific errors (e.g., ray exceptions, CUDA OOM) or annotate to acknowledge intent in tests.

-    except Exception as e:
+    except Exception as e:  # noqa: BLE001 - broad to convert infra/env failures into skips in tests
         print(f"Error during training setup: {e}")
         pytest.skip(f"Training setup failed: {e}")
@@
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001
         print(f"Error during generation setup: {e}")
         pytest.skip(f"Generation setup failed: {e}")
@@
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001
         print(f"Error during logprob setup: {e}")
         pytest.skip(f"Logprob setup failed: {e}")
@@
-    except Exception as e:
+    except Exception as e:  # noqa: BLE001
         print(f"Error during topk setup: {e}")
         pytest.skip(f"Topk setup failed: {e}")

Also applies to: 507-509, 657-660, 1413-1416


1460-1462: Silence unused cluster var (RUF059).

Prefix with underscore.

-    policy, cluster, data = topk_setup
+    policy, _cluster, data = topk_setup

1516-1521: Guard CP top‑k test on CUDA availability.

Prevents failures on CPU-only CI.

 def test_megatron_context_parallel_topk_agreement(tiny_qwen2_model_path):
     """Test that CP and non-CP models produce identical top-k logits with sequence packing enabled."""
-    num_gpus = 2
+    if not torch.cuda.is_available():
+        pytest.skip("CUDA is required for context-parallel top-k test")
+    num_gpus = 2

1861-1873: Fix stray tuple; use msg for clarity.

The current code builds an unused tuple. Pass msg= to assert_close.

-    (
-        torch.testing.assert_close(
-            logprobs_no_cp, logprobs_no_cp_no_packing, rtol=1e-3, atol=1e-3
-        ),
-        (
-            "Logprobs should match between non-CP and non-CP models with sequence packing"
-        ),
-    )
+    torch.testing.assert_close(
+        logprobs_no_cp,
+        logprobs_no_cp_no_packing,
+        rtol=1e-3,
+        atol=1e-3,
+        msg="Logprobs should match between packing and non-packing (non-CP)",
+    )
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml (3)

12-24: Align policy parallelism with filename “tp2pp2cp2”.

File name implies TP=2, PP=2, CP=2, but policy.megatron_cfg doesn’t set these explicitly. Recommend pinning them to avoid surprises from defaults.

 policy:
@@
   megatron_cfg:
     enabled: true
+    tensor_model_parallel_size: 2
+    pipeline_model_parallel_size: 2
+    context_parallel_size: 2

25-37: Teacher parallelism naming mismatch; verify intent.

Recipe name suggests “tp2pp2cp2”, but teacher sets tp=4, cp=1 and omits PP. If intentional (e.g., different sharding for teacher), add a brief comment; otherwise align to tp2/pp2/cp2.

 teacher:
   model_name: Qwen/Qwen3-32B
@@
   megatron_cfg:
     enabled: true
-    tensor_model_parallel_size: 4
-    context_parallel_size: 1
+    tensor_model_parallel_size: 2
+    pipeline_model_parallel_size: 2
+    context_parallel_size: 2

12-17: Ensure policy.model_name is set to Qwen3‑1.7B‑Base.

The filename encodes “to‑1.7b‑base”, but policy.model_name isn’t set here. If it isn’t provided by defaults, set it explicitly to avoid composing the wrong model.

I can draft a quick OmegaConf loader script to print the composed policy/teacher names.

Also applies to: 25-27

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a06941c and 6f686d5.

📒 Files selected for processing (13)
  • README.md (0 hunks)
  • examples/configs/distillation_math.yaml (2 hunks)
  • examples/configs/distillation_math_megatron.yaml (1 hunks)
  • examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml (1 hunks)
  • nemo_rl/algorithms/distillation.py (26 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (2 hunks)
  • tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
  • tests/functional/distillation_megatron.sh (1 hunks)
  • tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1 hunks)
  • tests/test_suites/nightly.txt (1 hunks)
  • tests/unit/algorithms/test_distillation.py (4 hunks)
  • tests/unit/models/policy/test_megatron_worker.py (1 hunks)
  • tests/unit/test_recipes_and_test_suites.py (2 hunks)
💤 Files with no reviewable changes (1)
  • README.md
🧰 Additional context used
📓 Path-based instructions (11)
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults. When adding a new config key, reflect its recommended default in the exemplar YAMLs under examples/configs/*.yaml.

Files:

  • examples/configs/distillation_math.yaml
  • examples/configs/distillation_math_megatron.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/functional/distillation_megatron.sh
  • tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites/*/ and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
  • tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/distillation.py
  • tests/unit/test_recipes_and_test_suites.py
  • tests/unit/models/policy/test_megatron_worker.py
  • tests/unit/algorithms/test_distillation.py
  • nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/distillation.py
  • nemo_rl/models/policy/megatron_policy_worker.py
tests/test_suites/llm/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Files:

  • tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM recipe YAML filenames must follow: --ng-[-modifiers][-long][.vN].yaml

Files:

  • examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes/*/

Files:

  • examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
🧠 Learnings (2)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.

Applied to files:

  • nemo_rl/algorithms/distillation.py
🧬 Code graph analysis (5)
nemo_rl/algorithms/distillation.py (6)
nemo_rl/utils/logger.py (1)
  • log_batched_dict_as_jsonl (893-917)
nemo_rl/models/policy/megatron_policy_worker.py (5)
  • offload_after_refit (2121-2143)
  • prepare_for_lp_inference (2046-2049)
  • get_topk_logits (1409-1699)
  • prepare_for_training (2051-2074)
  • train (873-1126)
nemo_rl/models/policy/dtensor_policy_worker.py (5)
  • offload_after_refit (1880-1903)
  • prepare_for_lp_inference (1831-1838)
  • get_topk_logits (1327-1622)
  • prepare_for_training (1841-1862)
  • train (529-895)
nemo_rl/models/policy/interfaces.py (4)
  • offload_after_refit (153-154)
  • get_topk_logits (84-95)
  • prepare_for_training (125-126)
  • train (98-115)
nemo_rl/models/policy/lm_policy.py (6)
  • offload_after_refit (740-743)
  • Policy (59-806)
  • prepare_for_lp_inference (638-642)
  • get_topk_logits (380-444)
  • prepare_for_training (633-636)
  • train (446-539)
nemo_rl/distributed/batched_data_dict.py (2)
  • BatchedDataDict (75-860)
  • size (814-823)
tests/unit/test_recipes_and_test_suites.py (1)
tests/unit/conftest.py (1)
  • tracker (221-252)
tests/unit/models/policy/test_megatron_worker.py (7)
nemo_rl/distributed/virtual_cluster.py (2)
  • RayVirtualCluster (177-435)
  • shutdown (407-426)
nemo_rl/algorithms/utils.py (1)
  • get_tokenizer (157-288)
nemo_rl/models/generation/__init__.py (1)
  • configure_generation_config (24-45)
nemo_rl/models/policy/lm_policy.py (3)
  • Policy (59-806)
  • shutdown (780-787)
  • get_topk_logits (380-444)
nemo_rl/distributed/batched_data_dict.py (2)
  • to (825-832)
  • BatchedDataDict (75-860)
nemo_rl/models/policy/megatron_policy_worker.py (2)
  • shutdown (2286-2288)
  • get_topk_logits (1409-1699)
tests/unit/conftest.py (1)
  • tiny_qwen2_model_path (514-538)
tests/unit/algorithms/test_distillation.py (5)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • offload_after_refit (2121-2143)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • offload_after_refit (1880-1903)
nemo_rl/models/policy/interfaces.py (1)
  • offload_after_refit (153-154)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)
  • offload_after_refit (1841-1864)
nemo_rl/models/policy/lm_policy.py (1)
  • offload_after_refit (740-743)
nemo_rl/models/policy/megatron_policy_worker.py (2)
nemo_rl/distributed/model_utils.py (2)
  • allgather_cp_sharded_tensor (706-709)
  • distributed_vocab_topk (829-896)
nemo_rl/distributed/batched_data_dict.py (4)
  • BatchedDataDict (75-860)
  • to (825-832)
  • size (814-823)
  • from_batches (102-172)
🪛 Ruff (0.14.1)
tests/unit/models/policy/test_megatron_worker.py

1413-1413: Do not catch blind exception: Exception

(BLE001)


1461-1461: Unpacked variable cluster is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

nemo_rl/models/policy/megatron_policy_worker.py

1469-1469: Local variable pp_size is assigned to but never used

Remove assignment to unused variable pp_size

(F841)


1477-1477: Unpacked variable input_ids_unpacked is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1480-1480: Unpacked variable cu_seqlens is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


1601-1603: Avoid specifying long messages outside the exception class

(TRY003)

🪛 Shellcheck (0.11.0)
tests/functional/distillation_megatron.sh

[error] 52-52: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
nemo_rl/algorithms/distillation.py (5)

193-214: DTensor SP + packing guard

Assertion guarding SP+packing with DTensor is appropriate; thanks for the explicit message and link.


377-384: Megatron train_iters alignment across teacher/student

Setting train_iters=min(max_steps, max_epochs*len(dataloader)) for both teacher and student is correct and avoids scheduler mismatch.

If resuming from checkpoint with a changed dataset length, confirm train_iters remains consistent with the original run to preserve scheduler behavior.

Also applies to: 429-436


525-544: Epoch/step counters wired correctly

current_epoch/current_step/total_steps pulled from save-state align with the new training loop.


708-711: Training loop/CKPT/metrics flow

  • is_last_step logic and periodic validation look correct.
  • Save-state increments (+1) before persisting avoids off‑by‑one on resume.
  • Metrics logged at step total_steps+1 are consistent with counters.

Also applies to: 714-736, 779-788, 802-808, 893-895, 909-912


924-937: Validation messaging improvements

Helpful user feedback on skipped validation and timing; no behavior change.

Also applies to: 1017-1033

nemo_rl/models/policy/megatron_policy_worker.py (1)

103-107: New utilities imported for TP/CP top‑k

Appropriate dependencies for distributed top‑k and CP allgather.
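The CP all-gather can be illustrated with the load-balanced sharding scheme Megatron-style context parallelism typically uses (each rank holds one chunk from the front of the sequence and its mirror from the back). The layout is an assumption for illustration, shown on plain lists rather than tensors:

```python
def cp_shard(seq, cp_size):
    # Split into 2*cp_size chunks; rank i holds chunks i and (2*cp_size - 1 - i),
    # so every rank sees a balanced mix of early and late positions.
    n = 2 * cp_size
    size = len(seq) // n
    chunks = [seq[j * size:(j + 1) * size] for j in range(n)]
    return [chunks[i] + chunks[n - 1 - i] for i in range(cp_size)]

def cp_allgather(shards):
    # Inverse of cp_shard: gather all shards and restore the original order
    # (front halves in rank order, back halves in reverse rank order).
    half = len(shards[0]) // 2
    front = [s[:half] for s in shards]
    back = [s[half:] for s in shards]
    return sum(front, []) + sum(reversed(back), [])
```

This is why the gather cannot be a plain concatenation: the per-rank back halves must be reordered to reconstruct the original sequence, which is what the production allgather_cp_sharded_tensor has to account for.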

tests/functional/L1_Functional_Tests_GPU.sh (1)

33-33: Include Megatron distillation functional test

Good addition; consistent with uv-run usage across the suite.

tests/test_suites/nightly.txt (1)

95-96: Verification confirms nightly entry is correct

All checks passed:

  • Script file exists and is executable
  • Entry is present in nightly.txt
  • Path format matches existing entries
tests/functional/distillation_megatron.sh (1)

20-27: Harden script: quote args/paths and guard cd (SC2068/SC2164).

Prevents re-splitting and failing silently on directory changes.

```diff
-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"

-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1
 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
-    $PROJECT_ROOT/examples/run_distillation_math.py \
-    --config $PROJECT_ROOT/examples/configs/distillation_math_megatron.yaml \
+    "$PROJECT_ROOT/examples/run_distillation_math.py" \
+    --config "$PROJECT_ROOT/examples/configs/distillation_math_megatron.yaml" \
     policy.model_name=Qwen/Qwen3-0.6B-Base \
     teacher.model_name=Qwen/Qwen3-0.6B \
     cluster.gpus_per_node=2 \
@@
-    logger.log_dir=$LOG_DIR \
+    logger.log_dir="$LOG_DIR" \
@@
-    checkpointing.checkpoint_dir=/tmp/distillation_checkpoints \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    checkpointing.checkpoint_dir=/tmp/distillation_checkpoints \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

-uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS
+uv run tests/json_dump_tb_logs.py "$LOG_DIR" --output_path "$JSON_METRICS"

-uv run tests/check_metrics.py $JSON_METRICS \
+uv run tests/check_metrics.py "$JSON_METRICS" \
   'data["train/loss"]["3"] < 1.0'
```
Also applies to: 23-53, 55-58

⛔ Skipped due to learnings
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
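The difference the quoting fix makes can be shown in a few lines of portable shell (a standalone demonstration, not taken from the test scripts):

```shell
#!/bin/sh
# Demonstrates why "$@" preserves arguments while unquoted $@ re-splits them.
count_args() { echo "$#"; }

set -- "a b" "c"     # two arguments; the first contains a space
count_args "$@"      # prints 2: both arguments arrive intact
count_args $@        # prints 3: "a b" is re-split into "a" and "b"
```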
tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh (1)

16-31: Quote arguments and guard cd (SC2068, SC2164).

Prevents word-splitting and failure-on-cd issues.

```diff
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1
 uv run examples/run_distillation_math.py \
-    --config $CONFIG_PATH \
-    distillation.max_num_steps=$MAX_STEPS \
+    --config "$CONFIG_PATH" \
+    distillation.max_num_steps="$MAX_STEPS" \
     distillation.val_period=20 \
-    logger.log_dir=$LOG_DIR \
+    logger.log_dir="$LOG_DIR" \
     logger.wandb_enabled=True \
     logger.wandb.project=nemo-rl-distillation \
-    logger.wandb.name=$EXP_NAME \
+    logger.wandb.name="$EXP_NAME" \
     logger.monitor_gpus=True \
     logger.tensorboard_enabled=True \
     checkpointing.enabled=True \
-    checkpointing.checkpoint_dir=$CKPT_DIR \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    checkpointing.checkpoint_dir="$CKPT_DIR" \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

-uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS
+uv run tests/json_dump_tb_logs.py "$LOG_DIR" --output_path "$JSON_METRICS"

-if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
-    uv run tests/check_metrics.py $JSON_METRICS \
+if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' "$JSON_METRICS") -ge "$MAX_STEPS" ]]; then
+    uv run tests/check_metrics.py "$JSON_METRICS" \
         'data["train/loss"]["1"] < 1.5' \
         'data["train/loss"]["10"] < 0.5' \
         'max(data["ray/node.0.gpu.0.mem_gb"]) < 75' \
         'mean(data["timing/train/total_step_time"], -6, -1) < 500'
 fi
```

Also applies to: 33-42

⛔ Skipped due to learnings
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
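The jq pipeline in the gate above finds the highest step at which train/loss was logged; the same check in Python, assuming the metrics dump is a JSON object mapping metric names to {step-string: value} (an assumed shape, inferred from how the jq query indexes it):

```python
import json

# Hypothetical metrics dump; the exact shape is an assumption based on the
# jq query (metric name -> {step as string -> value}).
dump = '{"train/loss": {"1": 1.2, "10": 0.4}, "train/lr": {"1": 3e-5}}'
metrics = json.loads(dump)

# Equivalent of: jq '... select(.key == "train/loss") | .value | keys
#                    | map(tonumber) | max'
max_logged_step = max(int(step) for step in metrics["train/loss"])
print(max_logged_step)  # → 10
```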
examples/configs/distillation_math.yaml (1)

7-7: Documented defaults for new config keys are complete per guidelines.

The `max_num_epochs: 10` parameter and the entire `megatron_cfg` structure include explicit defaults. This satisfies the requirement that exemplar configs under `examples/configs/*.yaml` document recommended defaults for new config keys.

Also applies to: 84-150
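For reference, the shape being checked looks roughly like this (an illustrative excerpt with placeholder values; consult examples/configs/distillation_math.yaml for the authoritative settings):

```yaml
distillation:
  max_num_steps: 500      # illustrative value only
  max_num_epochs: 10      # new epoch-based bound with an explicit default
policy:
  megatron_cfg:
    enabled: false        # base config keeps the non-Megatron backend
```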

examples/configs/distillation_math_megatron.yaml (2)

38-104: MEGATRON_BASE configuration inherits and properly enables Megatron backend.

The `megatron_cfg` block is well defined, with all necessary settings for Megatron distributed training, context parallelism, and MoE routing. The `enabled: true` flag (line 39) correctly activates the Megatron backend, in contrast to the base config's `enabled: false` (distillation_math.yaml:85).


1-1: Disregard this review comment—the config is correct for Megatron.

The Megatron code path does not read `policy.optimizer` or `policy.scheduler`. When `megatron_cfg.enabled=true`, the training code instantiates `ConfigContainer` with the optimizer and scheduler pulled from `megatron_cfg` (lines 656–693 of megatron_policy_worker.py), which are explicitly defined in distillation_math_megatron.yaml (lines 54–96). The absent `policy.optimizer` and `policy.scheduler` keys are therefore not required; the configuration is correct and functional.

Likely an incorrect or invalid review comment.

Comment on lines +686 to 697
```diff
                 print("▶ Preparing for teacher logprob inference...", flush=True)
                 with timer.time("teacher_logprob_inference_prep"):
                     teacher_policy.prepare_for_lp_inference()

-                print("▶ Computing teacher logprobs...")
+                print("▶ Computing teacher logprobs...", flush=True)
                 with timer.time("teacher_logprob_inference"):
                     teacher_topk = teacher_policy.get_topk_logits(
                         train_data, k=master_config["distillation"]["topk_logits_k"]
                     )
                     train_data["teacher_topk_logits"] = teacher_topk["topk_logits"]
                     train_data["teacher_topk_indices"] = teacher_topk["topk_indices"]
```
train_data["teacher_topk_indices"] = teacher_topk["topk_indices"]


⚠️ Potential issue | 🟡 Minor

Fix log message: it’s top‑k logits, not logprobs

These prints say “logprob” but the code calls get_topk_logits. Adjust to avoid confusion.

```diff
-                print("▶ Preparing for teacher logprob inference...", flush=True)
+                print("▶ Preparing for teacher top‑k inference...", flush=True)
@@
-                print("▶ Computing teacher logprobs...", flush=True)
+                print("▶ Computing teacher top‑k logits...", flush=True)
```

Also applies to: 690-695

🤖 Prompt for AI Agents
In nemo_rl/algorithms/distillation.py around lines 686-697 (and similarly
690-695), the printed messages incorrectly refer to "logprob" while the code is
computing top-k logits via get_topk_logits; update the print statements to
accurately say "top-k logits" (e.g., "Preparing for teacher top-k logits
inference..." and "Computing teacher top-k logits...") so logs match the
operation and avoid confusion.
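Beyond the naming nit, the shape of what get_topk_logits feeds into can be sketched with stdlib Python (a toy per-position illustration; the real implementation operates on batched tensors with TP/CP sharding, and the actual distillation loss in the repo may differ):

```python
import heapq
import math

def topk_logits(logits: list[float], k: int) -> tuple[list[float], list[int]]:
    """Top-k values and vocabulary indices -- a stdlib stand-in for the
    torch.topk call a tensor implementation would make."""
    idx = heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])
    return [logits[i] for i in idx], idx

def topk_kl(teacher: list[float], student: list[float], k: int) -> float:
    """KL(teacher || student) restricted to the teacher's top-k entries,
    with both distributions renormalized over those k entries."""
    def softmax(xs: list[float]) -> list[float]:
        m = max(xs)
        e = [math.exp(x - m) for x in xs]
        z = sum(e)
        return [v / z for v in e]

    t_vals, t_idx = topk_logits(teacher, k)
    s_vals = [student[i] for i in t_idx]  # gather student logits at teacher's indices
    p, q = softmax(t_vals), softmax(s_vals)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1, -1.0]
print(topk_kl(teacher, teacher, k=2))                    # → 0.0 (identical distributions)
print(topk_kl(teacher, [0.0, 2.0, 0.1, -1.0], k=2) > 0)  # mismatch gives positive KL
```

The student's logits are gathered at the teacher's top-k indices, mirroring how teacher_topk_indices is stored alongside teacher_topk_logits in the training batch above.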

@terrykong
Copy link
Collaborator

@zpqiu to take a look at test failure

@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 28, 2025
@zpqiu zpqiu added Run CICD and removed Run CICD labels Oct 28, 2025
@terrykong terrykong removed the CI:L1 Run doctests, unit tests, and functional tests label Oct 28, 2025
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 28, 2025
@terrykong terrykong merged commit 85a87a0 into r0.4.0 Oct 29, 2025
40 of 41 checks passed
@terrykong terrykong deleted the cherry-pick-1324-r0.4.0 branch October 29, 2025 05:04
terrykong pushed a commit that referenced this pull request Nov 19, 2025
…to `r0.4.0` (#1398)

Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
Signed-off-by: alexchiu <qiuzhaopeng@foxmail.com>
Signed-off-by: alexchiu <alexq@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: alexchiu <alexq@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>