
cp: fix: mcore generation config restored in nightly test (1720) into r0.5.0#1740

Merged
yuki-97 merged 1 commit into r0.5.0 from cherry-pick-1720-r0.5.0
Jan 8, 2026

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Jan 8, 2026

beep boop [🤖]: Hi @terrykong 👋,

we've cherry-picked #1720 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Configuration Updates

    • Introduced new Megatron generation configuration options, including CUDA graph settings, buffer management, prefill chunking, and memory optimization parameters, for supported generation workflows.
  • Tests

    • Updated test configurations with new Megatron generation parameters and adjusted performance thresholds.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested a review from a team as a code owner January 8, 2026 09:13
@chtruong814 chtruong814 requested a review from terrykong January 8, 2026 09:13
@chtruong814 chtruong814 requested review from a team as code owners January 8, 2026 09:13
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 8, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 8, 2026 09:14
@coderabbitai
Contributor

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough


This PR introduces a MegatronGenerationConfig TypedDict to formally define generation configuration parameters and updates configuration files and policy worker code to use these parameters, including CUDA graph and prefill chunking settings.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration files<br>`examples/configs/grpo_math_1B.yaml`, `examples/configs/grpo_math_1B_megatron.yaml` | Added `mcore_generation_config` block with CUDA graph settings (buffer size, graph count, block size, prefill chunking) and updated inline documentation for `max_tokens` |
| Policy worker implementation<br>`nemo_rl/models/policy/workers/megatron_policy_worker.py` | Added `MegatronGenerationConfig` TypedDict with generation parameters; changed config access from `.get()` with defaults to direct dictionary indexing, making all keys explicitly required |
| Test configurations<br>`tests/unit/models/policy/test_megatron_worker.py`, `tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh` | Added Megatron generation configuration options to test `mcore_generation_config`; increased timing metric threshold and added documentation comment |
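As a rough illustration of the `MegatronGenerationConfig` TypedDict this PR describes, here is a minimal sketch. The key names are taken from the review comments in this PR; the field types, example values, and comments are assumptions, not the actual `nemo_rl` definition.

```python
# Hypothetical sketch of a MegatronGenerationConfig-style TypedDict.
# Key names come from this PR's review notes; types and values are guesses.
from typing import TypedDict


class MegatronGenerationConfig(TypedDict):
    buffer_size_gb: float                       # generation buffer size in GB
    buffer_guaranteed_fraction: float           # fraction of buffer guaranteed per request
    num_cuda_graphs: int                        # number of CUDA graphs to capture
    block_size_tokens: int                      # token block size for buffer management
    use_cuda_graphs_for_non_decode_steps: bool  # also capture graphs for non-decode steps
    enable_chunked_prefill: bool                # split long prompts into prefill chunks
    unified_memory_level: int                   # unified-memory optimization level
    max_tokens: int                             # generation length cap


# A TypedDict is a plain dict at runtime; required keys are enforced
# by static type checkers, not at runtime.
cfg: MegatronGenerationConfig = {
    "buffer_size_gb": 20.0,
    "buffer_guaranteed_fraction": 0.2,
    "num_cuda_graphs": 16,
    "block_size_tokens": 256,
    "use_cuda_graphs_for_non_decode_steps": True,
    "enable_chunked_prefill": True,
    "unified_memory_level": 0,
    "max_tokens": 512,
}
```

With all keys required (no `NotRequired`), direct indexing like `cfg["max_tokens"]` surfaces a missing YAML entry as a `KeyError` rather than silently falling back to a code-side default.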

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

r0.5.0, mcore

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR contains major changes including new mcore_generation_config parameters, TypedDict addition, and a 67% timing threshold increase, but the PR description lacks technical justification, test results, performance data, or convergence analysis. | Update the PR description to reference original PR #1720's test results, justify the timing threshold increase with performance data, confirm no convergence regressions, and validate new TypedDict type enforcement in testing. |
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title indicates a cherry-pick of PR #1720 into r0.5.0, which aligns with the PR objectives and the substantive changes across multiple files related to mcore generation config restoration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e576a6 and 312c07c.

📒 Files selected for processing (5)
  • examples/configs/grpo_math_1B.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/unit/models/policy/test_megatron_worker.py
🧰 Additional context used
📓 Path-based instructions (6)
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to **/*.py : When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-09-19T03:08:11.537Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml:85-101
Timestamp: 2025-09-19T03:08:11.537Z
Learning: In math reasoning distillation tasks, max_new_tokens should be set to the full context window because prompts are typically much shorter than outputs, which require detailed step-by-step reasoning chains. Reducing max_new_tokens could prevent models from outputting complete answers, negatively impacting validation accuracy calculations.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
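The `NUM_RUNS` formula quoted in the learning above is integer ceiling division: the number of runs needed to cover `MAX_STEPS` when each run advances `STEPS_PER_RUN` steps. A quick Python mirror of the shell arithmetic (the function itself is illustrative, not part of the test infrastructure):

```python
# Mirrors the shell expression $(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN )),
# which rounds up so a final partial run is still counted.
def num_runs(max_steps: int, steps_per_run: int) -> int:
    return (max_steps + steps_per_run - 1) // steps_per_run
```

For example, 100 steps at 30 steps per run requires 4 runs: three full runs plus one partial run for the remaining 10 steps.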
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pre-flight
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (6)
examples/configs/grpo_math_1B_megatron.yaml (1)

153-153: LGTM - Documentation improvement

The inline comment formatting and clarification improve readability without changing functionality.

examples/configs/grpo_math_1B.yaml (1)

219-227: LGTM - Well-documented new configuration block

The new mcore_generation_config block is properly documented with inline comments explaining each parameter's purpose. The defaults are clearly specified and align with the MegatronGenerationConfig TypedDict defined in the policy worker.

Based on learnings, this follows the coding guideline for documenting new config keys with purpose, valid values, and defaults in exemplar YAMLs.

tests/unit/models/policy/test_megatron_worker.py (1)

93-96: LGTM - Test configuration updated for new generation options

The test configuration now includes the new Megatron generation options (block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level), ensuring tests exercise the complete MegatronGenerationConfig interface.

tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh (1)

37-42: Verify the timing threshold increase is expected

The timing threshold for total_step_time increased from 10.5 to 17.5 (67% increase). While the comment explains the rationale (observed around 16), this is a substantial performance change.

Please confirm that this performance impact is expected with the new mcore_generation_config settings (CUDA graphs for non-decode steps, chunked prefill, etc.).

nemo_rl/models/policy/workers/megatron_policy_worker.py (2)

148-167: LGTM - Well-documented TypedDict for generation configuration

The MegatronGenerationConfig TypedDict is well-documented with inline comments explaining each field's purpose, valid values, and usage. This provides clear typing for the Megatron generation configuration.

Based on learnings, this follows the coding guideline to document config keys with their purpose and valid values.


1844-1859: No backward compatibility concern—all existing configurations already include the required keys

Verification confirms all configuration files in examples/configs/ that define mcore_generation_config already include all 8 required keys (buffer_size_gb, buffer_guaranteed_fraction, num_cuda_graphs, block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level, max_tokens). The direct key access at lines 1844–1859 is appropriate and aligns with the coding guideline requiring configuration attributes to be accessed directly without defaults. No breaking change has been introduced.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@yuki-97 yuki-97 merged commit 075389b into r0.5.0 Jan 8, 2026
68 of 71 checks passed
@yuki-97 yuki-97 deleted the cherry-pick-1720-r0.5.0 branch January 8, 2026 14:45

Labels

  • cherry-pick
  • CI:L1 Run doctests, unit tests, and functional tests
  • Run CICD


3 participants