
cp: fix: mcore generation config restored in nightly test (1720) into r0.5.0#1740

Merged
yuki-97 merged 1 commit into r0.5.0 from cherry-pick-1720-r0.5.0
Jan 8, 2026

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Jan 8, 2026

beep boop [🤖]: Hi @terrykong 👋,

we've cherry-picked #1720 into r0.5.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • Configuration Updates

    • Introduced new Megatron generation configuration options, including CUDA graph settings, buffer management, prefill chunking, and memory optimization parameters, for supported generation workflows.
  • Tests

    • Updated test configurations with new Megatron generation parameters and adjusted performance thresholds.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested a review from a team as a code owner January 8, 2026 09:13
@chtruong814 chtruong814 requested a review from terrykong January 8, 2026 09:13
@chtruong814 chtruong814 requested review from a team as code owners January 8, 2026 09:13
@yuki-97 yuki-97 added the CI:L1 Run doctests, unit tests, and functional tests label Jan 8, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 8, 2026 09:14
@coderabbitai
Contributor

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough


This PR introduces a MegatronGenerationConfig TypedDict to formally define generation configuration parameters and updates configuration files and policy worker code to use these parameters, including CUDA graph and prefill chunking settings.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration files<br>`examples/configs/grpo_math_1B.yaml`, `examples/configs/grpo_math_1B_megatron.yaml` | Added `mcore_generation_config` block with CUDA graph settings (buffer size, graph count, block size, prefill chunking) and updated inline documentation for `max_tokens` |
| Policy worker implementation<br>`nemo_rl/models/policy/workers/megatron_policy_worker.py` | Added `MegatronGenerationConfig` TypedDict with generation parameters; changed config access from `.get()` with defaults to direct dictionary indexing, making all keys explicitly required |
| Test configurations<br>`tests/unit/models/policy/test_megatron_worker.py`, `tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh` | Added Megatron generation configuration options to test `mcore_generation_config`; increased timing metric threshold and added documentation comment |
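As a rough illustration of the `MegatronGenerationConfig` TypedDict this PR describes, here is a minimal sketch. The key names are taken from the review comments in this PR; the field types, example values, and comments are assumptions, not the actual `nemo_rl` definition.

```python
# Hypothetical sketch of a MegatronGenerationConfig-style TypedDict.
# Key names come from this PR's review notes; types and values are guesses.
from typing import TypedDict


class MegatronGenerationConfig(TypedDict):
    buffer_size_gb: float                       # generation buffer size in GB
    buffer_guaranteed_fraction: float           # fraction of buffer guaranteed per request
    num_cuda_graphs: int                        # number of CUDA graphs to capture
    block_size_tokens: int                      # token block size for buffer management
    use_cuda_graphs_for_non_decode_steps: bool  # also capture graphs for non-decode steps
    enable_chunked_prefill: bool                # split long prompts into prefill chunks
    unified_memory_level: int                   # unified-memory optimization level
    max_tokens: int                             # generation length cap


# A TypedDict is a plain dict at runtime; required keys are enforced
# by static type checkers, not at runtime.
cfg: MegatronGenerationConfig = {
    "buffer_size_gb": 20.0,
    "buffer_guaranteed_fraction": 0.2,
    "num_cuda_graphs": 16,
    "block_size_tokens": 256,
    "use_cuda_graphs_for_non_decode_steps": True,
    "enable_chunked_prefill": True,
    "unified_memory_level": 0,
    "max_tokens": 512,
}
```

With all keys required (no `NotRequired`), direct indexing like `cfg["max_tokens"]` surfaces a missing YAML entry as a `KeyError` rather than silently falling back to a code-side default.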

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

r0.5.0, mcore

Suggested reviewers

  • terrykong
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR contains major changes including new mcore_generation_config parameters, TypedDict addition, and a 67% timing threshold increase, but the PR description lacks technical justification, test results, performance data, or convergence analysis. | Update the PR description to reference original PR #1720's test results, justify the timing threshold increase with performance data, confirm no convergence regressions, and validate new TypedDict type enforcement in testing. |
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title indicates a cherry-pick of PR #1720 into r0.5.0, which aligns with the PR objectives and the substantive changes across multiple files related to mcore generation config restoration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e576a6 and 312c07c.

📒 Files selected for processing (5)
  • examples/configs/grpo_math_1B.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • tests/unit/models/policy/test_megatron_worker.py
🧰 Additional context used
📓 Path-based instructions (6)
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/models/policy/test_megatron_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to **/*.py : When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/grpo_math_1B.yaml
📚 Learning: 2025-09-19T03:08:11.537Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-4b-instruct-2n8g-fsdp2tp2-seqpack.v1.yaml:85-101
Timestamp: 2025-09-19T03:08:11.537Z
Learning: In math reasoning distillation tasks, max_new_tokens should be set to the full context window because prompts are typically much shorter than outputs, which require detailed step-by-step reasoning chains. Reducing max_new_tokens could prevent models from outputting complete answers, negatively impacting validation accuracy calculations.

Applied to files:

  • examples/configs/grpo_math_1B_megatron.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh
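The `NUM_RUNS` formula quoted in the learning above is integer ceiling division: the number of runs needed to cover `MAX_STEPS` when each run advances `STEPS_PER_RUN` steps. A quick Python mirror of the shell arithmetic (the function itself is illustrative, not part of the test infrastructure):

```python
# Mirrors the shell expression $(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN )),
# which rounds up so a final partial run is still counted.
def num_runs(max_steps: int, steps_per_run: int) -> int:
    return (max_steps + steps_per_run - 1) // steps_per_run
```

For example, 100 steps at 30 steps per run requires 4 runs: three full runs plus one partial run for the remaining 10 steps.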
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: pre-flight
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (6)
examples/configs/grpo_math_1B_megatron.yaml (1)

153-153: LGTM - Documentation improvement

The inline comment formatting and clarification improve readability without changing functionality.

examples/configs/grpo_math_1B.yaml (1)

219-227: LGTM - Well-documented new configuration block

The new mcore_generation_config block is properly documented with inline comments explaining each parameter's purpose. The defaults are clearly specified and align with the MegatronGenerationConfig TypedDict defined in the policy worker.

Based on learnings, this follows the coding guideline for documenting new config keys with purpose, valid values, and defaults in exemplar YAMLs.

tests/unit/models/policy/test_megatron_worker.py (1)

93-96: LGTM - Test configuration updated for new generation options

The test configuration now includes the new Megatron generation options (block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level), ensuring tests exercise the complete MegatronGenerationConfig interface.

tests/test_suites/llm/grpo-llama3.2-1b-instruct-1n8g-megatron_generation.sh (1)

37-42: Verify the timing threshold increase is expected

The timing threshold for total_step_time increased from 10.5 to 17.5 (67% increase). While the comment explains the rationale (observed around 16), this is a substantial performance change.

Please confirm that this performance impact is expected with the new mcore_generation_config settings (CUDA graphs for non-decode steps, chunked prefill, etc.).

nemo_rl/models/policy/workers/megatron_policy_worker.py (2)

148-167: LGTM - Well-documented TypedDict for generation configuration

The MegatronGenerationConfig TypedDict is well-documented with inline comments explaining each field's purpose, valid values, and usage. This provides clear typing for the Megatron generation configuration.

Based on learnings, this follows the coding guideline to document config keys with their purpose and valid values.


1844-1859: No backward compatibility concern—all existing configurations already include the required keys

Verification confirms all configuration files in examples/configs/ that define mcore_generation_config already include all 8 required keys (buffer_size_gb, buffer_guaranteed_fraction, num_cuda_graphs, block_size_tokens, use_cuda_graphs_for_non_decode_steps, enable_chunked_prefill, unified_memory_level, max_tokens). The direct key access at lines 1844–1859 is appropriate and aligns with the coding guideline requiring configuration attributes to be accessed directly without defaults. No breaking change has been introduced.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@yuki-97 yuki-97 merged commit 075389b into r0.5.0 Jan 8, 2026
68 of 71 checks passed
@yuki-97 yuki-97 deleted the cherry-pick-1720-r0.5.0 branch January 8, 2026 14:45

Labels

  • cherry-pick
  • CI:L1 Run doctests, unit tests, and functional tests
  • Run CICD


3 participants