
feat: Add GPT-OSS support via mcore #1452

Merged
terrykong merged 3 commits into main from ashors/gpt-oss-tot
Dec 17, 2025

Conversation

@ashors1
Contributor

@ashors1 ashors1 commented Oct 31, 2025

What does this PR do ?

NOTE: this was blocked on vllm-project/vllm#27722; the fix has since been merged.

20B example run

uv run examples/run_grpo_math.py \
    --config examples/configs/grpo_math_70B_megatron.yaml \
    cluster.num_nodes=8 \
    policy.model_name=openai/gpt-oss-20b \
    policy.megatron_cfg.expert_model_parallel_size=4 \
    policy.megatron_cfg.tensor_model_parallel_size=4 \
    policy.megatron_cfg.pipeline_model_parallel_size=4 \
    policy.megatron_cfg.sequence_parallel=True \
    policy.megatron_cfg.moe_router_dtype=fp64 \
    policy.megatron_cfg.moe_permute_fusion=True \
    policy.sequence_packing.enabled=False \
    policy.dynamic_batching.enabled=False \
    loss_fn.use_importance_sampling_correction=True
(screenshot: 20B reward curve)

wandb: https://wandb.ai/nvidia/ashors-gpt-oss?nw=jkyx3u96d2

120B example run

uv run examples/run_grpo_math.py \
    --config examples/configs/grpo_math_70B_megatron.yaml \
    cluster.num_nodes=32 \
    policy.model_name=openai/gpt-oss-120b \
    policy.megatron_cfg.expert_model_parallel_size=8 \
    policy.megatron_cfg.tensor_model_parallel_size=8 \
    policy.megatron_cfg.pipeline_model_parallel_size=4 \
    policy.megatron_cfg.sequence_parallel=True \
    policy.megatron_cfg.moe_router_dtype=fp64 \
    policy.megatron_cfg.moe_permute_fusion=True \
    policy.sequence_packing.enabled=False \
    loss_fn.use_importance_sampling_correction=True
(screenshot: 120B reward curve)

wandb: https://wandb.ai/nvidia/ashors-gpt-oss?nw=x17i7zpgtap

Issues

closes #869

Usage

A ready-to-use config is provided for 20B. This config requires 8 nodes and can be used to reproduce the reward curve displayed in the above section.

uv run examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml \
    logger.wandb_enabled=True \
    logger.wandb.project=gpt-oss-test \
    logger.wandb.name=20b

For 120B, please refer to the example command in the above section. Optionally enable wandb logging by adding the following command-line args:

logger.wandb_enabled=True logger.wandb.project=gpt-oss-test logger.wandb.name=120b

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added support for GPT-OSS-20B model training with GRPO methodology.
    • Introduced optimized model name handling for GPT-OSS variants.
    • Added compatibility improvements for enhanced model execution.
  • Tests

    • Added comprehensive test suite for GPT-OSS-20B training on multi-node GPU clusters with integrated metrics validation.


@github-actions github-actions bot added documentation Improvements or additions to documentation and removed documentation Improvements or additions to documentation labels Oct 31, 2025
@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 9c9711a (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

❌ Submodules that need attention:

Megatron-Bridge: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/62f4704b8d665ac4a8c318a809a070217caa8901/
CURRENT (PR #1452 from ashors/gpt-oss-tot): https://github.com/NVIDIA-NeMo/Megatron-Bridge/commits/7367429371012ca6aa4f78f56091b9dbb325c12d/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@github-actions

github-actions bot commented Nov 4, 2025

✅ Submodule Fast-Forward Check Results

Check based on commit: 0f7e8e8 (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 46fcd7d (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: fec527e (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 71bc701 (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 63d10eb (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: bee06cf (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 8e7c503 (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: f3e880b (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 716f05a (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@ashors1 ashors1 marked this pull request as ready for review December 4, 2025 00:35
@ashors1 ashors1 requested review from a team as code owners December 4, 2025 00:35
@coderabbitai
Contributor

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough

Walkthrough

This PR adds GPT-OSS model support to NeMo RL with Megatron-LM integration. It updates submodule references to OSS-specific branches, introduces a new GRPO configuration file for 20B models on 8-node clusters, modifies vLLM and Megatron worker initialization to support GPT-OSS model naming conventions, applies a TransformerEngine import-time patch for Triton dtype compatibility, and adds corresponding test infrastructure.

Changes

  • Submodule Updates (.gitmodules, 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge, 3rdparty/Megatron-LM-workspace/Megatron-LM): Updated submodule branch references and commit pointers to GPT-OSS-specific branches (yuya/nemo-rl-use-dev → ashors/dev-with-gpt-oss for Megatron-LM; main → ashors/gpt-oss-tot for Megatron-Bridge)
  • GRPO Configuration (examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml): New GRPO recipe for 20B GPT-OSS models with 64 prompts, 32 generations per prompt, Megatron parallelism (8 experts, 4 tensor-parallel), a vLLM generation backend, and an 8-node/8-GPU cluster setup
  • vLLM Worker Enhancement (nemo_rl/models/generation/vllm/vllm_worker.py): Added model name rewriting logic to convert openai/gpt-oss-* references to unsloth BF16 variants for Megatron export compatibility
  • Megatron Worker Enhancement (nemo_rl/models/policy/megatron_policy_worker.py): Added a module-level _apply_transformer_engine_patch() function that patches TransformerEngine's permutation.py at import time to fix Triton int dtype handling
  • Test Infrastructure (tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh, tests/test_suites/release.txt): New GRPO experiment test script with metrics validation (token_mult_prob_error, reward, step time thresholds) and a test suite registry entry
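The model-name translation described above can be pictured with a small sketch. The helper name and the exact mapping here are assumptions for illustration, not the actual vllm_worker.py code:

```python
def resolve_model_name(model_name: str) -> str:
    """Rewrite openai/gpt-oss-* references to their unsloth BF16 re-uploads.

    Hypothetical sketch: the real worker's rewrite may differ in naming and
    scope; the idea is that Megatron export sees BF16 weights.
    """
    prefix = "openai/gpt-oss-"
    if model_name.startswith(prefix):
        size = model_name[len(prefix):]        # e.g. "20b" or "120b"
        return f"unsloth/gpt-oss-{size}-BF16"  # assumed naming convention
    return model_name  # non-GPT-OSS models pass through unchanged

print(resolve_model_name("openai/gpt-oss-20b"))  # unsloth/gpt-oss-20b-BF16
```

Non-matching names are returned untouched, so the rewrite is safe to apply unconditionally at worker initialization.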

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Megatron policy worker patch: Verify TransformerEngine file patching logic, guard conditions (ImportError/FileNotFoundError/AttributeError), and side effects at module import time
  • vLLM model name translation: Confirm model naming conversion logic correctly handles openai/gpt-oss variants and maps to unsloth BF16 equivalents
  • GRPO configuration parameters: Validate tensor-parallel strategy (8 experts, 4 tensor-model-parallel), batch sizes, sequence lengths, and cluster resource allocation
  • Test metrics thresholds: Ensure metric validation bounds (token_mult_prob_error < 1.05, reward > 0.60, step time quantiles < 210 ms) are appropriate for the model/hardware configuration
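The import-time TransformerEngine patch in the first bullet is a source-level rewrite of permutation.py. A minimal sketch of that pattern is below; the _OLD/_NEW strings are placeholders standing in for the real Triton dtype fix, and the function names are hypothetical:

```python
from pathlib import Path

# Placeholder strings; the actual patch targets a different expression.
_OLD = "dtype=int64"
_NEW = "dtype=tl.int64"

def patch_permutation_source(source: str) -> str:
    """Return patched source; idempotent, and a no-op if the target is absent."""
    if _NEW in source or _OLD not in source:
        return source  # already patched, or upstream code changed
    return source.replace(_OLD, _NEW)

def apply_patch(path: Path) -> bool:
    """Rewrite the file in place; returns True only if a change was made."""
    try:
        source = path.read_text()
    except FileNotFoundError:
        return False  # TE not installed at the expected location
    patched = patch_permutation_source(source)
    if patched == source:
        return False
    path.write_text(patched)
    return True
```

The guard conditions matter for exactly the reasons the review bullet lists: the patch must tolerate a missing install, an already-patched file, and upstream drift without failing the module import.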

Possibly related PRs

  • PR #1334 — Modifies nemo_rl/models/generation/vllm/vllm_worker.py with vLLM initialization/logprobs handling changes
  • PR #1389 — Modifies nemo_rl/models/policy/megatron_policy_worker.py with Megatron tensor-parallel and collective averaging helpers
  • PR #1514 — Updates same Megatron submodule references and applies changes to nemo_rl/models/policy/megatron_policy_worker.py

Suggested labels

CI, Run CICD

Suggested reviewers

  • yaoyu-33
  • terrykong
  • chtruong814

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces a major GPT-OSS support feature but lacks documented test results, performance metrics, convergence data, or validation evidence, and the pre-checks checklist is uncompleted. Resolution: provide actual training results, performance metrics, numerics validation, evidence of TransformerEngine patch correctness, and complete the pre-checks checklist with local testing confirmation.

✅ Passed checks (5 passed)

  • Title check: The title 'feat: Add GPT-OSS support via mcore' directly and clearly summarizes the main objective of the pull request, matching the core change of adding GPT-OSS support through mcore.
  • Linked Issues check: The PR successfully implements Mcore support for GPT-OSS as required by issue #869, evidenced by submodule updates, model configuration, worker implementation, and comprehensive test additions.
  • Out of Scope Changes check: All changes are directly related to GPT-OSS Mcore support: submodule references, configuration files, worker modifications for model compatibility, and test infrastructure.
  • Docstring Coverage: Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).
  • Description Check: Check skipped - CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c371a9 and 716f05a.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • .gitmodules (1 hunks)
  • 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1 hunks)
  • 3rdparty/Megatron-LM-workspace/Megatron-LM (1 hunks)
  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml (1 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (1 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (1 hunks)
  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh (1 hunks)
  • tests/test_suites/release.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • 3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
  • 3rdparty/Megatron-LM-workspace/Megatron-LM
  • tests/test_suites/release.txt
  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • .gitmodules
  • nemo_rl/models/policy/megatron_policy_worker.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
🧠 Learnings (6)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes

Applied to files:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
🪛 GitHub Actions: CICD NeMo RL
examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml

[error] 1-1: minimize-check-llm: YAML not minimized. Suggested fix: minimize the file in-place.

🪛 Ruff (0.14.7)
nemo_rl/models/policy/megatron_policy_worker.py

81-81: Do not catch blind exception: Exception

(BLE001)

🪛 Shellcheck (0.11.0)
tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🔇 Additional comments (5)
3rdparty/Megatron-LM-workspace/Megatron-LM (1)

1-1: Submodule pointer update only.

This file represents a git submodule reference update with no reviewable code logic. The AI summary indicates the PR adds GPT-OSS support through multiple implementation files (GRPO config, vLLM worker modifications, Megatron worker modifications, TransformerEngine patch, and test infrastructure), but only the submodule pointer is visible in this review context.

To provide a comprehensive review of the PR's changes, please include the following files mentioned in the AI summary:

  • examples/configs/recipes/llm/grpo-gptoss-20b-8n8g-megatron.yaml
  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh
  • .gitmodules
tests/test_suites/release.txt (1)

26-28: ✓ Test entry follows the established pattern.

The new GPT-OSS GRPO test entry is properly formatted and consistently integrated into the release test suite. The section header, script reference, and spacing align with existing entries.

3rdparty/Megatron-Bridge-workspace/Megatron-Bridge (1)

1-1: The submodule is correctly configured for the GPT-OSS branch.

The .gitmodules configuration confirms the Megatron-Bridge submodule is tracking the ashors/gpt-oss-tot branch as intended. The commit hash 1f1ba22275ab66433c6a61bcb93502432a91443f is a valid Git object properly registered as a submodule reference. No action is required.

tests/test_suites/llm/grpo-gptoss-20b-8n8g-megatron.sh (2)

1-40: Script structure and execution approach are sound.

The overall structure follows the established test suite pattern: proper environment sourcing, standard configuration variables (NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS, NUM_MINUTES), correct use of uv run for both main execution and post-processing, and metrics validation via jq. The metrics thresholds and evaluation logic align with the GRPO evaluation strategy. Minor fixes above aside, the script provides good coverage for validating the GPT-OSS GRPO configuration.

The SC2034 warnings (unused variables on lines 6, 9, 10) are expected and documented as false positives per established patterns—these variables are consumed by external test infrastructure and common.env sourcing, so no action is needed on those.


16-16: Add error handling to cd command.

Line 16 performs a directory change without error handling. If the directory does not exist or the change fails, the script continues executing commands in the wrong location, potentially causing cascading failures.

Apply this diff to add error handling:

-cd $PROJECT_ROOT
+cd $PROJECT_ROOT || exit 1
⛔ Skipped due to learnings
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
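The jq-based metrics gating the review describes for the test script boils down to comparing a few scalar metrics against fixed bounds. A minimal Python sketch, assuming hypothetical metric keys and the two unambiguous thresholds quoted in the review (token_mult_prob_error < 1.05, reward > 0.60):

```python
# Bounds mirror those quoted in the review; key names are assumed.
THRESHOLDS = {
    "token_mult_prob_error": ("max", 1.05),
    "reward": ("min", 0.60),
}

def check_metrics(metrics: dict[str, float]) -> list[str]:
    """Return a list of failure messages; an empty list means all gates passed."""
    failures = []
    for key, (kind, bound) in THRESHOLDS.items():
        value = metrics[key]
        if kind == "max" and value >= bound:
            failures.append(f"{key}={value} exceeds {bound}")
        elif kind == "min" and value <= bound:
            failures.append(f"{key}={value} below {bound}")
    return failures

print(check_metrics({"token_mult_prob_error": 1.01, "reward": 0.65}))  # []
```

Collecting every failure instead of stopping at the first mirrors how a release-gate script should report, so a single run surfaces all out-of-bounds metrics at once.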

@github-actions

github-actions bot commented Dec 5, 2025

✅ Submodule Fast-Forward Check Results

Check based on commit: 763c73f (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

github-actions bot commented Dec 5, 2025

✅ Submodule Fast-Forward Check Results

Check based on commit: d328112 (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 0ec5c16 (PR #1452 from ashors/gpt-oss-tot)

✅ Submodules that are properly updated:

Megatron-Bridge: ✅ PR branch is ahead of main branch (fast-forward)
Megatron-LM: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@github-actions

❌ Submodule Fast-Forward Check Failed

Check based on commit: 5249e11 (PR #1452 from ashors/gpt-oss-tot)

❌ Submodules that need attention:

Megatron-LM: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/terrykong/Megatron-LM/commits/25a62edf77b5130f888328ca8119d6a76117cf23/
CURRENT (PR #1452 from ashors/gpt-oss-tot): https://github.com/terrykong/Megatron-LM/commits/7933a950e879cd1d5af35840be86f547901dd9ed/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong enabled auto-merge (squash) December 17, 2025 01:17
Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Dec 17, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 17, 2025
@terrykong terrykong merged commit 441f745 into main Dec 17, 2025
40 of 41 checks passed
@terrykong terrykong deleted the ashors/gpt-oss-tot branch December 17, 2025 09:43
@ashors1 ashors1 mentioned this pull request Dec 18, 2025
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
parthmannan pushed a commit to parthmannan/RL that referenced this pull request Jan 15, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: ashors1 <ashors@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests


Development

Successfully merging this pull request may close these issues.

gpt-oss Mcore support

2 participants