
test: Perf recipe for v0.5 (#1667)

Merged
terrykong merged 10 commits into main from guyueh/perf_recipe_for_v0.5
Dec 20, 2025

Conversation


@guyueh1 guyueh1 commented Dec 19, 2025

What does this PR do ?

Add new performance tests for v0.5

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
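As a hedged sketch of usage (the PR description leaves the snippet as a placeholder): the PR adds driver scripts under tests/test_suites/llm/performance/, which in CI are launched by the test harness. A local invocation would look roughly like the following; the script name is one of the files added in this PR, but the exact launch tooling is not shown here.

```shell
# Hypothetical local invocation of one of the new performance drivers.
# In CI these scripts are consumed by external launch tooling, so this
# is an illustration, not the official entry point.
driver="tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh"
cmd="bash $driver"
echo "$cmd"
```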

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Added performance test configurations for multiple LLM models (DeepSeek v3, LLaMA 3.1, Qwen3).
    • Introduced FP8 quantization support for select model configurations.
    • Added new performance test scripts for automated benchmarking.
  • Chores

    • Updated test suite inventory with new performance test entries.


Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested review from a team as code owners December 19, 2025 21:31
@guyueh1 guyueh1 mentioned this pull request Dec 19, 2025
@guyueh1 guyueh1 added the r0.5.0 label Dec 19, 2025

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

This PR adds new GRPO performance recipe configurations for multiple model architectures (DeepSeek v3, LLaMA 3.1, Qwen3) with various cluster sizes and FP8 quantization variants, along with corresponding test scripts. It also removes deprecated configuration options from an existing recipe and updates the test inventory.

Changes

  • Legacy Configuration Updates (examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml): removes the deprecated policy.sequence_packing (algorithm: modified_ffd) and generation.vllm_cfg.expert_parallel_size: 4 entries.
  • DeepSeek v3 Performance Recipes (examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml, grpo-deepseek-v3-64n4g-async-1off.yaml, grpo-deepseek-v3-64n8g-fp8-async-1off.yaml): adds new DeepSeek v3 configurations with pipeline/expert parallelism, vLLM tensor parallelism (16-32), and FP8 quantization settings. Variants cover 32/64 nodes with 4/8 GPUs per node.
  • LLaMA 3.1 Performance Recipes (examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml, grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml): adds LLaMA 3.1 8B configurations with Megatron pipeline parallelism settings, plus an FP8 variant with blockwise quantization and a vLLM FP8 generation config.
  • Qwen3 Performance Recipes (examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml, grpo-qwen3-235b-32n4g-async-1off.yaml): adds Qwen3-235b configurations with 4-way pipeline parallelism, 23 layers in the first/last stages, and vLLM tensor parallelism. Covers 16/32 nodes with 4 GPUs per node.
  • DeepSeek v3 Test Scripts (tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh, grpo-deepseek-v3-64n4g-async-1off.sh, grpo-deepseek-v3-64n8g-fp8-async-1off.sh): adds test harnesses for DeepSeek v3 GRPO runs with environment setup, model loading, TensorBoard conversion, and conditional metrics evaluation.
  • LLaMA 3.1 Test Scripts (tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh): adds a test harness for the LLaMA 3.1 FP8 async performance run with TensorBoard and metrics collection.
  • Qwen3 Test Scripts (tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh, grpo-qwen3-235b-32n4g-async-1off.sh): adds test harnesses for Qwen3-235b GRPO performance runs with logging, W&B integration, and conditional metrics checks.
  • Test Inventory (tests/test_suites/performance.txt): updates the test suite inventory; removes one legacy entry and adds new GRPO performance tests for DeepSeek v3, LLaMA 3.1, and Qwen3 variants, organized into H100 BF16 and GB200 BF16 sections with SYNC/ASYNC configurations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~20 minutes

  • Configuration consistency: Verify that all new YAML configs follow the same structure and parameter patterns across similar variants
  • Test script consistency: Check that shell scripts consistently implement environment setup, experiment execution, log conversion, and metrics evaluation patterns
  • DeepSeek v3 FP8 configs: Ensure FP8 settings (fp8_type: e4m3, blockwise recipe, NVTE_FP8_BLOCK_SCALING_FP32_SCALES) are correctly applied
  • Parallelism parameters: Verify pipeline/expert parallelism and layer distribution settings are appropriate for each model/node configuration
  • Test inventory alignment: Confirm that all new test scripts are properly registered in performance.txt

Possibly related PRs

Suggested labels

CI:L2, Run CICD

Suggested reviewers

  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Test Results For Major Changes: ⚠️ Warning. The PR removes options (sequence_packing, expert_parallel_size) from an existing configuration file, but the PR description lacks documentation of the change, regression testing, or a performance impact analysis. Resolution: update the PR description to document why the configuration options were removed and include regression testing results or performance comparisons demonstrating no negative impact.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title "test: Perf recipe for v0.5" is directly related to the main changes, which add new performance test configurations and scripts for v0.5, though it could be more specific about which models/configurations are covered.
  • Docstring Coverage: ✅ Passed. No functions were found in the changed files, so the docstring coverage check was skipped.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml (1)

17-19: Consider aligning logger directory naming with checkpoint directory.

The checkpoint directory uses grpo-deepseek-v3-64n4g-async-1off but the logger directory and W&B run name use grpo-deepseek-v3-64n4g-async-32T32G-1off. If "32T32G" refers to trajectory/generation configuration rather than cluster topology, consider documenting this naming convention to avoid confusion.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 91658c8 and fedf770.

📒 Files selected for processing (15)
  • examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml (0 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh (1 hunks)
  • tests/test_suites/performance.txt (1 hunks)
💤 Files with no reviewable changes (1)
  • examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml
🧰 Additional context used
📓 Path-based instructions (5)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • tests/test_suites/performance.txt
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
tests/test_suites/**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
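The driver-script convention above can be sketched in shell: the script path is derived from its recipe YAML's base name. The paths below are illustrative examples taken from files in this PR.

```shell
# Derive a driver script path from a recipe YAML, per the convention
# that driver scripts match the YAML base name with a .sh extension.
yaml="examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml"
base="$(basename "$yaml" .yaml)"   # strip the directory and .yaml suffix
script="tests/test_suites/llm/performance/${base}.sh"
echo "$script"
# tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
```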
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
🧠 Learnings (9)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
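The NUM_RUNS ceiling-division formula from this learning can be sketched as follows; the STEPS_PER_RUN and MAX_STEPS values are illustrative, not taken from any script in this PR.

```shell
# Standard test-script variable block with the ceiling-division formula
# described in the learning above.
STEPS_PER_RUN=40
MAX_STEPS=100
# Ceiling division: smallest number of runs covering MAX_STEPS total steps.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"   # 3: two full 40-step runs plus one 20-step run
```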
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
📚 Learning: 2025-11-24T17:24:47.707Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-11-24T17:24:47.707Z
Learning: If a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply

Applied to files:

  • tests/test_suites/performance.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain

Applied to files:

  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh

[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh

[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 34-34: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh

[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 34-34: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh

[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 34-34: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh

[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)
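The SC2068 errors above flag unquoted `$@` expansions, which the repository intentionally keeps for consistency (per the learnings). As a minimal sketch of the behavior shellcheck warns about:

```shell
# SC2068 in miniature: unquoted $@ re-splits arguments on whitespace,
# while "$@" preserves each argument as one word. This only illustrates
# the warning; the repo's test scripts deliberately use unquoted $@.
count_after_resplit() { set -- $@; echo $#; }    # unquoted: re-splits
count_preserved()     { set -- "$@"; echo $#; }  # quoted: preserved

count_after_resplit "a b" c   # prints 3: "a b" is split into two words
count_preserved     "a b" c   # prints 2: arguments kept intact
```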

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)

12-13: LGTM!

The pipeline parallelism configuration is appropriate for an 8B model on a 2-node, 8-GPU-per-node cluster.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml (1)

1-20: LGTM!

The configuration is internally consistent with appropriate parallelism settings for a 235B model on a 16-node, 4-GPU-per-node cluster.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml (1)

1-26: LGTM!

The FP8 quantization configuration follows best practices, including appropriate layer exclusions for MoE architectures.

tests/test_suites/performance.txt (1)

5-38: Well-organized test inventory structure.

The categorization by hardware platform (H100, GB200) and precision (BF16, FP8) with sync/async subsections makes the test suite structure clear and maintainable.

Note: The AI summary indicated that grpo-qwen3-30ba3b-24n8g-async-8off.sh was removed, but it appears at Line 22 under the "ASYNC many-off" section, suggesting it was reorganized rather than removed.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml (1)

1-20: LGTM!

The configuration is internally consistent and appropriately scales the 16n4g configuration to 32 nodes.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml (1)

1-20: LGTM!

The FP8 quantization configuration is appropriate for the LLaMA 3.1 8B model with consistent naming throughout.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml (1)

1-21: LGTM!

The configuration is internally consistent with appropriate parallelism settings for DeepSeek v3 on a 32-node, 4-GPU-per-node cluster.

tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh (1)

1-40: LGTM! Script follows established test infrastructure patterns.

This test script correctly implements the standard performance test pattern for GRPO, including:

  • Proper use of uv run for Python invocations
  • Standard configuration variables (NUM_NODES, NUM_RUNS, NUM_MINUTES) consumed by external launch tooling
  • Consistent patterns for directory navigation and argument forwarding
  • TensorBoard log conversion and conditional metrics evaluation

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions are expected and can be safely ignored.

tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh (1)

1-45: LGTM! DeepSeek v3 FP8 performance test correctly configured.

This script properly extends the standard test pattern with DeepSeek-specific configuration:

  • Allows custom HF checkpoint via NRL_DEEPSEEK_V3_HF_CKPT environment variable
  • Disables NVLS to prevent OOM issues
  • Correctly passes model name to both policy and tokenizer configuration
  • Enables TensorBoard logging for comprehensive observability
  • Uses uv run throughout as required

Based on learnings, the shellcheck warnings are expected for this test infrastructure pattern.
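The checkpoint-override mechanism described above can be sketched with the usual shell default-value expansion; the variable name comes from the review, while the default path is illustrative rather than taken from the actual script.

```shell
# Honor NRL_DEEPSEEK_V3_HF_CKPT when set, else fall back to a default.
# The default below is a placeholder, not the script's real value.
DEFAULT_CKPT="deepseek-ai/DeepSeek-V3"
MODEL_CKPT="${NRL_DEEPSEEK_V3_HF_CKPT:-$DEFAULT_CKPT}"
echo "$MODEL_CKPT"
```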

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh (1)

1-39: LGTM! LLaMA 3.1 FP8 performance test follows standard pattern.

This script correctly implements the performance test pattern for the smaller 2-node LLaMA configuration:

  • Standard configuration variables for test infrastructure
  • Proper uv run usage throughout
  • TensorBoard and WandB integration enabled
  • Conditional metrics evaluation based on step completion

The simpler configuration without custom model name override is appropriate for this test scenario. Based on learnings, shellcheck warnings are expected.

tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh (1)

1-45: LGTM! DeepSeek v3 64n4g async configuration properly implemented.

This script correctly implements the 64-node, 4 GPUs-per-node configuration:

  • Consistent with other DeepSeek v3 variants in this PR
  • Allows flexible checkpoint override via environment variable
  • Disables NVLS to prevent OOM issues
  • Proper TensorBoard and WandB integration
  • Uses uv run throughout as required

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions are expected for this test infrastructure.

tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh (1)

1-40: LGTM! Qwen3-235b 16n4g performance test follows standard pattern.

This script correctly implements the standard performance test pattern:

  • Proper configuration for 16-node setup
  • Uses uv run throughout as required
  • Disables NVLS to prevent OOM
  • Includes TensorBoard log conversion and conditional metrics evaluation

Note: Unlike some other scripts in this PR (e.g., grpo-deepseek-v3-64n4g-async-1off.sh line 31), this script doesn't explicitly set logger.tensorboard_enabled=True. If TensorBoard logging is required for the log conversion at line 33 to work, verify that it's enabled by default in the configuration or common.env.

Based on learnings, the shellcheck warnings are expected.

tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh (1)

1-45: LGTM! DeepSeek v3 32n4g performance test correctly implemented.

This script properly implements the 32-node configuration:

  • Consistent with other DeepSeek v3 variants (64n4g, 64n8g-fp8)
  • Flexible checkpoint configuration via NRL_DEEPSEEK_V3_HF_CKPT
  • Disables NVLS to prevent OOM
  • Explicit TensorBoard and WandB integration
  • Proper uv run usage throughout

Based on learnings, the shellcheck warnings are expected as these variables are consumed by external launch tooling.

@guyueh1 guyueh1 added the CI:L0 Run doctests and unit tests label Dec 19, 2025
Guyue Huang and others added 3 commits December 19, 2025 15:41
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Dec 20, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests CI:L1 Run doctests, unit tests, and functional tests and removed CI:L0 Run doctests and unit tests labels Dec 20, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 20, 2025
@terrykong terrykong enabled auto-merge (squash) December 20, 2025 05:57
@terrykong terrykong merged commit fab6234 into main Dec 20, 2025
40 of 41 checks passed
@terrykong terrykong deleted the guyueh/perf_recipe_for_v0.5 branch December 20, 2025 07:48
chtruong814 pushed a commit that referenced this pull request Dec 20, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
parthmannan pushed a commit to parthmannan/RL that referenced this pull request Jan 15, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
Signed-off-by: Parth Mannan <pmannan@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn added a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
seonjinn added a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>
seonjinn added a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: Seonjin <sna@nvidia.com>

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests r0.5.0

3 participants