Conversation
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
📝 Walkthrough

This PR adds new GRPO performance recipe configurations for multiple model architectures (DeepSeek v3, LLaMA 3.1, Qwen3) with various cluster sizes and FP8 quantization variants, along with corresponding test scripts. It also removes deprecated configuration options from an existing recipe and updates the test inventory.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~20 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml (1)
17-19: Consider aligning logger directory naming with checkpoint directory. The checkpoint directory uses `grpo-deepseek-v3-64n4g-async-1off`, but the logger directory and W&B run name use `grpo-deepseek-v3-64n4g-async-32T32G-1off`. If "32T32G" refers to trajectory/generation configuration rather than cluster topology, consider documenting this naming convention to avoid confusion.
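One way to act on this nitpick is to reuse a single base name across all three fields. A minimal sketch, using hypothetical key names and paths (the actual recipe schema may differ):

```yaml
# Illustrative only: keys and paths are assumptions, not from the real recipe.
checkpointing:
  checkpoint_dir: results/grpo-deepseek-v3-64n4g-async-1off
logger:
  log_dir: logs/grpo-deepseek-v3-64n4g-async-1off
  wandb:
    name: grpo-deepseek-v3-64n4g-async-1off
```

Alternatively, if the "32T32G" suffix is intentional, a short comment in the YAML explaining what it encodes would resolve the ambiguity.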
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- `examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml` (0 hunks)
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml` (1 hunk)
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml` (1 hunk)
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh` (1 hunk)
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh` (1 hunk)
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh` (1 hunk)
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh` (1 hunk)
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh` (1 hunk)
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh` (1 hunk)
- `tests/test_suites/performance.txt` (1 hunk)
💤 Files with no reviewable changes (1)
- examples/configs/recipes/llm/grpo-moonlight-16ba3b-4n8g-megatron-fp8-e2e.yaml
🧰 Additional context used
📓 Path-based instructions (5)
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)
Files:
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml`
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml`
- `tests/test_suites/performance.txt`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml`
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts
Files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
tests/test_suites/**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
tests/test_suites/**/*.sh: When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run
Files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
**/*.{py,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)
Files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
🧠 Learnings (9)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes
Applied to files:
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml`
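The naming pattern in this learning can be checked mechanically. A rough sketch, assuming an approximate regex (this is illustrative, not the project's canonical validator):

```shell
# Rough check of the documented LLM recipe naming pattern:
#   <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][.vN].yaml
# The regex is an approximation for illustration only.
pattern='^[a-z0-9.]+-.+-[0-9]+n[0-9]+g(-[A-Za-z0-9.-]+)?(\.v[0-9]+)?\.yaml$'

# Recipe names from this PR conform to the pattern:
echo "grpo-deepseek-v3-64n4g-async-1off.yaml" | grep -Eq "$pattern" && echo "matches"
echo "grpo-qwen3-235b-16n4g.yaml" | grep -Eq "$pattern" && echo "matches"
```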
📚 Learning: 2025-09-18T14:20:36.297Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:113-120
Timestamp: 2025-09-18T14:20:36.297Z
Learning: In distillation workflows, the teacher policy does not perform generation - it only does inference/logprob computation on sequences generated by the student policy. Therefore, teacher generation configuration mismatches (like vLLM tensor parallelism settings) and colocation concerns are not relevant.
Applied to files:
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes
Applied to files:
- `examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml`
- `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml`
- `examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n4g-async-1off.yaml`
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
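The NUM_RUNS formula quoted in this learning is integer ceiling division. A standalone sketch (variable values are illustrative):

```shell
# How many runs of STEPS_PER_RUN steps are needed to cover MAX_STEPS?
# Adding (STEPS_PER_RUN - 1) before dividing rounds the quotient up.
MAX_STEPS=50
STEPS_PER_RUN=15
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"  # 4: three full runs cover 45 steps, a fourth covers the rest
```

When MAX_STEPS is an exact multiple of STEPS_PER_RUN, the formula still yields the exact quotient with no extra run.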
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run
Applied to files:
- `tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh`
- `tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh`
📚 Learning: 2025-11-24T17:24:47.707Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-11-24T17:24:47.707Z
Learning: If a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply
Applied to files:
tests/test_suites/performance.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : When adding support for a new model, create a corresponding driver shell script under tests/test_suites/ in the matching domain
Applied to files:
tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
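For context on the recurring SC2068 finding, a small standalone sketch (not from the repo; the project intentionally keeps the unquoted `$@` style for consistency) of why shellcheck flags it:

```shell
# count_args reports how many positional arguments it receives.
count_args() { echo $#; }

set -- "one arg" "two"   # two arguments, the first containing a space
count_args $@            # unquoted: word-split into three arguments -> 3
count_args "$@"          # quoted: each argument preserved intact -> 2
```

The unquoted form re-splits `"one arg"` into two words, which is exactly the hazard SC2068 warns about; it is benign here only because the test arguments never contain whitespace.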
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (13)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)
12-13: LGTM! The pipeline parallelism configuration is appropriate for an 8B model on a 2-node, 8-GPU-per-node cluster.
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n4g.yaml (1)
1-20: LGTM! The configuration is internally consistent with appropriate parallelism settings for a 235B model on a 16-node, 4-GPU-per-node cluster.
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.yaml (1)
1-26: LGTM! The FP8 quantization configuration follows best practices, including appropriate layer exclusions for MoE architectures.
tests/test_suites/performance.txt (1)
5-38: Well-organized test inventory structure. The categorization by hardware platform (H100, GB200) and precision (BF16, FP8) with sync/async subsections makes the test suite structure clear and maintainable.
Note: The AI summary indicated that `grpo-qwen3-30ba3b-24n8g-async-8off.sh` was removed, but it appears at Line 22 under the "ASYNC many-off" section, suggesting it was reorganized rather than removed.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n4g-async-1off.yaml (1)

1-20: LGTM! The configuration is internally consistent and appropriately scales the 16n4g configuration to 32 nodes.
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.yaml (1)
1-20: LGTM! The FP8 quantization configuration is appropriate for the LLaMA 3.1 8B model with consistent naming throughout.
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml (1)
1-21: LGTM! The configuration is internally consistent with appropriate parallelism settings for DeepSeek v3 on a 32-node, 4-GPU-per-node cluster.
tests/test_suites/llm/performance/grpo-qwen3-235b-32n4g-async-1off.sh (1)
1-40: LGTM! Script follows established test infrastructure patterns. This test script correctly implements the standard performance test pattern for GRPO, including:

- Proper use of `uv run` for Python invocations
- Standard configuration variables (NUM_NODES, NUM_RUNS, NUM_MINUTES) consumed by external launch tooling
- Consistent patterns for directory navigation and argument forwarding
- TensorBoard log conversion and conditional metrics evaluation

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions are expected and can be safely ignored.
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-fp8-async-1off.sh (1)
1-45: LGTM! DeepSeek v3 FP8 performance test correctly configured. This script properly extends the standard test pattern with DeepSeek-specific configuration:

- Allows a custom HF checkpoint via the `NRL_DEEPSEEK_V3_HF_CKPT` environment variable
- Disables NVLS to prevent OOM issues
- Correctly passes the model name to both policy and tokenizer configuration
- Enables TensorBoard logging for comprehensive observability
- Uses `uv run` throughout as required

Based on learnings, the shellcheck warnings are expected for this test infrastructure pattern.
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-fp8-async-1off.sh (1)
1-39: LGTM! LLaMA 3.1 FP8 performance test follows standard pattern. This script correctly implements the performance test pattern for the smaller 2-node LLaMA configuration:

- Standard configuration variables for test infrastructure
- Proper `uv run` usage throughout
- TensorBoard and WandB integration enabled
- Conditional metrics evaluation based on step completion

The simpler configuration without a custom model name override is appropriate for this test scenario. Based on learnings, shellcheck warnings are expected.
tests/test_suites/llm/performance/grpo-deepseek-v3-64n4g-async-1off.sh (1)
1-45: LGTM! DeepSeek v3 64n4g async configuration properly implemented. This script correctly implements the 64-node, 4-GPUs-per-node configuration:

- Consistent with other DeepSeek v3 variants in this PR
- Allows flexible checkpoint override via environment variable
- Disables NVLS to prevent OOM issues
- Proper TensorBoard and WandB integration
- Uses `uv run` throughout as required

Based on learnings, the shellcheck warnings about unused variables and unquoted expansions are expected for this test infrastructure.
tests/test_suites/llm/performance/grpo-qwen3-235b-16n4g.sh (1)
1-40: LGTM! Qwen3-235b 16n4g performance test follows standard pattern. This script correctly implements the standard performance test pattern:

- Proper configuration for the 16-node setup
- Uses `uv run` throughout as required
- Disables NVLS to prevent OOM
- Includes TensorBoard log conversion and conditional metrics evaluation

Note: Unlike some other scripts in this PR (e.g., grpo-deepseek-v3-64n4g-async-1off.sh line 31), this script doesn't explicitly set `logger.tensorboard_enabled=True`. If TensorBoard logging is required for the log conversion at line 33 to work, verify that it's enabled by default in the configuration or common.env.

Based on learnings, the shellcheck warnings are expected.
tests/test_suites/llm/performance/grpo-deepseek-v3-32n4g.sh (1)
1-45: LGTM! DeepSeek v3 32n4g performance test correctly implemented. This script properly implements the 32-node configuration:

- Consistent with other DeepSeek v3 variants (64n4g, 64n8g-fp8)
- Flexible checkpoint configuration via `NRL_DEEPSEEK_V3_HF_CKPT`
- Disables NVLS to prevent OOM
- Explicit TensorBoard and WandB integration
- Proper `uv run` usage throughout

Based on learnings, the shellcheck warnings are expected as these variables are consumed by external launch tooling.
Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com>
…-NeMo/RL into guyueh/perf_recipe_for_v0.5
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Signed-off-by: Parth Mannan <pmannan@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com> Signed-off-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Guyue Huang <guyueh@login-lyris02.lyris.clusters.nvidia.com> Co-authored-by: Seonjin <sna@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do ?
Add new performance tests for v0.5
Issues
List issues that this PR closes (syntax):
Usage
```
# Add a code snippet demonstrating how to use this
```

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Chores