perf: [Perf script] QWEN3 30B-A3B tensor_parallel_size from 4 to 2 #1558
Conversation
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
📝 Walkthrough

A single configuration file in the GRPO Qwen model recipe is updated, reducing the tensor parallelism setting for the vLLM generation backend from 4 to 2. This is a parameter adjustment with no logic changes.
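A minimal sketch of the kind of change described. The exact key nesting in the recipe YAML is an assumption based on common NeMo RL config layouts, not something shown in the diff here:

```yaml
# Hypothetical excerpt from
# examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
# (surrounding key names are illustrative assumptions)
policy:
  generation:
    backend: vllm
    vllm_cfg:
      tensor_parallel_size: 2  # reduced from 4 in this PR
```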
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.
Applied to files:
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
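For instance, the learning above permits a DTensor configuration like the following on a single 8-GPU node. The key names echo those mentioned for examples/configs/sft_automodel.yaml, but the exact nesting here is an assumption for illustration:

```yaml
# Hypothetical DTensor settings for an MoE model on an 8-GPU cluster.
# Expert parallelism shards only the MoE layers, while data parallelism/FSDP
# covers the non-MoE layers, so the two sizes do not multiply GPU requirements.
dtensor_cfg:
  enabled: true
  expert_parallel_size: 8  # applies to MoE layers only
  data_parallel_size: 8    # applies to non-MoE layers
```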
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build-container / main
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
Hi @terrykong, can I ask for your help to merge this PR?
…VIDIA-NeMo#1558) Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
…VIDIA-NeMo#1558) Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
…1558) Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
What does this PR do?
Reduces tensor_parallel_size from 4 to 2 for the vLLM generation backend in the GRPO Qwen3 30B-A3B performance recipe.
Issues
List issues that this PR closes:
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information