
perf: [Perf script] QWEN3 30B-A3B tensor_parallel_size from 4 to 2 #1558

Merged
terrykong merged 1 commit into main from youngeunkwon0405-patch-2
Nov 24, 2025
Conversation

@youngeunkwon0405 (Contributor) commented Nov 21, 2025

What does this PR do?

Reduces generation.vllm_cfg.tensor_parallel_size from 4 to 2 in the Qwen3 30B-A3B GRPO performance recipe.

Issues

None listed.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Chores
    • Updated example configuration parameters for GRPO performance optimization scenarios.


Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
coderabbitai bot (Contributor) commented Nov 21, 2025

📝 Walkthrough

A single configuration file in the GRPO Qwen model recipe is updated, reducing the tensor parallelism setting for the vLLM generation backend from 4 to 2. This is a parameter adjustment with no logic changes.

Changes

Configuration Adjustment — examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • Reduced generation.vllm_cfg.tensor_parallel_size from 4 to 2.
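The diff amounts to a single-key change in the recipe YAML. A sketch of the relevant fragment — only the `tensor_parallel_size` key and its new value are confirmed by this PR; the surrounding keys are assumptions about the recipe's layout:

```yaml
# examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
# (illustrative fragment; neighboring keys are assumed, not taken from the diff)
generation:
  vllm_cfg:
    tensor_parallel_size: 2  # was 4; each vLLM replica now spans 2 GPUs
```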

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

  • Minimal change scope: single parameter update in a configuration file
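Why halving tensor parallelism can help throughput: on the 4-node × 8-GPU cluster implied by the recipe name (`4n8g`), fewer GPUs per replica means more replicas generating in parallel. The helper below is a hypothetical illustration of that arithmetic, assuming all 32 GPUs are used for generation — NeMo-RL's actual GPU allocation may differ:

```python
# Hypothetical sketch: how tensor_parallel_size trades GPUs-per-replica
# against replica count on a fixed-size cluster. Not NeMo-RL code.

def num_replicas(total_gpus: int, tensor_parallel_size: int) -> int:
    """Number of vLLM generation replicas if every replica spans
    tensor_parallel_size GPUs and all GPUs serve generation."""
    assert total_gpus % tensor_parallel_size == 0, "TP size must divide GPU count"
    return total_gpus // tensor_parallel_size

print(num_replicas(32, 4))  # 8 replicas before this PR
print(num_replicas(32, 2))  # 16 replicas after: more model weight per GPU,
                            # but twice as many replicas decoding in parallel
```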

Possibly related PRs

Suggested reviewers

  • guyueh1
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Test Results For Major Changes — ⚠️ Warning: the PR modifies a performance configuration (tensor_parallel_size 4→2) but lacks the before-and-after metrics, test results, and workload context required for performance changes. Resolution: add concrete before-and-after performance numbers, hardware/workload configuration, and benchmark evidence demonstrating the optimization improves performance.
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title clearly and specifically describes the main change: reducing tensor_parallel_size from 4 to 2 in the QWEN3 30B configuration file.
  • Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate; skipping docstring coverage check.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c371a9 and 44fba45.

📒 Files selected for processing (1)
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
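The EP/DP rule recorded in the learning above — expert parallelism shards only the MoE expert layers while data parallelism/FSDP shards the non-expert layers, so the two sizes do not multiply the GPU requirement — can be sketched numerically. `fits_on_cluster` is a hypothetical helper for illustration, not part of the NeMo-RL codebase:

```python
# Hypothetical sketch of the MoE GPU-count rule: EP and DP each shard a
# different subset of layers over the SAME GPUs, so each size must merely
# divide the cluster size rather than multiply with the other.

def fits_on_cluster(num_gpus: int, expert_parallel_size: int,
                    data_parallel_size: int) -> bool:
    """Check whether an MoE DTensor layout is valid on num_gpus GPUs."""
    return (num_gpus % expert_parallel_size == 0
            and num_gpus % data_parallel_size == 0)

# EP=8 and DP=8 are both valid on an 8-GPU cluster for an MoE model,
# because they shard different layer types over the same GPUs.
print(fits_on_cluster(8, 8, 8))  # True
print(fits_on_cluster(8, 8, 3))  # False: DP size must divide the GPU count
```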
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-container / main
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR

@youngeunkwon0405 (Contributor, Author) commented:

Hi @terrykong, can I ask for your help to merge this PR?

@terrykong terrykong merged commit 5f6cfc7 into main Nov 24, 2025
54 of 56 checks passed
@terrykong terrykong deleted the youngeunkwon0405-patch-2 branch November 24, 2025 17:25
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…VIDIA-NeMo#1558)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
…1558)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
…1558)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
…1558)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

Labels

  • CI:docs — Run doctest
  • Performance — Related to improving performance

Projects

None yet


3 participants