fix: perf configs 2 #2393

Merged
ko3n1g merged 5 commits into main from ko3n1g/fix/perf-configs-2
Feb 17, 2026

Conversation

@ko3n1g (Contributor) commented Feb 16, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g. Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Chores
    • Updated pretraining configuration settings to enable token dispatch improvements across multiple model variants.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@coderabbitai coderabbitai bot commented Feb 16, 2026

📝 Walkthrough

This pull request adds token-level MOE dispatcher configuration to multiple Qwen3 pretraining model configurations. Specifically, cfg.model.moe_token_dispatcher_type is set to "flex" across several configuration functions (gb200, gb300, b300, b200, h100, and variants), complementing existing MOE dispatcher backend settings.

Changes

Cohort: Qwen3 MOE Dispatcher Configuration
File(s): scripts/performance/configs/qwen/qwen3_llm_pretrain.py
Summary: Adds token-level MOE dispatcher type assignment ("flex") to multiple Qwen3 pretraining configuration functions alongside existing dispatcher backend settings.
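As described above, each affected config function pairs the existing flex-dispatcher backend setting with an explicit dispatcher type. A minimal, self-contained sketch of the pattern follows; the function name, the `SimpleNamespace` stand-in for the config object, and the backend value "deepep" are illustrative assumptions, not the actual contents of qwen3_llm_pretrain.py:

```python
from types import SimpleNamespace


def qwen3_235b_a22b_pretrain_config_gb200():
    # Hypothetical stand-in for the real config object used in
    # scripts/performance/configs/qwen/qwen3_llm_pretrain.py.
    cfg = SimpleNamespace(model=SimpleNamespace())

    # Existing setting: which backend the flex dispatcher uses
    # ("deepep" is an assumed placeholder value).
    cfg.model.moe_flex_dispatcher_backend = "deepep"

    # Change in this PR: make the dispatcher type itself explicit
    # instead of relying on a framework default.
    cfg.model.moe_token_dispatcher_type = "flex"
    return cfg


cfg = qwen3_235b_a22b_pretrain_config_gb200()
print(cfg.model.moe_token_dispatcher_type)  # flex
```

Setting both fields explicitly keeps each hardware variant's config self-describing, which is what the review comment below on the gb300 variant is about.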

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

performance

Suggested reviewers

  • yaoyu-33
  • erhoo82
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Test Results For Major Changes — ⚠️ Warning
  Explanation: The PR adds MOE token dispatcher type configuration changes affecting performance-critical behavior, but the PR description lacks test results, performance metrics, or regression analysis.
  Resolution: Update the PR description with before-and-after performance metrics and convergence data for the affected Qwen3 configurations to validate no regressions.

Title check — ❓ Inconclusive
  Explanation: The title 'fix: perf configs 2' is vague and generic, lacking specificity about which performance configurations are being fixed.
  Resolution: Consider a more descriptive title such as 'fix: enable flex token dispatcher for Qwen3 pretrain configs' to clearly communicate the primary change.

✅ Passed checks (3 passed)

Description Check — ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Merge Conflict Detection — ✅ Passed: No merge conflicts detected when merging into main.


@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/performance/configs/qwen/qwen3_llm_pretrain.py (1)

78-84: ⚠️ Potential issue | 🟠 Major

moe_token_dispatcher_type is not set in the gb300 config for qwen3_235b, unlike all other sibling configs.

Every other qwen3_235b_a22b_pretrain_config_* and qwen3_30b_a3b_pretrain_config_* function explicitly sets cfg.model.moe_token_dispatcher_type (to either "flex" or "alltoall"). This function sets moe_flex_dispatcher_backend but omits the dispatcher type, so it will silently use whatever default the framework provides.

If this is intentional, a comment explaining the omission would help. Otherwise, it likely needs "flex" to match the gb200 variant:

Proposed fix
     cfg.model.moe_flex_dispatcher_backend = base_cfg.moe_flex_dispatcher_backend
+    cfg.model.moe_token_dispatcher_type = "flex"
 
     set_qwen3_common_configs(cfg)

As per coding guidelines: "Do not add arbitrary defaults for configs, be as explicit as possible."
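The guideline the reviewer cites could be enforced mechanically with a small check that flags any config function setting a flex backend without also setting the dispatcher type. This is an illustrative sketch, not part of the PR; the helper name and the two sample config functions (mirroring the gb200/gb300 situation above) are hypothetical:

```python
from types import SimpleNamespace


def missing_explicit_dispatcher_type(config_fn):
    # Apply the config function to a fresh namespace, then check whether
    # it set a flex backend without an explicit dispatcher type.
    cfg = SimpleNamespace(model=SimpleNamespace())
    config_fn(cfg)
    model_fields = vars(cfg.model)
    return ("moe_flex_dispatcher_backend" in model_fields
            and "moe_token_dispatcher_type" not in model_fields)


# Hypothetical samples: one compliant config, one with the omission
# flagged in the review comment above.
def config_gb200(cfg):
    cfg.model.moe_flex_dispatcher_backend = "deepep"  # assumed value
    cfg.model.moe_token_dispatcher_type = "flex"


def config_gb300(cfg):
    cfg.model.moe_flex_dispatcher_backend = "deepep"  # type omitted!


print(missing_explicit_dispatcher_type(config_gb200))  # False
print(missing_explicit_dispatcher_type(config_gb300))  # True
```

A check like this could run in CI over all `*_pretrain_config_*` functions so silent reliance on framework defaults is caught at review time rather than in a performance regression.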


Labels: none yet
Projects: none yet
2 participants