Fix: Missing args in allreduce_fusion MOE finalize call#3046

Closed
samuellees wants to merge 2 commits into flashinfer-ai:main from samuellees:fix/moe-finalize-missing-args

Conversation

Collaborator

@samuellees samuellees commented Apr 13, 2026

Problem

On the main branch, allreduce_fusion(pattern=kMoEFinalizeARResidualRMSNorm) crashes with a TypeError for missing positional arguments, and the mypy pre-commit hook also fails.

Root Cause

Two PRs merged to main in sequence, modifying different ends of the same call chain:

| Order | PR | What changed | File |
| --- | --- | --- | --- |
| 1st | #2966 (Fused moe all-reduce routed scaling factor + quant support) | Added quant_out, scale_out, routed_scaling_factor to the trtllm_moe_finalize_allreduce_fusion() signature | flashinfer/comm/trtllm_ar.py |
| 2nd | #2982 (Add MOE patterns to unified allreduce_fusion API, closes #2823) | Added the kMoEFinalizeARResidualRMSNorm pattern that calls trtllm_moe_finalize_allreduce_fusion() | flashinfer/comm/allreduce.py |

PR #2982 was developed before #2966 merged. Git merge produced no conflict since they touched different files, but the call in allreduce_fusion() was left with the old signature — missing the three new positional args added by #2966.
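The failure mode can be reproduced in isolation. The sketch below uses the function and parameter names from the description, but the bodies are simplified stand-ins, not the real flashinfer implementations:

```python
# Simplified stand-in for the low-level API after PR #2966: three new
# required positional parameters were added to the signature.
def trtllm_moe_finalize_allreduce_fusion(
    allreduce_out, residual, norm_weight, quant_out, scale_out, routed_scaling_factor
):
    scale = 1.0 if routed_scaling_factor is None else routed_scaling_factor
    return [x * scale + r for x, r in zip(allreduce_out, residual)]

# The call site from PR #2982, still written against the pre-#2966 signature:
def allreduce_fusion(allreduce_out, residual, norm_weight):
    return trtllm_moe_finalize_allreduce_fusion(allreduce_out, residual, norm_weight)

try:
    allreduce_fusion([1.0], [2.0], [1.0])
except TypeError as exc:
    print(exc)  # missing 3 required positional arguments
```

Because the two PRs touched different files, the textual merge succeeded while the semantic contract between caller and callee silently broke.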

Impact

| Scope | Status |
| --- | --- |
| allreduce_fusion(pattern=kMoEFinalizeARResidualRMSNorm) (pattern 7) | Broken: TypeError at runtime |
| allreduce_fusion(pattern=kMoEReductionARResidualRMSNorm) (pattern 6) | Not affected (calls a different function) |
| allreduce_fusion(pattern=0-5) (standard allreduce) | Not affected |
| trtllm_moe_finalize_allreduce_fusion() direct callers | Not affected (low-level API is correct) |
| mypy pre-commit | Fails |
| test_allreduce_fusion_moe_unified_api.py finalize tests | Would fail if run on multi-GPU CI |

Practical impact is limited — pattern 7 was just added in #2982 and has no downstream consumers yet.

Fix

flashinfer/comm/allreduce.py:

  • Pass quant_out, scale_out, routed_scaling_factor to the trtllm_moe_finalize_allreduce_fusion() call
  • Add routed_scaling_factor: Optional[float] = None to allreduce_fusion() signature
  • Update docstring

No test changes needed — existing tests pass None for the new args (default values), which is the correct behavior for non-quantized finalize.
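The shape of the fix can be sketched as follows. Names match the PR description; the function bodies are simplified stand-ins for illustration only:

```python
from typing import Optional

# Stand-in for the low-level API (the real one lives in
# flashinfer/comm/trtllm_ar.py); body simplified to show scaling only.
def trtllm_moe_finalize_allreduce_fusion(
    allreduce_out, residual, norm_weight, quant_out, scale_out, routed_scaling_factor
):
    scale = 1.0 if routed_scaling_factor is None else routed_scaling_factor
    return [x * scale + r for x, r in zip(allreduce_out, residual)]

# Fixed dispatch: the three new args are forwarded, with None defaults so
# existing (non-quantized) callers keep their old behavior.
def allreduce_fusion(
    allreduce_out,
    residual,
    norm_weight,
    quant_out=None,
    scale_out=None,
    routed_scaling_factor: Optional[float] = None,
):
    return trtllm_moe_finalize_allreduce_fusion(
        allreduce_out, residual, norm_weight,
        quant_out, scale_out, routed_scaling_factor,
    )

print(allreduce_fusion([2.0], [1.0], [1.0]))                             # [3.0]
print(allreduce_fusion([2.0], [1.0], [1.0], routed_scaling_factor=0.5))  # [2.0]
```

Defaulting the new parameters to None is what lets the existing tests pass unchanged: None is the correct value for the non-quantized finalize path.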

PR

  • Branch: fix/moe-finalize-missing-args
  • Changes: 1 file, +5 lines
  • Title: fix: add missing args to moe_finalize call in unified allreduce_fusion API

Summary by CodeRabbit

  • New Features
    • Enhanced Mixture of Experts finalization for TRTLLM backend by introducing a configurable global scaling parameter. This enables fine-grained control over routed expert output scaling during finalization operations, improving flexibility for advanced inference configurations.

…n API

PR flashinfer-ai#2966 added quant_out, scale_out, and routed_scaling_factor params
to trtllm_moe_finalize_allreduce_fusion(). PR flashinfer-ai#2982 (unified API) was
developed before flashinfer-ai#2966 merged, and git merge produced no conflict since
they touched different files (trtllm_ar.py vs allreduce.py). However
the call in allreduce_fusion() was missing the three new positional
args, causing TypeError at runtime for kMoEFinalizeARResidualRMSNorm
pattern and mypy failure in pre-commit.

Fix:
- Add quant_out, scale_out, routed_scaling_factor to the finalize call
- Add routed_scaling_factor to allreduce_fusion() function signature
- Update docstring

AI-assisted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

coderabbitai bot commented Apr 13, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d40714a4-e5d1-4bc3-b333-8555b87f4031

📥 Commits

Reviewing files that changed from the base of the PR and between e64ae8b and 8610120.

📒 Files selected for processing (1)
  • flashinfer/comm/allreduce.py

📝 Walkthrough

Walkthrough

The PR adds an optional routed_scaling_factor parameter to the allreduce_fusion function, enabling configurable global scaling for routed expert outputs in MoE finalize operations. The TRTLLM backend MOE finalize dispatch path now forwards this parameter instead of passing a hardcoded None.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| MoE Finalize AllReduce Scaling (flashinfer/comm/allreduce.py) | Added optional routed_scaling_factor parameter to the allreduce_fusion signature and propagated it to the trtllm_moe_finalize_allreduce_fusion call in the MOE finalize dispatch path, replacing a hardcoded None. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • #3040 — Adds missing routed_scaling_factor alongside quant_out/scale_out arguments to the same TRTLLM MoE finalize callsite in allreduce.py.
  • #2982 — Modifies allreduce_fusion API and TRTLLM MoE finalize dispatch to introduce and wire routed_scaling_factor parameter handling.
  • #2966 — Modifies MoE finalize all-reduce fusion path to add and propagate routed_scaling_factor parameter across function signatures.

Suggested reviewers

  • aleozlx
  • yzh119
  • bkryu
  • jimmyzho
  • nv-yunzheq

Poem

🐰 A scaling factor hops through the code,
Making MoE experts lighter their load,
No more hardcoded None in sight,
Just a parameter, perfectly right! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly and concisely identifies the specific problem being fixed: missing arguments in a MOE finalize call within the allreduce_fusion function. |
| Description check | ✅ Passed | The description comprehensively covers the problem statement, root cause analysis, impact assessment, and implemented fix, with clear context about prior PRs. |
| Linked Issues check | ✅ Passed | The changes correctly implement the MOE finalize pattern from #2823 by fixing the call signature of trtllm_moe_finalize_allreduce_fusion() and exposing the routed_scaling_factor parameter. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing the MOE finalize call signature and exposing the routed_scaling_factor parameter, with no unrelated modifications. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



Warning

Review ran into problems

🔥 Problems

Timed out fetching pipeline failures after 30000ms



…ing-args

# Conflicts:
#	flashinfer/comm/allreduce.py
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the allreduce_fusion function in flashinfer/comm/allreduce.py to include a new routed_scaling_factor parameter and its corresponding documentation. Additionally, it updates the internal operation call to include quant_out, scale_out, and routed_scaling_factor. I have no feedback to provide.

Collaborator Author

Close because of #3040

@samuellees samuellees closed this Apr 13, 2026
@samuellees samuellees deleted the fix/moe-finalize-missing-args branch April 13, 2026 14:48


Successfully merging this pull request may close these issues.

TRTLLM fused MoE Finalize+ResidualAdd + AR+Norm for DSV3.2
