
[vLLM IR][RMSNorm] Port Mixer2RMSNormGated to vLLM IR Ops #39262

Open

wxsIcey wants to merge 5 commits into vllm-project:main from wxsIcey:wxs/vllm-ir-mixer2-gated-rms-norm

Conversation

@wxsIcey (Contributor) commented Apr 8, 2026

Purpose

Register Mixer2RMSNormGated as a vLLM IR op and rewrite Mixer2RMSNormGated.forward_native to dispatch correctly across all tensor-parallel configurations.

The implementation handles four cases:

| Case | Condition | Issue | Solution |
|------|-----------|-------|----------|
| 1 | n_groups == 1, tp_size > 1 | Variance must be computed across all ranks (one global norm group; each rank holds only a slice) | AllReduce the local sum of squares, then compute the global variance |
| 2 | n_groups == 1, tp_size == 1 | No TP; local data is complete | Use the IR op directly |
| 3 | n_groups > 1, n_groups % tp_size != 0 | Group boundaries straddle rank boundaries (a rank may hold half a group), so a local norm is incorrect | AllGather the full tensor, normalize locally, then slice back to the local rank |
| 4 | n_groups > 1, n_groups % tp_size == 0 | Each rank holds an integer number of complete groups; variance can be computed independently | Use the IR op directly |

Cases 2 and 4 require no collective communication and are handled by the IR op. Cases 1 and 3 require cross-rank communication that cannot be fused into a single kernel, so they are handled with an explicit AllReduce / AllGather before calling into the local computation (see the sketch below).

Because forward_native now covers all cases (including the optimized IR op paths for cases 2 and 4), forward_cuda is fully redundant and can be removed.
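
For illustration, here is a minimal sketch of the dispatch described above. It is not the PR's actual code: `grouped_rms_norm` stands in for the IR op, the helper names and signatures are made up, and it assumes the hidden dimension (and the weight) is sharded contiguously across TP ranks.

```python
import torch
import torch.distributed as dist
import torch.distributed.nn.functional as dist_fn
import torch.nn.functional as F


def grouped_rms_norm(x: torch.Tensor, weight: torch.Tensor,
                     n_groups: int, eps: float) -> torch.Tensor:
    """Reference grouped RMSNorm over the last dim (stands in for the IR op)."""
    g = x.view(*x.shape[:-1], n_groups, -1)
    var = g.pow(2).mean(dim=-1, keepdim=True)
    return (g * torch.rsqrt(var + eps)).reshape(x.shape) * weight


def forward_native_sketch(x, gate, weight, eps, n_groups, tp_size, tp_group):
    # Gate first, following the Mamba2 convention: y = rmsnorm(x * silu(z)).
    x = x * F.silu(gate)

    if n_groups == 1 and tp_size > 1:
        # Case 1: one global norm group; every rank holds only a slice.
        # AllReduce the local sum of squares to form the global variance.
        sumsq = dist_fn.all_reduce(x.pow(2).sum(dim=-1, keepdim=True),
                                   group=tp_group)
        var = sumsq / (x.shape[-1] * tp_size)
        return x * torch.rsqrt(var + eps) * weight

    if n_groups > 1 and n_groups % tp_size != 0:
        # Case 3: group boundaries straddle ranks. AllGather the full
        # hidden dimension, normalize with every group visible, slice back.
        full_x = torch.cat(dist_fn.all_gather(x, group=tp_group), dim=-1)
        full_w = torch.cat(dist_fn.all_gather(weight, group=tp_group), dim=-1)
        out = grouped_rms_norm(full_x, full_w, n_groups, eps)
        return out.chunk(tp_size, dim=-1)[dist.get_rank(tp_group)]

    # Cases 2 and 4: the local shard covers whole groups, so each rank
    # normalizes independently with no communication (the IR-op fast path).
    return grouped_rms_norm(x, weight, n_groups // tp_size, eps)
```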

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after results comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify (Bot) added the nvidia and rocm labels Apr 8, 2026
@github-project-automation (Bot) moved this to Todo in AMD Apr 8, 2026
@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces the mixer2_rms_norm_gated IR operator and its Triton implementation, refactoring the Mamba Mixer2 layer to utilize this new operator. The changes include updating kernel configurations, platform-specific priorities, and the underlying Triton kernel for gated layer normalization. A potential TypeError was identified in the native implementation of mixer2_rms_norm_gated when the weight is None, and a code suggestion was provided to handle this case.

Review thread on vllm/ir/ops/layernorm.py (outdated)
@wxsIcey wxsIcey changed the title [vLLM IR] mixer2_gated_rms_norm [vLLM IR] Port Mixer2RMSNormGated to vLLM IR Ops Apr 8, 2026
@wxsIcey wxsIcey changed the title [vLLM IR] Port Mixer2RMSNormGated to vLLM IR Ops [vLLM IR][RMSNorm] Port Mixer2RMSNormGated to vLLM IR Ops Apr 8, 2026
@wxsIcey (Contributor, Author) commented Apr 8, 2026

This PR has the same issue as #38798: when using wrap_triton rather than a custom op, garbled characters are output. However, I found that setting enforce_eager=True results in normal output, so the issue seems related to torch.compile().

I hope to get some help. cc @zou3519
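
For readers following along, the contrast being described is roughly the one below (a self-contained toy on PyTorch ≥ 2.6; `mylib` and the scale kernel are made up, not this PR's op): triton_op + wrap_triton keeps the Triton launch visible to torch.compile, while a plain custom op hides it as an opaque call.

```python
import torch
import triton
import triton.language as tl
from torch.library import triton_op, wrap_triton


@triton.jit
def _scale_kernel(x_ptr, out_ptr, scale, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x * scale, mask=mask)


# Style 1: triton_op + wrap_triton -- torch.compile can see into the kernel.
@triton_op("mylib::scale_traced", mutates_args={})
def scale_traced(x: torch.Tensor, scale: float) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    wrap_triton(_scale_kernel)[grid](x, out, scale, n, BLOCK=1024)
    return out


# Style 2: opaque custom op -- torch.compile treats it as a black box.
@torch.library.custom_op("mylib::scale_opaque", mutates_args=())
def scale_opaque(x: torch.Tensor, scale: float) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    _scale_kernel[grid](x, out, scale, n, BLOCK=1024)
    return out


@scale_opaque.register_fake
def _(x, scale):
    # Shape/dtype propagation for tracing with fake tensors.
    return torch.empty_like(x)
```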

@ProExpertProg (Collaborator) commented

@wxsIcey I think you're missing run_functional_passes=False in the lowering pass - for some reason that flag removes the Triton kernel from the replacement if set to True. See

match.replace_by_example(
ir_op_impl.impl_fn, bound_args.args, run_functional_passes=False
)

@wxsIcey (Contributor, Author) commented Apr 10, 2026

> @wxsIcey I think you're missing run_functional_passes=False in the lowering pass - for some reason that flag removes the Triton kernel from the replacement if set to True. See
>
>     match.replace_by_example(
>         ir_op_impl.impl_fn, bound_args.args, run_functional_passes=False
>     )

It seems to have no effect.

@tomeras91 (Member) left a comment


Left a nit comment.
Also - do you plan to post benchmark results before/after this change? I understand we don't expect any perf diff (IR ops still go through torch.compile), but I'd like to verify that, since this PR changes the code path significantly.

Review thread on vllm/config/kernel.py
"""Priority list for vllm.ir.ops.rms_norm"""

mixer2_rms_norm_gated: list[str] = Field(default_factory=list)
"""Priority list for vllm.ir.ops.rms_norm_gated"""
@tomeras91 (Member) commented:
nit: docstring should have vllm.ir.ops.mixer2_rms_norm_gated instead of vllm.ir.ops.rms_norm_gated.
(Or change the op name to rms_norm_gated)

@wxsIcey (Author) replied:
Thanks for your review. I will change it.

@mergify (Bot) commented Apr 14, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wxsIcey.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify (Bot) added the needs-rebase label Apr 14, 2026
@wxsIcey (Contributor, Author) commented Apr 15, 2026

> Left a nit comment. Also - do you plan to post benchmark results before/after this change? I understand we don't expect any perf diff (IR ops still go through torch.compile), but would like to verify that since this PR changes the code path significantly.

Thanks for the review. This PR is currently low priority; I will add benchmark tests once it's ready.

@chaojun-zhang (Contributor) commented

@wxsIcey I added this IR on XPU, please review wxsIcey#12

@wxsIcey (Contributor, Author) commented Apr 21, 2026

> @wxsIcey I added this IR on XPU, please review wxsIcey#12

Thanks for your work, I will merge it.

@mergify (Bot) added the intel-gpu label Apr 21, 2026
@chaojun-zhang (Contributor) commented

> @wxsIcey I added this IR on XPU, please review wxsIcey#12
>
> Thanks for your work, I will merge it.

@wxsIcey, I found that the Triton provider fails to pass IR lowering. I attempted to fix this by registering the Triton kernel as a torch custom op. Please take a look at wxsIcey#14, thanks.

@wxsIcey (Contributor, Author) commented Apr 22, 2026

> @wxsIcey I added this IR on XPU, please review wxsIcey#12
>
> Thanks for your work, I will merge it.
>
> @wxsIcey, I found that the Triton provider fails to pass IR lowering. I attempted to fix this by registering the Triton kernel as a torch custom op. Please take a look at wxsIcey#14, thanks.

This is a known issue: make_fx does not handle the Triton operator correctly. See the discussion in #38798. We still need to figure out why wrap_triton cannot be used.
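
As an assumed starting point for debugging (not taken from the PR), one can trace a triton_op-based function with make_fx directly and inspect whether the Triton call survives, reusing the toy scale_traced op from the sketch earlier in this thread:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx


def f(x):
    # scale_traced is the toy triton_op registered in the earlier sketch.
    return torch.ops.mylib.scale_traced(x, 2.0)


x = torch.randn(8, device="cuda")
gm = make_fx(f, tracing_mode="fake")(x)
gm.graph.print_tabular()  # check whether mylib.scale_traced appears intact
```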


Labels

intel-gpu, needs-rebase, nvidia, rocm

Projects

Status: Todo

4 participants