moe support clamp by majian4work · Pull Request #395 · vllm-project/vllm-xpu-kernels

majian4work · 2026-06-04T07:49:30Z

DSV4 needs a clamp before activation.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an optional GEMM1 (gate/up) clamping feature to fused MoE, intended to limit activation magnitude before the activation/multiply step.

Changes:

Extend ref_fused_moe, FusedMoe init, and xpu_fused_moe public API to accept gemm1_clamp_limit.
Apply in-place clamping to gate/up outputs before activation in both reference and kernel paths.
Thread the new parameter through _apply_ref and kernel execution paths.

Comments suppressed due to low confidence (1)

vllm_xpu_kernels/fused_moe_interface.py:542

The public API adds gemm1_clamp_limit but the function docstring doesn’t describe it (expected type/units, what exactly is clamped, and that gate is clamped only on max while up is clamped symmetrically). Please document gemm1_clamp_limit behavior here (and anywhere else this is a public/configurable parameter) so callers understand the numerical impact and intended use.

def xpu_fused_moe(hidden_states,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+                  is_block_fp8=False,
+                  gemm1_clamp_limit=None):
    '''
    hidden_states: [num_rows, hidden_size]
    w13: [num_experts, 2*inter_size, hidden_size]


+        # Apply swiglu_limit clamping before activation
+        if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:


+        # Apply swiglu_limit clamping before activation
+        if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:
+            gate = gemm1_output[:, :self.inter_size]
+            up = gemm1_output[:, self.inter_size:]
+            gate.clamp_(max=self.gemm1_clamp_limit)
+            up.clamp_(min=-self.gemm1_clamp_limit, max=self.gemm1_clamp_limit)


xinyu-intel

please add a test case.

wuxun-zhang

LGTM. It aligns with DeepSeek v4 model config and what NV/AMD did: https://github.com/vllm-project/vllm/blob/main/vllm/models/deepseek_v4/nvidia/model.py#L466.

Maybe later we can fuse this clamp with activation kernel.

jikunshang

LGTM.
cc @mayuyuace PTAL, thanks.

Signed-off-by: Ma Jian <jian1.ma@intel.com>

Copilot AI review requested due to automatic review settings June 4, 2026 07:49

Copilot AI reviewed Jun 4, 2026

View reviewed changes

majian4work mentioned this pull request Jun 4, 2026

[DSV4][XPU] Pass gemm1_clamp_limit to XpuFusedMoe vllm-project/vllm#44517

Draft

xinyu-intel reviewed Jun 5, 2026

View reviewed changes

Comment thread vllm_xpu_kernels/fused_moe_interface.py Outdated

Comment thread vllm_xpu_kernels/fused_moe_interface.py

xinyu-intel added the API-compatible PR with this label means it modify API behavior. we will definetly need code change in vLLM side. label Jun 5, 2026

wuxun-zhang approved these changes Jun 5, 2026

View reviewed changes

majian4work force-pushed the fix-moe branch 2 times, most recently from 01a6c8a to 09160ea Compare June 5, 2026 08:43

xinyu-intel approved these changes Jun 5, 2026

View reviewed changes

jikunshang approved these changes Jun 8, 2026

View reviewed changes

Comment thread tests/fused_moe/test_fused_moe.py Outdated

mayuyuace approved these changes Jun 8, 2026

View reviewed changes

majian4work force-pushed the fix-moe branch 2 times, most recently from e3b3082 to 4125b47 Compare June 8, 2026 02:35

moe support clamp

4125b47

Signed-off-by: Ma Jian <jian1.ma@intel.com>

jikunshang merged commit 874e3d6 into vllm-project:main Jun 8, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

moe support clamp#395

moe support clamp#395
jikunshang merged 1 commit into
vllm-project:mainfrom
majian4work:fix-moe

majian4work commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

xinyu-intel left a comment

Uh oh!

Uh oh!

Uh oh!

wuxun-zhang left a comment

Uh oh!

jikunshang left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

		# Apply swiglu_limit clamping before activation
		if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:

Conversation

majian4work commented Jun 4, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

xinyu-intel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wuxun-zhang left a comment

Choose a reason for hiding this comment

Uh oh!

jikunshang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants