Skip to content

moe support clamp#395

Merged
jikunshang merged 1 commit into
vllm-project:mainfrom
majian4work:fix-moe
Jun 8, 2026
Merged

moe support clamp#395
jikunshang merged 1 commit into
vllm-project:mainfrom
majian4work:fix-moe

Conversation

@majian4work

Copy link
Copy Markdown
Contributor

DSV4 needs a clamp before activation.

Copilot AI review requested due to automatic review settings June 4, 2026 07:49

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an optional GEMM1 (gate/up) clamping feature to fused MoE, intended to limit activation magnitude before the activation/multiply step.

Changes:

  • Extend ref_fused_moe, FusedMoe init, and xpu_fused_moe public API to accept gemm1_clamp_limit.
  • Apply in-place clamping to gate/up outputs before activation in both reference and kernel paths.
  • Thread the new parameter through _apply_ref and kernel execution paths.
Comments suppressed due to low confidence (1)

vllm_xpu_kernels/fused_moe_interface.py:542

  • The public API adds gemm1_clamp_limit but the function docstring doesn’t describe it (expected type/units, what exactly is clamped, and that gate is clamped only on max while up is clamped symmetrically). Please document gemm1_clamp_limit behavior here (and anywhere else this is a public/configurable parameter) so callers understand the numerical impact and intended use.
def xpu_fused_moe(hidden_states,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +562 to 566
is_block_fp8=False,
gemm1_clamp_limit=None):
'''
hidden_states: [num_rows, hidden_size]
w13: [num_experts, 2*inter_size, hidden_size]
Comment on lines +506 to +507
# Apply swiglu_limit clamping before activation
if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:
Comment on lines +506 to +511
# Apply swiglu_limit clamping before activation
if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:
gate = gemm1_output[:, :self.inter_size]
up = gemm1_output[:, self.inter_size:]
gate.clamp_(max=self.gemm1_clamp_limit)
up.clamp_(min=-self.gemm1_clamp_limit, max=self.gemm1_clamp_limit)

@xinyu-intel xinyu-intel left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a test case.

Comment thread vllm_xpu_kernels/fused_moe_interface.py Outdated
Comment thread vllm_xpu_kernels/fused_moe_interface.py
@xinyu-intel xinyu-intel added the API-compatible PR with this label means it modify API behavior. we will definetly need code change in vLLM side. label Jun 5, 2026

@wuxun-zhang wuxun-zhang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It aligns with DeepSeek v4 model config and what NV/AMD did: https://github.com/vllm-project/vllm/blob/main/vllm/models/deepseek_v4/nvidia/model.py#L466.

Maybe later we can fuse this clamp with activation kernel.

@majian4work majian4work force-pushed the fix-moe branch 2 times, most recently from 01a6c8a to 09160ea Compare June 5, 2026 08:43

@jikunshang jikunshang left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
cc @mayuyuace PTAL, thanks.

Comment thread tests/fused_moe/test_fused_moe.py Outdated
@majian4work majian4work force-pushed the fix-moe branch 2 times, most recently from e3b3082 to 4125b47 Compare June 8, 2026 02:35
Signed-off-by: Ma Jian <jian1.ma@intel.com>
@jikunshang jikunshang merged commit 874e3d6 into vllm-project:main Jun 8, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API-compatible PR with this label means it modify API behavior. we will definetly need code change in vLLM side.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants