moe support clamp #395
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an optional GEMM1 (gate/up) clamping feature to fused MoE, intended to limit activation magnitude before the activation/multiply step.
Changes:
- Extend `ref_fused_moe`, `FusedMoe` init, and the `xpu_fused_moe` public API to accept `gemm1_clamp_limit`.
- Apply in-place clamping to gate/up outputs before activation in both reference and kernel paths.
- Thread the new parameter through `_apply_ref` and kernel execution paths.
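For readers skimming the change list, the clamp semantics can be sketched without any framework. This is a dependency-free illustration only: `clamp_gate_up` is a hypothetical helper that does not appear in the PR, it works on plain Python lists rather than tensors, and it returns new rows instead of clamping in place as the PR does.

```python
from typing import List, Optional


def clamp_gate_up(gemm1_output: List[List[float]],
                  inter_size: int,
                  limit: Optional[float]) -> List[List[float]]:
    """Clamp the gate and up halves of each GEMM1 output row.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Same guard as the diff: None or a non-positive limit disables clamping.
    if limit is None or limit <= 0:
        return [list(row) for row in gemm1_output]

    clamped = []
    for row in gemm1_output:
        # Gate half: upper bound only, large negative values pass through.
        gate = [min(x, limit) for x in row[:inter_size]]
        # Up half: symmetric bound [-limit, limit].
        up = [max(-limit, min(x, limit)) for x in row[inter_size:]]
        clamped.append(gate + up)
    return clamped


# One row with inter_size = 2: gate = [-9, 9], up = [-9, 9].
rows = [[-9.0, 9.0, -9.0, 9.0]]
print(clamp_gate_up(rows, inter_size=2, limit=7.0))
# prints [[-9.0, 7.0, -7.0, 7.0]]
```

Note the asymmetry: the negative gate value is left alone, while both ends of the up half are bounded.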
Comments suppressed due to low confidence (1)
vllm_xpu_kernels/fused_moe_interface.py:542
- The public API adds `gemm1_clamp_limit` but the function docstring doesn’t describe it (expected type/units, what exactly is clamped, and that gate is clamped only on max while up is clamped symmetrically). Please document `gemm1_clamp_limit` behavior here (and anywhere else this is a public/configurable parameter) so callers understand the numerical impact and intended use.
def xpu_fused_moe(hidden_states,
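One possible wording for the requested docstring entry, shown on a stub so it can be read in context. This is a suggestion only: `xpu_fused_moe_doc_stub` is a hypothetical name, its short parameter list is an assumption and not the real signature, and the described behavior is taken from the clamping hunk in this PR.

```python
def xpu_fused_moe_doc_stub(hidden_states, w13, gemm1_clamp_limit=None):
    """Hypothetical stub that only carries the suggested docstring text.

    hidden_states: [num_rows, hidden_size]
    w13: [num_experts, 2*inter_size, hidden_size]
    gemm1_clamp_limit: optional positive float, expressed in the same
        units as the GEMM1 (gate/up) output. When set, the gate half is
        clamped from above to ``gemm1_clamp_limit`` and the up half is
        clamped to ``[-gemm1_clamp_limit, gemm1_clamp_limit]`` before
        the activation is applied. ``None`` or a value <= 0 disables
        clamping.
    """
    # Documentation sketch only; there is deliberately no implementation.
    raise NotImplementedError("documentation sketch only")
```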
    is_block_fp8=False,
    gemm1_clamp_limit=None):
    '''
    hidden_states: [num_rows, hidden_size]
    w13: [num_experts, 2*inter_size, hidden_size]
    # Apply swiglu_limit clamping before activation
    if self.gemm1_clamp_limit is not None and self.gemm1_clamp_limit > 0:
        gate = gemm1_output[:, :self.inter_size]
        up = gemm1_output[:, self.inter_size:]
        gate.clamp_(max=self.gemm1_clamp_limit)
        up.clamp_(min=-self.gemm1_clamp_limit, max=self.gemm1_clamp_limit)
xinyu-intel left a comment
Please add a test case.
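A test along these lines might cover the three behaviors visible in the diff: the disabled path, the asymmetric gate/up bounds, and idempotence. This is a pytest-style sketch under assumptions: `clamp_row` is a hypothetical pure-Python stand-in for the code under test, and a real test in this repository would presumably compare the kernel path against `ref_fused_moe` on device tensors instead.

```python
def clamp_row(row, inter_size, limit):
    """Stand-in for the code under test (hypothetical, not from the PR)."""
    if limit is None or limit <= 0:
        return list(row)
    gate = [min(x, limit) for x in row[:inter_size]]
    up = [max(-limit, min(x, limit)) for x in row[inter_size:]]
    return gate + up


# Layout is gate | up with inter_size = 3.
ROW = [-20.0, 0.5, 20.0, -20.0, 0.5, 20.0]


def test_clamp_disabled():
    # None, zero, and negative limits must all leave the data untouched.
    for limit in (None, 0.0, -1.0):
        assert clamp_row(ROW, 3, limit) == ROW


def test_gate_max_only_up_symmetric():
    out = clamp_row(ROW, 3, 10.0)
    assert out[:3] == [-20.0, 0.5, 10.0]   # gate: only the upper bound applies
    assert out[3:] == [-10.0, 0.5, 10.0]   # up: both bounds apply


def test_clamp_idempotent():
    # Clamping an already clamped row must change nothing.
    once = clamp_row(ROW, 3, 10.0)
    assert clamp_row(once, 3, 10.0) == once
```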
wuxun-zhang left a comment
LGTM. It aligns with DeepSeek v4 model config and what NV/AMD did: https://github.com/vllm-project/vllm/blob/main/vllm/models/deepseek_v4/nvidia/model.py#L466.
Maybe later we can fuse this clamp with the activation kernel.
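As a rough picture of what such a fusion would compute, the clamp and the activation can be applied in a single pass over each gate/up pair. This is a sketch under assumptions, not the planned kernel: it assumes the activation is SiLU-and-multiply (suggested by the `swiglu_limit` comment in the diff), and `fused_clamp_silu_mul` is a hypothetical pure-Python function operating on one row.

```python
import math
from typing import List, Optional


def silu(x: float) -> float:
    """Numerically stable SiLU: x * sigmoid(x)."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return x * e / (1.0 + e)


def fused_clamp_silu_mul(row: List[float],
                         inter_size: int,
                         limit: Optional[float]) -> List[float]:
    """Clamp gate/up and apply silu(gate) * up in one pass (hypothetical)."""
    do_clamp = limit is not None and limit > 0
    out = []
    for i in range(inter_size):
        gate, up = row[i], row[inter_size + i]
        if do_clamp:
            gate = min(gate, limit)            # gate: upper bound only
            up = max(-limit, min(up, limit))   # up: symmetric bound
        # The clamped values are consumed immediately, so no intermediate
        # clamped buffer is written back between the two steps.
        out.append(silu(gate) * up)
    return out


# One pair: gate = 9.0 is clamped to 7.0, up = -9.0 is clamped to -7.0.
result = fused_clamp_silu_mul([9.0, -9.0], inter_size=1, limit=7.0)
assert math.isclose(result[0], silu(7.0) * -7.0)
```

The potential benefit is that the GEMM1 output is read once instead of being clamped in place and then read again by the activation.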
01a6c8a to 09160ea
jikunshang left a comment
LGTM.
cc @mayuyuace PTAL, thanks.
e3b3082 to 4125b47
Signed-off-by: Ma Jian <jian1.ma@intel.com>
DSV4 needs a clamp before activation.