
add swiglu limits for shared experts activation #29

Closed
zyongye wants to merge 2 commits into ivanium:feat/dsv4-support from zyongye:silu_mul_with_clamp

Conversation


zyongye commented Apr 27, 2026

No description provided.

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

gemini-code-assist (bot) left a comment


Code Review

This pull request introduces clamping functionality to the silu_and_mul activation kernel and the SiluAndMul layer, primarily to support DeepSeek-V4. The changes include updates to the CUDA kernels to handle clamping logic, modifications to the C++ bindings, and the addition of a swiglu_limit parameter in the Python layer. A potential inconsistency was identified between the native PyTorch implementation and the CUDA implementation when the limit is set to zero, which could lead to divergent behavior.
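For context, a minimal eager-mode sketch of what the clamped activation computes, assuming the gate/up split and the swiglu_limit semantics shown in the diff below; the function name and signature here are illustrative, not the vLLM API:

import torch
import torch.nn.functional as F
from typing import Optional

def silu_and_mul_clamped(x: torch.Tensor, swiglu_limit: Optional[float] = None) -> torch.Tensor:
    # Eager reference of the clamped SwiGLU activation: `x` holds the gate and
    # up projections concatenated along the last dimension. When a positive
    # limit is given, the gate is clamped from above and the up projection is
    # clamped symmetrically before the elementwise product.
    gate, up = x.chunk(2, dim=-1)
    if swiglu_limit is not None and swiglu_limit > 0:
        gate = torch.clamp(gate, max=swiglu_limit)
        up = torch.clamp(up, min=-swiglu_limit, max=swiglu_limit)
    return F.silu(gate) * up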

Comment on lines +145 to +147
if self.swiglu_limit is not None:
    gate = torch.clamp(gate, max=self.swiglu_limit)
    up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)

Severity: high

There is a potential inconsistency between the native and CUDA implementations when swiglu_limit is set to 0.0. In forward_native, any value of swiglu_limit that is not None (including 0.0) will trigger clamping, effectively zeroing out the output. However, in forward_cuda (and the underlying CUDA kernel), clamping is only enabled if limit > 0.0. While swiglu_limit is typically a positive value, it's safer to align the logic to avoid discrepancies.

Suggested change
- if self.swiglu_limit is not None:
-     gate = torch.clamp(gate, max=self.swiglu_limit)
-     up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
+ if self.swiglu_limit is not None and self.swiglu_limit > 0:
+     gate = torch.clamp(gate, max=self.swiglu_limit)
+     up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
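
To make the edge case concrete, here is a small standalone check of what the unguarded clamp does at swiglu_limit == 0.0, assuming the clamping order from the diff above; the kernel-side limit > 0.0 condition is quoted from the review, not reproduced here:

import torch
import torch.nn.functional as F

gate = torch.randn(4)
up = torch.randn(4)

# Unguarded forward_native at swiglu_limit == 0.0: `up` is clamped into the
# zero-width interval [0, 0], so every element becomes zero and the whole
# activation silu(clamp(gate, max=0.0)) * up is identically zero.
up_clamped = torch.clamp(up, min=-0.0, max=0.0)
print(up_clamped.abs().max().item())            # 0.0

# The CUDA kernel only enables clamping when limit > 0.0, so for the same
# inputs it would compute the ordinary silu(gate) * up, which is non-zero
# in general. The suggested guard keeps both paths consistent.
print((F.silu(gate) * up).abs().max().item())   # almost surely > 0.0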

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
ivanium (Owner) commented Apr 27, 2026

merged in vllm-project#40950

ivanium closed this on Apr 27, 2026
