add swiglu limits for shared experts activation #29
zyongye wants to merge 2 commits into ivanium:feat/dsv4-support from
Conversation
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Code Review
This pull request introduces clamping functionality to the silu_and_mul activation kernel and the SiluAndMul layer, primarily to support DeepSeek-V4. The changes include updates to the CUDA kernels to handle clamping logic, modifications to the C++ bindings, and the addition of a swiglu_limit parameter in the Python layer. A potential inconsistency was identified between the native PyTorch implementation and the CUDA implementation when the limit is set to zero, which could lead to divergent behavior.
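For context, a minimal sketch of how the clamped native path might fit together. The `SiluAndMul` name and the `swiglu_limit` clamping come from this PR's diff; the surrounding module structure (constructor, gate/up split) is an assumption for illustration, not the actual vLLM implementation:

```python
from typing import Optional

import torch
import torch.nn.functional as F


class SiluAndMul(torch.nn.Module):
    """Sketch: SiLU-and-multiply activation with optional clamping."""

    def __init__(self, swiglu_limit: Optional[float] = None):
        super().__init__()
        self.swiglu_limit = swiglu_limit

    def forward_native(self, x: torch.Tensor) -> torch.Tensor:
        # The fused projection holds the gate and up halves side by side.
        gate, up = x.chunk(2, dim=-1)
        if self.swiglu_limit is not None:
            # Clamp gate from above only; clamp up symmetrically.
            gate = torch.clamp(gate, max=self.swiglu_limit)
            up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
        return F.silu(gate) * up
```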
```python
if self.swiglu_limit is not None:
    gate = torch.clamp(gate, max=self.swiglu_limit)
    up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
```
There is a potential inconsistency between the native and CUDA implementations when swiglu_limit is set to 0.0. In forward_native, any value of swiglu_limit that is not None (including 0.0) will trigger clamping, effectively zeroing out the output. However, in forward_cuda (and the underlying CUDA kernel), clamping is only enabled if limit > 0.0. While swiglu_limit is typically a positive value, it's safer to align the logic to avoid discrepancies.
Suggested change:

```diff
-if self.swiglu_limit is not None:
-    gate = torch.clamp(gate, max=self.swiglu_limit)
-    up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
+if self.swiglu_limit is not None and self.swiglu_limit > 0:
+    gate = torch.clamp(gate, max=self.swiglu_limit)
+    up = torch.clamp(up, min=-self.swiglu_limit, max=self.swiglu_limit)
```
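To make the divergence concrete, a minimal repro sketch under the assumptions above: `forward_cuda_like` is a hypothetical stand-in that mirrors the kernel's `limit > 0.0` gating, not the real CUDA path. With `swiglu_limit=0.0`, the native branch clamps `up` into `[0, 0]` and zeroes the output, while the gated branch skips clamping entirely:

```python
import torch
import torch.nn.functional as F


def forward_native(x, swiglu_limit):
    gate, up = x.chunk(2, dim=-1)
    if swiglu_limit is not None:  # triggers even for 0.0
        gate = torch.clamp(gate, max=swiglu_limit)
        up = torch.clamp(up, min=-swiglu_limit, max=swiglu_limit)
    return F.silu(gate) * up


def forward_cuda_like(x, swiglu_limit):
    # Mirrors the kernel's gating: clamping only when limit > 0.0.
    gate, up = x.chunk(2, dim=-1)
    if swiglu_limit is not None and swiglu_limit > 0:
        gate = torch.clamp(gate, max=swiglu_limit)
        up = torch.clamp(up, min=-swiglu_limit, max=swiglu_limit)
    return F.silu(gate) * up


x = torch.randn(4, 8)
print(forward_native(x, 0.0).abs().max())     # tensor(0.) -- up clamped to zero
print(forward_cuda_like(x, 0.0).abs().max())  # nonzero -- clamping skipped
```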
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
merged in vllm-project#40950
No description provided.