[Qwen][Bugfix] Fixes sigmoid activation in torch impl of RMSNormGated (#40245)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Code Review
This pull request introduces support for configurable activation functions in the `RMSNormGated` layer, specifically adding `'sigmoid'` alongside the existing `'silu'`/`'swish'` options. It also updates the `GDNLinearAttention` module to read these gate types from the model configuration. Review feedback points out that the assertion in `layernorm.py` is too restrictive because it excludes `'swish'`, and suggests using `torch.sigmoid` instead of the deprecated `F.sigmoid`.
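For context, a hypothetical usage sketch of how a caller such as `GDNLinearAttention` might wire the gate type through (it assumes the `RMSNormGated` sketch shown later in this thread; the `activation` keyword and the `gate_activation` attribute name are illustrative assumptions, not the exact vLLM API):

```python
from types import SimpleNamespace

# Stand-in for a model config; attribute names are assumptions.
config = SimpleNamespace(rms_norm_eps=1e-6, gate_activation="sigmoid")

# Construct the gated norm with the gate activation taken from the config,
# falling back to the default "silu" gate when the attribute is absent.
norm = RMSNormGated(
    hidden_size=1024,
    eps=config.rms_norm_eps,
    activation=getattr(config, "gate_activation", "silu"),
)
```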
```python
assert self.activation in ["silu", "sigmoid"]
act_fn = F.sigmoid if self.activation == "sigmoid" else F.silu
```
The assertion is too restrictive, as it excludes `"swish"`, which is the default activation for this class (defined at line 429). This will cause a runtime error in `forward_native` for any model using the default configuration. Additionally, `torch.sigmoid` is preferred over `F.sigmoid`, as the latter is deprecated in modern PyTorch versions.
```diff
- assert self.activation in ["silu", "sigmoid"]
- act_fn = F.sigmoid if self.activation == "sigmoid" else F.silu
+ assert self.activation in ["silu", "sigmoid", "swish"]
+ act_fn = torch.sigmoid if self.activation == "sigmoid" else F.silu
```
@youkaichao @Tib-Gridello @ZJY0516 can you folks take a look at this PR? The test failure seems unrelated to this PR.
```python
weight = self.weight.float()
z = z.float() if z is not None else None
# ...
assert self.activation in ["silu", "sigmoid", "swish"]
```
I suggest doing this in `__init__` to avoid overhead during `forward`.
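A minimal sketch of that refactor, assuming the constructor shape implied by the snippets above (not the exact vLLM code):

```python
import torch
import torch.nn.functional as F
from torch import nn


class RMSNormGated(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6,
                 activation: str = "silu") -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps
        self.activation = activation
        # Validate and resolve the gate activation once at construction
        # time, so forward() does not repeat the assert/branch per call.
        assert activation in ("silu", "sigmoid", "swish")
        self.act_fn = torch.sigmoid if activation == "sigmoid" else F.silu
```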
The sigmoid activation in `RMSNormGated` was added to `forward_cuda`, but not `forward_native`.
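For illustration, a hedged sketch of what the fixed native path could look like, reusing the gate variable name `z` from the review snippet and the `self.act_fn` resolved in `__init__` above; the exact normalization order in vLLM may differ:

```python
from typing import Optional


def forward_native(self, x: torch.Tensor,
                   z: Optional[torch.Tensor] = None) -> torch.Tensor:
    input_dtype = x.dtype
    x = x.float()
    weight = self.weight.float()
    z = z.float() if z is not None else None
    if z is not None:
        # Apply the configured gate activation. Before this fix, the
        # native path hard-coded SiLU, so sigmoid gates only took effect
        # in forward_cuda.
        x = x * self.act_fn(z)
    variance = x.pow(2).mean(-1, keepdim=True)
    x = x * torch.rsqrt(variance + self.variance_epsilon)
    return (weight * x).to(input_dtype)
```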