
[Qwen][Bugfix] Fixes sigmoid activation in torch impl of RMSNormGated. #40245

Merged
sighingnow merged 4 commits into vllm-project:main from sighingnow:fixes-sigmoid-gated-rmsnorm
Apr 20, 2026

Conversation

@sighingnow
Collaborator

The sigmoid activation in RMSNormGated was added to forward_cuda, but not to forward_native.

Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
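For context, forward_native is the pure-PyTorch fallback path, so before this fix it silently ignored the sigmoid gate that forward_cuda already applied. A minimal NumPy sketch of gated RMS normalization is below; the helper names, the gate-before-normalize ordering, and the defaults are assumptions for illustration, not vLLM's exact implementation:

```python
import numpy as np

def _silu(x):
    # silu(x) = x * sigmoid(x); "swish" is the same function
    return x / (1.0 + np.exp(-x))

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rms_norm_gated(x, z, weight, eps=1e-6, activation="silu"):
    """Sketch of gated RMSNorm: gate x by act(z), then RMS-normalize."""
    assert activation in ("silu", "swish", "sigmoid")
    act = _sigmoid if activation == "sigmoid" else _silu
    if z is not None:
        x = x * act(z)  # this gating step is what forward_native was missing
    variance = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(variance + eps) * weight
```

With z = 0 and activation="sigmoid", the gate multiplies x by 0.5 everywhere, and the subsequent RMS normalization rescales it back to unit magnitude.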

@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (Bot) added the qwen (Related to Qwen models) and bug (Something isn't working) labels on Apr 18, 2026
Contributor

@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for configurable activation functions in the RMSNormGated layer, specifically adding 'sigmoid' alongside the existing 'silu'/'swish' options. It also updates the GDNLinearAttention module to handle these gate types from the model configuration. Feedback points out that the assertion in layernorm.py is too restrictive as it excludes 'swish' and suggests using torch.sigmoid instead of the deprecated F.sigmoid.

Comment thread: vllm/model_executor/layers/layernorm.py (Outdated)
Comment on lines +481 to +482
assert self.activation in ["silu", "sigmoid"]
act_fn = F.sigmoid if self.activation == "sigmoid" else F.silu
Contributor


Severity: high

The assertion is too restrictive as it excludes "swish", which is the default activation for this class (defined at line 429). This will cause a runtime error for any model using the default configuration when running in forward_native. Additionally, torch.sigmoid is preferred over F.sigmoid as the latter is deprecated in modern PyTorch versions.

Suggested change
- assert self.activation in ["silu", "sigmoid"]
- act_fn = F.sigmoid if self.activation == "sigmoid" else F.silu
+ assert self.activation in ["silu", "sigmoid", "swish"]
+ act_fn = torch.sigmoid if self.activation == "sigmoid" else F.silu

@sighingnow sighingnow enabled auto-merge (squash) April 18, 2026 14:48
@github-actions (Bot) added the ready (ONLY add when PR is ready to merge/full CI is needed) label on Apr 18, 2026
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
@sighingnow
Collaborator Author

sighingnow commented Apr 20, 2026

@youkaichao @Tib-Gridello @ZJY0516 could you folks take a look at this PR? The test failure seems unrelated to this change.

weight = self.weight.float()
z = z.float() if z is not None else None

assert self.activation in ["silu", "sigmoid", "swish"]
Member


I suggest doing this validation in __init__ to avoid overhead during forward.
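The suggestion is to validate the activation name and bind the callable once at construction, so the forward path does no string comparisons. A minimal sketch of that pattern, with a hypothetical class name and plain-Python math rather than vLLM's actual RMSNormGated:

```python
import math

class GatedNormSketch:
    """Resolves the gate activation once in __init__, not on every call."""

    _SUPPORTED = ("silu", "swish", "sigmoid")

    def __init__(self, activation: str = "silu"):
        if activation not in self._SUPPORTED:
            raise ValueError(f"unsupported activation: {activation!r}")
        # Bind the callable up front so the hot path does no branching on strings.
        if activation == "sigmoid":
            self._act = lambda v: 1.0 / (1.0 + math.exp(-v))
        else:  # "silu" / "swish" are the same function
            self._act = lambda v: v / (1.0 + math.exp(-v))

    def gate(self, value: float, z: float) -> float:
        # Multiply the input by the activated gate signal.
        return value * self._act(z)
```

Moving the check also surfaces a bad configuration at model-construction time instead of on the first forward pass.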

@sighingnow sighingnow merged commit 8936118 into vllm-project:main Apr 20, 2026
60 checks passed
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026

Labels

bug (Something isn't working), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed)


2 participants