
Conversation

@ysjprojects (Collaborator) commented May 17, 2025

from torch import nn
from transformers.activations import ACT2FN


class Qwen3MoeMLP(nn.Module):
    def __init__(self, config, intermediate_size=None):
        super().__init__()
        self.config = config
        self.hidden_size = config.hidden_size
        # Fall back to the dense intermediate size unless an override is passed in
        # (e.g. config.moe_intermediate_size for the experts in the sparse MoE block).
        self.intermediate_size = intermediate_size if intermediate_size is not None else config.intermediate_size
        self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=False)
        self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=False)
        self.act_fn = ACT2FN[config.hidden_act]

    def forward(self, x):
        # SwiGLU-style MLP: gated activation followed by the down projection.
        down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
        return down_proj

In Qwen3 MoE, the MLP module is instantiated with one of two intermediate sizes: the sparse MoE block builds each expert with config.moe_intermediate_size, while the decoder layer's dense MLP uses config.intermediate_size (see the sketch below).

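For context, a minimal sketch of how both sizes come into play upstream, assuming a loaded Qwen3 MoE config; the num_experts attribute name and the exact block structure are illustrative, not a verbatim copy of the transformers code:

# Illustrative only: the sparse MoE block builds its experts with the smaller
# per-expert size, while the dense decoder MLP keeps the default size.
experts = nn.ModuleList(
    [Qwen3MoeMLP(config, intermediate_size=config.moe_intermediate_size) for _ in range(config.num_experts)]
)
dense_mlp = Qwen3MoeMLP(config)  # falls back to config.intermediate_size
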
The same pattern appears in DeepseekV3 and will likely appear in many MoE models to come, so this PR extends the same flexibility to LitGPT's own MLP modules; a sketch of the intended change follows.
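
A minimal sketch of the change for LitGPT's LLaMAMLP, assuming the attribute names (n_embd, intermediate_size, bias, fc_1/fc_2/proj) used in LitGPT's model code; the actual diff in this PR may differ in details:

import torch
import torch.nn as nn

class LLaMAMLP(nn.Module):
    # Sketch of the added flexibility: an optional intermediate_size argument
    # overrides config.intermediate_size, so MoE experts can reuse this module
    # with config.moe_intermediate_size while dense layers keep the default.
    def __init__(self, config, intermediate_size=None):
        super().__init__()
        self.intermediate_size = intermediate_size if intermediate_size is not None else config.intermediate_size
        self.fc_1 = nn.Linear(config.n_embd, self.intermediate_size, bias=config.bias)
        self.fc_2 = nn.Linear(config.n_embd, self.intermediate_size, bias=config.bias)
        self.proj = nn.Linear(self.intermediate_size, config.n_embd, bias=config.bias)

    def forward(self, x):
        # Same SwiGLU pattern as above: gated activation, then down projection.
        return self.proj(torch.nn.functional.silu(self.fc_1(x)) * self.fc_2(x))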

@Borda Borda enabled auto-merge (squash) May 22, 2025 12:15
@Borda Borda merged commit f99ca4e into Lightning-AI:main May 28, 2025
21 of 23 checks passed