[Model] Add Qwen3 and Qwen3MoE #15289
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; instead, they would only run […]. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
DarkLight1337
left a comment
Some initial comments, PTAL!
vllm/model_executor/models/qwen3.py
Outdated
Since this is identical to Qwen2MLP, can we simply `from .qwen2 import Qwen2MLP as Qwen3MLP`?
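The suggested re-export pattern can be sketched as follows. The stand-in class below is hypothetical (the real `Qwen2MLP` lives in `vllm/model_executor/models/qwen2.py`); the point is simply that an alias avoids a copy-pasted class body:

```python
# Hypothetical stand-in for vLLM's Qwen2MLP, so this sketch is self-contained.
class Qwen2MLP:
    def __init__(self, hidden_size: int, intermediate_size: int):
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size

# In vLLM this would be `from .qwen2 import Qwen2MLP as Qwen3MLP`;
# an alias re-exports the class under the new name with zero duplication.
Qwen3MLP = Qwen2MLP

mlp = Qwen3MLP(hidden_size=4096, intermediate_size=11008)
# Both names refer to the exact same class object.
```

Because the alias is the same class object, any later fix to the Qwen2 MLP automatically applies to Qwen3 as well.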
vllm/model_executor/models/qwen3.py
Outdated
These arguments aren't used anymore, see #13555. Same for the outer modules.
vllm/model_executor/models/qwen3.py
Outdated
This embedding model is not necessary, as we can automatically instantiate it using `vllm.model_executor.models.adapters.as_embedding_model`. The one in the Qwen2 file is a special case for backward compatibility.
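Roughly, an adapter like this derives an embedding variant from a generative model class at runtime, so no hand-written Qwen3 embedding class is needed. The sketch below is a simplified stand-in, not vLLM's actual implementation — the class names, the `pooled_output` method, and the mean-pooling are all illustrative:

```python
def as_embedding_model(cls):
    """Simplified stand-in for vllm.model_executor.models.adapters.as_embedding_model:
    dynamically derive an embedding-model class from a generative one."""
    class _EmbeddingModel(cls):
        is_embedding_model = True

        def pooled_output(self, hidden_states):
            # Toy pooling: mean over the sequence of hidden-state vectors.
            n = len(hidden_states)
            return [sum(col) / n for col in zip(*hidden_states)]

    _EmbeddingModel.__name__ = cls.__name__ + "ForEmbedding"
    return _EmbeddingModel

class Qwen3ForCausalLM:  # hypothetical stand-in for the generative model
    pass

Qwen3Embedding = as_embedding_model(Qwen3ForCausalLM)
```

The adapter only needs to be applied once at model-registration time, which is why per-model embedding subclasses become redundant.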
Thank you for catching that issue! I've updated the code accordingly and pushed the changes. Let me know if there's anything else you'd like me to address.
Please add this model to the following pages as well:
vllm/model_executor/models/qwen3.py
Outdated
`supported_lora_modules` has been deprecated. Please remove it.
vllm/model_executor/models/qwen3.py
Outdated
Please remove `embedding_modules` and `embedding_padding_modules`.
```diff
 reduce_results: bool = True,
+prefix: str = "",
```
It looks like many layers in this model are missing `prefix`. Please add it.
```diff
-reduce_results=reduce_results)
+reduce_results=reduce_results,
+prefix=f"{prefix}.down_proj")
```
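The reviewer's request amounts to threading `prefix` from each parent module down to its leaf layers, so every weight ends up with a fully qualified name. A minimal stand-in sketch (the class names are illustrative, not vLLM's real layer classes):

```python
# Hypothetical stand-ins showing the prefix-threading convention:
# each module appends its own attribute name when constructing children.

class Linear:
    def __init__(self, prefix: str = ""):
        self.prefix = prefix  # fully qualified weight name, e.g. for loading

class MLP:
    def __init__(self, prefix: str = ""):
        self.gate_up_proj = Linear(prefix=f"{prefix}.gate_up_proj")
        self.down_proj = Linear(prefix=f"{prefix}.down_proj")

class DecoderLayer:
    def __init__(self, prefix: str = ""):
        self.mlp = MLP(prefix=f"{prefix}.mlp")

layer = DecoderLayer(prefix="model.layers.0")
# layer.mlp.down_proj.prefix == "model.layers.0.mlp.down_proj"
```

If any module in the chain drops the argument, everything beneath it loses its qualified name, which is why the reviewer asks for it on every layer.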
QQ: shouldn't it be qwen3?
This doesn't significantly affect the current PR. I suggest adding this arch to https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/benchmark_moe.py#L528C38-L528C57
Thank you for your detailed feedback! I have carefully reviewed and addressed each of the suggestions you provided. The corresponding changes have been implemented and pushed to the PR. Please let me know if there are any additional concerns or areas you'd like me to refine further.
vllm/model_executor/models/qwen3.py
Outdated
I think we can also add an `_init_model` method to `Qwen2ForCausalLM` and then inherit from it for Qwen3 to reduce duplicated code (just like Llama).
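The suggested hook pattern can be sketched like this. The names and signatures below are illustrative stand-ins, not vLLM's exact ones: the base class funnels inner-model construction through one overridable method, so the subclass only swaps the model type:

```python
# Hypothetical stand-ins for the inner model classes.
class Qwen2Model:
    name = "qwen2"

class Qwen3Model(Qwen2Model):
    name = "qwen3"

class Qwen2ForCausalLM:
    def __init__(self):
        # All construction goes through the hook, so subclasses
        # inherit __init__ unchanged.
        self.model = self._init_model()

    def _init_model(self):
        return Qwen2Model()

class Qwen3ForCausalLM(Qwen2ForCausalLM):
    def _init_model(self):
        # The only code Qwen3 needs: swap the inner model type.
        return Qwen3Model()
```

Everything else (weight loading, forward wiring, etc.) stays in the base class and is written once.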
Thank you for your thorough feedback! I have carefully reviewed and implemented all the suggested changes, which have now been pushed to the PR.
vllm/model_executor/models/qwen2.py
Outdated
```diff
-decoder_layer_type=None):
+decoder_layer_type: type[nn.Module] = Qwen2DecoderLayer):
```
That way we don't need to set the default inside the function body
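The two styles side by side, as a self-contained sketch (function and class names are illustrative). Since a class is immutable as a default value, stating it in the signature is safe and removes the `None` fallback from the body:

```python
class Qwen2DecoderLayer:  # hypothetical stand-in
    pass

# Before: default resolved inside the function body.
def make_layer_v1(decoder_layer_type=None):
    if decoder_layer_type is None:
        decoder_layer_type = Qwen2DecoderLayer
    return decoder_layer_type()

# After: default stated once, in the signature, where readers
# (and type checkers) can see it.
def make_layer_v2(decoder_layer_type: type = Qwen2DecoderLayer):
    return decoder_layer_type()
```

Both behave identically; the second also gives callers an accurate signature without reading the body.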
Did you miss this?
jeejeelee
left a comment
Overall LGTM, thank you for your amazing work !!!
Thanks for the contribution! @YamPengLi
Could you address the final round of the comments from the reviewers? Also since this model hasn't been released, there's no way for us to verify the implementation so please let us know if the evals look reasonable to you so that we can merge the PR. Thanks!
Thank you for your feedback! @ywang96 I've gone through the final round of comments and addressed them accordingly. Please take a look when you have time. Thanks!
Signed-off-by: YamPengLi <[email protected]>
…tifiers for Qwen3 and Qwen3MoE models. Signed-off-by: YamPengLi <[email protected]>
…ustom decoder layer types and simplifying the architecture. Removed unused imports and parameters for clarity. Signed-off-by: YamPengLi <[email protected]>
…efix support for various components, enhancing clarity and maintainability. Removed unused imports and parameters to simplify the codebase. Signed-off-by: YamPengLi <[email protected]>
…ability set to false, enhancing the model catalog. Signed-off-by: YamPengLi <[email protected]>
Signed-off-by: YamPengLi <[email protected]>
Signed-off-by: YamPengLi <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Is your team planning to release the model repo on HF before or after merging this PR?

We are planning to release the model repository on HF after merging this PR. @DarkLight1337

Nice, can you address the remaining comment from me?
…yer for improved clarity and usability. Signed-off-by: YamPengLi <[email protected]>
DarkLight1337
left a comment
Thanks for upstreaming this!
Can you merge from main to fix docker build?
@YamPengLi In the Hugging Face Transformers implementation, there are some lines to check […]
Signed-off-by: YamPengLi <[email protected]>
Head branch was pushed to by a user without write access
As per offline discussion, this will not be a problem for the official release. Merging since the failing tests are unrelated to this PR.
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Very excited, thanks for your hard work!
Do you have any idea about a potential release date?
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Yang Wang <[email protected]>
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Description
Recently, I submitted a pull request to Hugging Face Transformers containing the implementation of the Qwen3 and Qwen3MoE models. I would also like to contribute these new models to vLLM.
In this PR, I have provided the implementation of the Qwen3 and Qwen3MoE model structures in two files: `qwen3.py` and `qwen3_moe.py`.

Related Tasks