
Conversation

@YamPengLi (Contributor) commented Mar 21, 2025

Description

Recently, I submitted a pull request to Hugging Face Transformers containing the implementation of the Qwen3 and Qwen3MoE models. I would also like to contribute these new models to vLLM.

In this PR, I provide the implementations of the Qwen3 and Qwen3MoE model architectures in two files: qwen3.py and qwen3_moe.py.

Related Tasks

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 (Member) left a comment


Some initial comments, PTAL!

Member

Since this is identical to Qwen2MLP, can we simply from .qwen2 import Qwen2MLP as Qwen3MLP?
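A minimal sketch of the suggested re-export, assuming this PR's file layout (qwen3.py alongside qwen2.py in the models package):

```python
# vllm/model_executor/models/qwen3.py (sketch)
# Reuse Qwen2's MLP unchanged under the Qwen3 name, since the two
# blocks are structurally identical.
from .qwen2 import Qwen2MLP as Qwen3MLP
```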

Comment on lines 176 to 177
Member

These arguments aren't used anymore, see #13555. Same for the outer modules.

Member

This embedding model is not necessary, as we can automatically instantiate it using vllm.model_executor.models.adapters.as_embedding_model. The one in the Qwen2 file is a special case for backward compatibility.
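For context, a hedged sketch of that adapter in use; as_embedding_model is the function named above, while the qwen3 import path is an assumption about the final layout:

```python
from vllm.model_executor.models.adapters import as_embedding_model
from vllm.model_executor.models.qwen3 import Qwen3ForCausalLM  # assumed module path

# Wraps the generative class into a pooling/embedding variant, so no
# hand-written Qwen3 embedding model class is needed.
Qwen3EmbeddingModel = as_embedding_model(Qwen3ForCausalLM)
```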

Contributor Author

Thank you for catching that issue! I've updated the code accordingly and pushed the changes. Let me know if there's anything else you'd like me to address.

@DarkLight1337 changed the title from "Add Qwen3 and Qwen3MoE" to "[Model] Add Qwen3 and Qwen3MoE" Mar 21, 2025
@DarkLight1337 (Member)

Please add this model to the following pages as well:

  • Supported Models page
  • tests/models/registry.py (set is_available_online=False to pass CI until the model repo is released on HF; a hypothetical entry is sketched below)
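A hypothetical registry entry illustrating the second item; the _HfExamplesInfo pattern follows tests/models/registry.py, but the repo ids are placeholders since the checkpoints were unreleased at the time:

```python
# tests/models/registry.py (sketch) -- is_available_online=False keeps CI
# from trying to download weights that do not exist on the Hub yet.
_EXAMPLE_MODELS = {
    "Qwen3ForCausalLM": _HfExamplesInfo("Qwen/Qwen3-8B",
                                        is_available_online=False),
    "Qwen3MoeForCausalLM": _HfExamplesInfo("Qwen/Qwen3-MoE-15B-A2B",
                                           is_available_online=False),
}
```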

Collaborator

supported_lora_modules has been deprecated. Please remove it.

Collaborator

Please remove embedding_modules and embedding_padding_modules.

Collaborator

Suggested change:

-    reduce_results: bool = True,
+    reduce_results: bool = True,
+    prefix: str = "",

Collaborator

It looks like many layers in this model are missing the prefix argument. Please add it; a sketch of the pattern follows the suggestion below.

Suggested change:

-    reduce_results=reduce_results)
+    reduce_results=reduce_results,
+    prefix=f"{prefix}.down_proj")

Collaborator

QQ: shouldn't it be qwen3?

Collaborator

This doesn't significantly affect the current PR. I suggest adding this arch to https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/benchmark_moe.py#L528C38-L528C57
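For reference, a hedged sketch of the kind of per-architecture branch benchmark_moe.py uses to read MoE shape parameters; the field names follow the Qwen2MoE config, and the exact condition in that file may differ:

```python
def get_moe_shape(config):
    # Sketch: read the expert count, per-token top-k, and expert
    # intermediate size the MoE tuning script needs, keyed on the
    # architecture string from the HF config.
    if config.architectures[0] in ("Qwen2MoeForCausalLM",
                                   "Qwen3MoeForCausalLM"):
        return (config.num_experts, config.num_experts_per_tok,
                config.moe_intermediate_size)
    raise ValueError(f"Unsupported architecture: {config.architectures[0]}")
```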

Contributor Author

Thank you for your detailed feedback! I have carefully reviewed and addressed each of the suggestions you provided. The corresponding changes have been implemented and pushed to the PR. Please let me know if there are any additional concerns or areas you'd like me to refine further.

Comment on lines +466 to +271
Member

I think we can also add a _init_model method to Qwen2ForCausalLM and inherit from it for Qwen3 to reduce duplicated code (just like Llama).
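A minimal, self-contained sketch of that hook pattern; the real vLLM classes take a vllm_config and many more arguments, so this stripped-down version only illustrates the inheritance structure:

```python
import torch.nn as nn


class Qwen2Model(nn.Module):
    pass


class Qwen3Model(nn.Module):
    pass


class Qwen2ForCausalLM(nn.Module):

    def __init__(self, prefix: str = ""):
        super().__init__()
        # Subclasses swap the decoder stack by overriding _init_model,
        # so the rest of __init__ is shared verbatim.
        self.model = self._init_model(prefix=f"{prefix}.model")

    def _init_model(self, prefix: str = "") -> nn.Module:
        return Qwen2Model()


class Qwen3ForCausalLM(Qwen2ForCausalLM):

    def _init_model(self, prefix: str = "") -> nn.Module:
        return Qwen3Model()
```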

Contributor Author

Thank you for your thorough feedback! I have carefully reviewed and implemented all the suggested changes, which have now been pushed to the PR.

@jeejeelee jeejeelee requested review from jeejeelee and removed request for jeejeelee March 24, 2025 14:27
@YamPengLi YamPengLi requested a review from ywang96 as a code owner March 25, 2025 14:53
Member

Suggested change:

-    decoder_layer_type=None):
+    decoder_layer_type: type[nn.Module] = Qwen2DecoderLayer):

Member

That way we don't need to set the default inside the function body.
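A self-contained sketch of what the typed default buys; the class names mirror the suggestion, the rest is illustrative:

```python
import torch.nn as nn


class Qwen2DecoderLayer(nn.Module):
    pass


class Qwen2Model(nn.Module):

    # With a typed default, Qwen3Model can pass its own layer class while
    # the base class never needs an "if decoder_layer_type is None" fallback.
    def __init__(self,
                 num_layers: int = 2,
                 decoder_layer_type: type[nn.Module] = Qwen2DecoderLayer):
        super().__init__()
        self.layers = nn.ModuleList(
            decoder_layer_type() for _ in range(num_layers))
```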

Member

Did you miss this?

@jeejeelee (Collaborator) left a comment

Overall LGTM, thank you for your amazing work!!!

@ywang96 (Member) left a comment

Thanks for the contribution! @YamPengLi

Could you address the final round of comments from the reviewers? Also, since this model hasn't been released, there's no way for us to verify the implementation, so please let us know if the evals look reasonable to you so that we can merge the PR. Thanks!

@YamPengLi (Contributor Author) commented Apr 1, 2025

> Thanks for the contribution! @YamPengLi
>
> Could you address the final round of comments from the reviewers? Also, since this model hasn't been released, there's no way for us to verify the implementation, so please let us know if the evals look reasonable to you so that we can merge the PR. Thanks!

Thank you for your feedback! @ywang96 I've gone through the final round of comments and addressed them accordingly. Please take a look when you have time. Thanks! 

Commits pushed to the branch (messages truncated in the timeline):

  • …tifiers for Qwen3 and Qwen3MoE models. (Signed-off-by: YamPengLi <[email protected]>)
  • …ustom decoder layer types and simplifying the architecture. Removed unused imports and parameters for clarity. (Signed-off-by: YamPengLi <[email protected]>)
  • …efix support for various components, enhancing clarity and maintainability. Removed unused imports and parameters to simplify the codebase. (Signed-off-by: YamPengLi <[email protected]>)
  • …ability set to false, enhancing the model catalog. (Signed-off-by: YamPengLi <[email protected]>)
  • Signed-off-by: YamPengLi <[email protected]>
  • Signed-off-by: YamPengLi <[email protected]>
@mergify bot added the documentation (Improvements or additions to documentation) label Apr 3, 2025
@DarkLight1337 (Member)

Is your team planning to release the model repo on HF before or after merging this PR?

@YamPengLi (Contributor Author)

> Is your team planning to release the model repo on HF before or after merging this PR?

We are planning to release the model repository on HF after merging this PR. @DarkLight1337

@DarkLight1337 (Member)

Nice, can you address the remaining comment from me?

Commit pushed: …yer for improved clarity and usability. (Signed-off-by: YamPengLi <[email protected]>)
@DarkLight1337 (Member) left a comment

Thanks for upstreaming this!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 3, 2025 07:09
@github-actions bot added the ready (ONLY add when PR is ready to merge/full CI is needed) label Apr 3, 2025
@DarkLight1337 (Member)

Can you merge from main to fix the Docker build?

@heheda12345 (Collaborator)

@YamPengLi In the Hugging Face Transformers implementation, there are some lines that check self.config.max_window_layers, but I don't find that logic in this PR. Is that intentional?
https://github.com/huggingface/transformers/blob/782d7d945dbcfc8184264ea24aae31c15147e132/src/transformers/models/qwen3/modeling_qwen3.py#L198-L203
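For context, a paraphrase of the kind of check being referenced, adapted from the Qwen2-style attention code; the exact form in modeling_qwen3.py at that commit may differ:

```python
def get_sliding_window(config, layer_idx: int):
    # Sliding-window attention only kicks in for layers at or above
    # config.max_window_layers; earlier layers keep full attention.
    if (config.use_sliding_window
            and getattr(config, "sliding_window", None) is not None
            and layer_idx >= config.max_window_layers):
        return config.sliding_window
    return None
```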

auto-merge was automatically disabled April 7, 2025 09:01

Head branch was pushed to by a user without write access

@DarkLight1337 (Member)

As per offline discussion, this will not be a problem for the official release. Merging since the failing tests are unrelated to this PR.

@vllm-bot vllm-bot merged commit 7699258 into vllm-project:main Apr 7, 2025
40 of 46 checks passed
lengrongfu pushed a commit to lengrongfu/vllm that referenced this pull request Apr 7, 2025
@EwoutH (Contributor) commented Apr 9, 2025

Very excited, thanks for your hard work!

> We are planning to release the model repository on HF after merging this PR.

Do you have any idea about a potential release date?

yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Apr 21, 2025 (Signed-off-by: YamPengLi <[email protected]>, Co-authored-by: Cyrus Leung <[email protected]>, Signed-off-by: Yang Wang <[email protected]>)

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025 (Signed-off-by: YamPengLi <[email protected]>, Co-authored-by: Cyrus Leung <[email protected]>, Signed-off-by: Mu Huai <[email protected]>)

Labels

  • documentation (Improvements or additions to documentation)
  • ready (ONLY add when PR is ready to merge/full CI is needed)


8 participants