[Model] Add Qwen3 and Qwen3MoE #15289
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; instead, they would only run […]. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
DarkLight1337
left a comment
Some initial comments, PTAL!
vllm/model_executor/models/qwen3.py
Outdated
Since this is identical to Qwen2MLP, can we simply `from .qwen2 import Qwen2MLP as Qwen3MLP`?
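The suggested re-export pattern can be sketched as follows. The stand-in class below is hypothetical (the real `Qwen2MLP` lives in `vllm/model_executor/models/qwen2.py`); the point is simply that an alias avoids a copy-pasted class body:

```python
# Hypothetical stand-in for vLLM's Qwen2MLP, so this sketch is self-contained.
class Qwen2MLP:
    def __init__(self, hidden_size: int, intermediate_size: int):
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size

# In vLLM this would be `from .qwen2 import Qwen2MLP as Qwen3MLP`;
# an alias re-exports the class under the new name with zero duplication.
Qwen3MLP = Qwen2MLP

mlp = Qwen3MLP(hidden_size=4096, intermediate_size=11008)
# Both names refer to the exact same class object.
```

Because the alias is the same class object, any later fix to the Qwen2 MLP automatically applies to Qwen3 as well.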
vllm/model_executor/models/qwen3.py
Outdated
These arguments aren't used anymore, see #13555. Same for the outer modules.
vllm/model_executor/models/qwen3.py
Outdated
This embedding model is not necessary, as we can automatically instantiate it using `vllm.model_executor.models.adapters.as_embedding_model`. The one in the Qwen2 file is a special case for backward compatibility.
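Roughly, an adapter like this derives an embedding variant from a generative model class at runtime, so no hand-written Qwen3 embedding class is needed. The sketch below is a simplified stand-in, not vLLM's actual implementation — the class names, the `pooled_output` method, and the mean-pooling are all illustrative:

```python
def as_embedding_model(cls):
    """Simplified stand-in for vllm.model_executor.models.adapters.as_embedding_model:
    dynamically derive an embedding-model class from a generative one."""
    class _EmbeddingModel(cls):
        is_embedding_model = True

        def pooled_output(self, hidden_states):
            # Toy pooling: mean over the sequence of hidden-state vectors.
            n = len(hidden_states)
            return [sum(col) / n for col in zip(*hidden_states)]

    _EmbeddingModel.__name__ = cls.__name__ + "ForEmbedding"
    return _EmbeddingModel

class Qwen3ForCausalLM:  # hypothetical stand-in for the generative model
    pass

Qwen3Embedding = as_embedding_model(Qwen3ForCausalLM)
```

The adapter only needs to be applied once at model-registration time, which is why per-model embedding subclasses become redundant.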
Thank you for catching that issue! I've updated the code accordingly and pushed the changes. Let me know if there's anything else you'd like me to address.
Please add this model to the following pages as well:
vllm/model_executor/models/qwen3.py
Outdated
`supported_lora_modules` has been deprecated. Please remove it.
vllm/model_executor/models/qwen3.py
Outdated
Please remove `embedding_modules` and `embedding_padding_modules`.
```diff
 reduce_results: bool = True,
+prefix: str = "",
```
It looks like many layers in this model are missing `prefix`. Please add it.
```diff
-reduce_results=reduce_results)
+reduce_results=reduce_results,
+prefix=f"{prefix}.down_proj")
```
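The reviewer's request amounts to threading `prefix` from each parent module down to its leaf layers, so every weight ends up with a fully qualified name. A minimal stand-in sketch (the class names are illustrative, not vLLM's real layer classes):

```python
# Hypothetical stand-ins showing the prefix-threading convention:
# each module appends its own attribute name when constructing children.

class Linear:
    def __init__(self, prefix: str = ""):
        self.prefix = prefix  # fully qualified weight name, e.g. for loading

class MLP:
    def __init__(self, prefix: str = ""):
        self.gate_up_proj = Linear(prefix=f"{prefix}.gate_up_proj")
        self.down_proj = Linear(prefix=f"{prefix}.down_proj")

class DecoderLayer:
    def __init__(self, prefix: str = ""):
        self.mlp = MLP(prefix=f"{prefix}.mlp")

layer = DecoderLayer(prefix="model.layers.0")
# layer.mlp.down_proj.prefix == "model.layers.0.mlp.down_proj"
```

If any module in the chain drops the argument, everything beneath it loses its qualified name, which is why the reviewer asks for it on every layer.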
QQ: shouldn't it be qwen3?
This doesn't significantly affect the current PR. I suggest adding this arch to https://github.com/vllm-project/vllm/blob/main/benchmarks/kernels/benchmark_moe.py#L528C38-L528C57
Thank you for your detailed feedback! I have carefully reviewed and addressed each of the suggestions you provided. The corresponding changes have been implemented and pushed to the PR. Please let me know if there are any additional concerns or areas you'd like me to refine further.
vllm/model_executor/models/qwen3.py
Outdated
I think we can also add an `_init_model` method to `Qwen2ForCausalLM` and then inherit from it for Qwen3 to reduce duplicated code (just like Llama).
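The suggested hook pattern can be sketched like this. The names and signatures below are illustrative stand-ins, not vLLM's exact ones: the base class funnels inner-model construction through one overridable method, so the subclass only swaps the model type:

```python
# Hypothetical stand-ins for the inner model classes.
class Qwen2Model:
    name = "qwen2"

class Qwen3Model(Qwen2Model):
    name = "qwen3"

class Qwen2ForCausalLM:
    def __init__(self):
        # All construction goes through the hook, so subclasses
        # inherit __init__ unchanged.
        self.model = self._init_model()

    def _init_model(self):
        return Qwen2Model()

class Qwen3ForCausalLM(Qwen2ForCausalLM):
    def _init_model(self):
        # The only code Qwen3 needs: swap the inner model type.
        return Qwen3Model()
```

Everything else (weight loading, forward wiring, etc.) stays in the base class and is written once.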
Thank you for your thorough feedback! I have carefully reviewed and implemented all the suggested changes, which have now been pushed to the PR.
vllm/model_executor/models/qwen2.py
Outdated
```diff
-decoder_layer_type=None):
+decoder_layer_type: type[nn.Module] = Qwen2DecoderLayer):
```
That way we don't need to set the default inside the function body
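The two styles side by side, as a self-contained sketch (function and class names are illustrative). Since a class is immutable as a default value, stating it in the signature is safe and removes the `None` fallback from the body:

```python
class Qwen2DecoderLayer:  # hypothetical stand-in
    pass

# Before: default resolved inside the function body.
def make_layer_v1(decoder_layer_type=None):
    if decoder_layer_type is None:
        decoder_layer_type = Qwen2DecoderLayer
    return decoder_layer_type()

# After: default stated once, in the signature, where readers
# (and type checkers) can see it.
def make_layer_v2(decoder_layer_type: type = Qwen2DecoderLayer):
    return decoder_layer_type()
```

Both behave identically; the second also gives callers an accurate signature without reading the body.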
Did you miss this?
jeejeelee
left a comment
Overall LGTM, thank you for your amazing work !!!
Thanks for the contribution! @YamPengLi
Could you address the final round of the comments from the reviewers? Also since this model hasn't been released, there's no way for us to verify the implementation so please let us know if the evals look reasonable to you so that we can merge the PR. Thanks!
Thank you for your feedback! @ywang96 I've gone through the final round of comments and addressed them accordingly. Please take a look when you have time. Thanks!
Signed-off-by: YamPengLi <[email protected]>
…tifiers for Qwen3 and Qwen3MoE models. Signed-off-by: YamPengLi <[email protected]>
…ustom decoder layer types and simplifying the architecture. Removed unused imports and parameters for clarity. Signed-off-by: YamPengLi <[email protected]>
…efix support for various components, enhancing clarity and maintainability. Removed unused imports and parameters to simplify the codebase. Signed-off-by: YamPengLi <[email protected]>
…ability set to false, enhancing the model catalog. Signed-off-by: YamPengLi <[email protected]>
Signed-off-by: YamPengLi <[email protected]>
Signed-off-by: YamPengLi <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Is your team planning to release the model repo on HF before or after merging this PR?

We are planning to release the model repository on HF after merging this PR. @DarkLight1337

Nice, can you address the remaining comment from me?
…yer for improved clarity and usability. Signed-off-by: YamPengLi <[email protected]>
DarkLight1337
left a comment
Thanks for upstreaming this!
Can you merge from main to fix docker build?
@YamPengLi In the Hugging Face Transformers implementation, there are some lines to check […]
Signed-off-by: YamPengLi <[email protected]>
Head branch was pushed to by a user without write access
As per offline discussion, this will not be a problem for the official release. Merging since the failing tests are unrelated to this PR.
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Very excited, thanks for your hard work!
Do you have any idea about a potential release date?
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Yang Wang <[email protected]>
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
Signed-off-by: YamPengLi <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Description
Recently, I submitted a pull request to Hugging Face Transformers containing the implementation of the Qwen3 and Qwen3MoE models. I would also like to contribute these new models to vLLM.
In this PR, I have provided the implementation of the Qwen3 and Qwen3MoE model structures in two files: `qwen3.py` and `qwen3_moe.py`.

Related Tasks