[Fix] Add default rope theta for qwen1 model #30369
iwzbi wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request aims to fix a KeyError for Qwen1 models that are missing the rope_theta parameter in their configuration. The fix involves adding a default value for rope_theta. While the change correctly addresses the issue, the implementation can be improved for efficiency and code quality by moving the function call to a more appropriate location where it is executed only once during model initialization, instead of for every layer.
```diff
@@ -149,6 +150,7 @@ def __init__(
         prefix: str = "",
     ):
         super().__init__()
+        set_default_rope_theta(config, default_theta=10000)
```
Calling set_default_rope_theta in QWenBlock.__init__ is inefficient as it's executed for every model layer. This function should be called only once per model initialization.
Please move this call to the beginning of QWenModel.__init__ (e.g., after line 201) to ensure it runs only once. This will improve model loading efficiency and align with best practices seen in other models in this repository.
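The suggested placement can be sketched as follows. This is a hypothetical, trimmed-down sketch: the `Config` stand-in, the layer count, and the body of `set_default_rope_theta` are assumptions for illustration, not the actual vLLM source; only the helper name and `default_theta=10000` come from the diff above.

```python
# Hypothetical sketch of the reviewer's suggestion: call the helper once
# in QWenModel.__init__ rather than once per QWenBlock.

class Config:
    """Stand-in for a HF model config that may lack rope_theta."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

def set_default_rope_theta(config, default_theta=10000):
    # Only set the default when the attribute is missing, so configs
    # that already define rope_theta are left untouched.
    if getattr(config, "rope_theta", None) is None:
        config.rope_theta = default_theta

class QWenBlock:
    def __init__(self, config):
        # Every block can now rely on rope_theta being present.
        self.rope_theta = config.rope_theta

class QWenModel:
    def __init__(self, config, num_layers=2):
        # Runs once per model, not once per layer.
        set_default_rope_theta(config, default_theta=10000)
        self.blocks = [QWenBlock(config) for _ in range(num_layers)]

model = QWenModel(Config(hidden_size=2048))
print(model.blocks[0].rope_theta)  # 10000
```

Doing the normalisation once at model level keeps the per-layer constructors free of config mutation, which is also easier to reason about when layers are built in parallel or lazily.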
cc @hmellor
Could you please share the checkpoint you are using so I can try to reproduce the error? The latest vllm/vllm/transformers_utils/config.py (lines 309 to 318 in 2dcbac9) …, which means that …
Please try this model: https://huggingface.co/Qwen/Qwen-1_8B-Chat
I've confirmed that this issue is not present when using vLLM with Transformers 4.57.3. It only appears when using the Transformers main branch, which vLLM does not currently support. The reason for the difference is that in Transformers v4 we hand-roll our own standardisation, but for v5 we use the new built-in standardisation (vllm/vllm/transformers_utils/config.py, lines 309 to 328 in aa3c250). The v5 standardisation doesn't check the non-standard names (…). Since these non-standard names only appear in custom models, it doesn't make sense to check them in Transformers itself. I'll make a PR which ensures that these old custom models are forward compatible.
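The fallback-to-legacy-names behaviour described above can be sketched like this. The key names `rotary_emb_base` and `rotary_base` are hypothetical examples of non-standard spellings (the actual names checked by vLLM are not reproduced here), and the function name is invented for illustration; the real logic lives in vllm/transformers_utils/config.py.

```python
# Illustrative sketch only: resolve rope theta from a config dict,
# preferring the standard key, then legacy spellings (hypothetical
# names), then an architecture default.

LEGACY_ROPE_THETA_KEYS = ("rotary_emb_base", "rotary_base")  # assumed names

def resolve_rope_theta(config_dict, default_theta=10000.0):
    if "rope_theta" in config_dict:
        return config_dict["rope_theta"]
    for key in LEGACY_ROPE_THETA_KEYS:
        if key in config_dict:
            # An older custom model spelled the parameter differently;
            # map it onto the standard name.
            return config_dict[key]
    # Neither spelling present: fall back to the default.
    return default_theta

print(resolve_rope_theta({"rotary_emb_base": 1000000.0}))  # 1000000.0
print(resolve_rope_theta({}))  # 10000.0
```

A standardisation that only looks for the standard key (as described for v5) would skip the loop above, which is exactly why configs using only a legacy spelling end up with no `rope_theta` at all.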
Need to set a default rope theta for the Qwen1 model.
error:
Qwen1 config:

Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.