[Model] use AutoWeightsLoader for solar #18113
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Force-pushed from f1c836e to d7e6366.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from d7e6366 to b9c79ad.
vllm/model_executor/models/olmo2.py (outdated review thread)
Although this won't affect much, I think rotary_emb.inv_freq, rotary_emb.cos_cached, and rotary_emb.sin_cached are not prefixes (see the sketch below).
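To illustrate the concern (an illustrative snippet, not code from the PR; the weight name is taken from the checkpoint example later in this thread), prefix-based skipping compares from the start of the full weight name, but these RoPE buffer names appear mid-string:

```python
# Weight names in an affected checkpoint look like this:
name = "model.layers.1.self_attn.rotary_emb.inv_freq"

# Prefix matching never fires, because the buffer name is not
# at the start of the full weight name...
print(name.startswith("rotary_emb.inv_freq"))  # False

# ...whereas a substring check does match:
print("rotary_emb.inv_freq" in name)  # True
```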
What would you suggest instead?
We can filter out the weights to skip before calling AutoWeightsLoader, just like Phi-4-MM does:
vllm/model_executor/models/phi4mm.py, lines 1241 to 1246 at commit 5418176
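For reference, a minimal sketch of that pattern (assuming the AutoWeightsLoader helper from vllm.model_executor.models.utils; the filtered names mirror the RoPE buffers discussed above rather than the exact Phi-4-MM code):

```python
from typing import Iterable, Set, Tuple

import torch

from vllm.model_executor.models.utils import AutoWeightsLoader


def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
    # Drop RoPE buffer tensors (emitted by e.g. some ColossalAI
    # fine-tuned checkpoints) before delegating to AutoWeightsLoader;
    # a substring check is used because these names are not prefixes.
    skip_substrs = ("rotary_emb.inv_freq", "rotary_emb.cos_cached",
                    "rotary_emb.sin_cached")
    filtered = ((name, data) for name, data in weights
                if not any(s in name for s in skip_substrs))
    loader = AutoWeightsLoader(self)
    return loader.load_weights(filtered)
```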
Does the situation you're describing actually occur? I have tested it, and loading works properly with the prefix-based skipping.
Yeah, this situation seldom occurs, because only some model checkpoints fine-tuned with ColossalAI include these tensors, and the models you tested likely don't contain them:
['model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', ... ]
But it's still reasonable to make weight loading robust enough to cover this rare case.
Isotr0py left a comment:
Anyway, since the RoPE buffer tensors in these checkpoints are not a bug introduced by this PR, let's merge it and handle this case, together with the other models' modified loading logic, in a follow-up PR.
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Force-pushed from b9c79ad to 922302f.
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Issue: #15697