[Model] use AutoWeightsLoader for solar #18113
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Force-pushed from f1c836e to d7e6366.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from d7e6366 to b9c79ad.
vllm/model_executor/models/olmo2.py (outdated review thread)
Although this won't affect much, I think rotary_emb.inv_freq, rotary_emb.cos_cached, and rotary_emb.sin_cached are not prefixes (see the sketch below).
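To illustrate the concern (an illustrative snippet, not code from the PR; the weight name is taken from the checkpoint example later in this thread), prefix-based skipping compares from the start of the full weight name, but these RoPE buffer names appear mid-string:

```python
# Weight names in an affected checkpoint look like this:
name = "model.layers.1.self_attn.rotary_emb.inv_freq"

# Prefix matching never fires, because the buffer name is not
# at the start of the full weight name...
print(name.startswith("rotary_emb.inv_freq"))  # False

# ...whereas a substring check does match:
print("rotary_emb.inv_freq" in name)  # True
```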
What would you suggest instead?
We can filter out the weights to skip before calling AutoWeightsLoader, just like Phi-4-MM does:
vllm/model_executor/models/phi4mm.py, lines 1241 to 1246 at commit 5418176
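For reference, a minimal sketch of that pattern (assuming the AutoWeightsLoader helper from vllm.model_executor.models.utils; the filtered names mirror the RoPE buffers discussed above rather than the exact Phi-4-MM code):

```python
from typing import Iterable, Set, Tuple

import torch

from vllm.model_executor.models.utils import AutoWeightsLoader


def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]) -> Set[str]:
    # Drop RoPE buffer tensors (emitted by e.g. some ColossalAI
    # fine-tuned checkpoints) before delegating to AutoWeightsLoader;
    # a substring check is used because these names are not prefixes.
    skip_substrs = ("rotary_emb.inv_freq", "rotary_emb.cos_cached",
                    "rotary_emb.sin_cached")
    filtered = ((name, data) for name, data in weights
                if not any(s in name for s in skip_substrs))
    loader = AutoWeightsLoader(self)
    return loader.load_weights(filtered)
```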
Does the situation you're describing actually occur? I have tested it, and loading works properly with the prefix-based skipping.
Yeah, this situation seldom occurs, because only some model checkpoints fine-tuned with ColossalAI include these tensors, and the models you tested likely don't contain them:
['model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', ... ]
But it's still reasonable to make weight loading robust enough to cover this rare case.
Isotr0py left a comment:
Anyway, since the RoPE buffer tensors in these checkpoints are not a bug introduced by this PR, let's merge it and handle this case, together with the other models' modified loading logic, in a follow-up PR.
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Force-pushed from b9c79ad to 922302f.
Signed-off-by: Yuqi Zhang <yuqizhang@google.com>
Issue: #15697