Conversation

@xuechendi (Contributor) commented Jun 27, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Fix an issue when loading the DeepSeek model. The failure was introduced by #18343. Current error log:

		INFO 06-27 21:12:49 [default_loader.py:272] Loading weights took 7.88 seconds
		ERROR 06-27 21:12:49 [core.py:519] EngineCore failed to start.
		ERROR 06-27 21:12:49 [core.py:519] Traceback (most recent call last):
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/v1/engine/core.py", line 510, in run_engine_core
		ERROR 06-27 21:12:49 [core.py:519]     engine_core = EngineCoreProc(*args, **kwargs)
		ERROR 06-27 21:12:49 [core.py:519]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/v1/engine/core.py", line 394, in __init__
		ERROR 06-27 21:12:49 [core.py:519]     super().__init__(vllm_config, executor_class, log_stats,
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/v1/engine/core.py", line 75, in __init__
		ERROR 06-27 21:12:49 [core.py:519]     self.model_executor = executor_class(vllm_config)
		ERROR 06-27 21:12:49 [core.py:519]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/executor/executor_base.py", line 53, in __init__
		ERROR 06-27 21:12:49 [core.py:519]     self._init_executor()
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/executor/uniproc_executor.py", line 48, in _init_executor
		ERROR 06-27 21:12:49 [core.py:519]     self.collective_rpc("load_model")
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
		ERROR 06-27 21:12:49 [core.py:519]     answer = run_method(self.driver_worker, method, args, kwargs)
		ERROR 06-27 21:12:49 [core.py:519]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/utils.py", line 2687, in run_method
		ERROR 06-27 21:12:49 [core.py:519]     return func(*args, **kwargs)
		ERROR 06-27 21:12:49 [core.py:519]            ^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm-gaudi/vllm_hpu/v1/worker/hpu_worker.py", line 112, in load_model
		ERROR 06-27 21:12:49 [core.py:519]     self.model_runner.load_model()
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm-gaudi/vllm_hpu/v1/worker/hpu_model_runner.py", line 1688, in load_model
		ERROR 06-27 21:12:49 [core.py:519]     self.model = get_model(vllm_config=self.vllm_config)
		ERROR 06-27 21:12:49 [core.py:519]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 59, in get_model
		ERROR 06-27 21:12:49 [core.py:519]     return loader.load_model(vllm_config=vllm_config,
		ERROR 06-27 21:12:49 [core.py:519]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/model_executor/model_loader/base_loader.py", line 41, in load_model
		ERROR 06-27 21:12:49 [core.py:519]     self.load_weights(model, model_config)
		ERROR 06-27 21:12:49 [core.py:519]   File "/workspace/vllm/vllm/model_executor/model_loader/default_loader.py", line 281, in load_weights
		ERROR 06-27 21:12:49 [core.py:519]     raise ValueError("Following weights were not initialized from "
ERROR 06-27 21:12:49 [core.py:519] ValueError: Following weights were not initialized from checkpoint: {'model.layers.1.mlp.experts.w13_weight', 'model.layers.6.mlp.experts.w2_weight', 'model.layers.18.mlp.experts.w13_weight', 'model.layers.24.mlp.experts.w13_weight', 'model.layers.5.mlp.experts.w13_weight', 'model.layers.12.mlp.experts.w13_weight', 'model.layers.21.mlp.experts.w13_weight', 'model.layers.7.mlp.experts.w2_weight', 'model.layers.18.mlp.experts.w2_weight', 'model.layers.3.mlp.experts.w13_weight', 'model.layers.22.mlp.experts.w13_weight', 'model.layers.25.mlp.experts.w2_weight', 'model.layers.16.mlp.experts.w2_weight', 'model.layers.9.mlp.experts.w2_weight', 'model.layers.11.mlp.experts.w2_weight', 'model.layers.4.mlp.experts.w2_weight', 'model.layers.17.mlp.experts.w2_weight', 'model.layers.13.mlp.experts.w2_weight', 'model.layers.15.mlp.experts.w13_weight', 'model.layers.9.mlp.experts.w13_weight', 'model.layers.12.mlp.experts.w2_weight', 'model.layers.19.mlp.experts.w13_weight', 'model.layers.8.mlp.experts.w13_weight', 'model.layers.23.mlp.experts.w2_weight', 'model.layers.6.mlp.experts.w13_weight', 'model.layers.26.mlp.experts.w13_weight', 'model.layers.10.mlp.experts.w2_weight', 'model.layers.5.mlp.experts.w2_weight', 'model.layers.14.mlp.experts.w13_weight', 'model.layers.20.mlp.experts.w13_weight', 'model.layers.2.mlp.experts.w2_weight', 'model.layers.10.mlp.experts.w13_weight', 'model.layers.21.mlp.experts.w2_weight', 'model.layers.22.mlp.experts.w2_weight', 'model.layers.26.mlp.experts.w2_weight', 'model.layers.23.mlp.experts.w13_weight', 'model.layers.25.mlp.experts.w13_weight', 'model.layers.1.mlp.experts.w2_weight', 'model.layers.11.mlp.experts.w13_weight', 'model.layers.16.mlp.experts.w13_weight', 'model.layers.3.mlp.experts.w2_weight', 'model.layers.24.mlp.experts.w2_weight', 'model.layers.4.mlp.experts.w13_weight', 'model.layers.15.mlp.experts.w2_weight', 'model.layers.17.mlp.experts.w13_weight', 'model.layers.13.mlp.experts.w13_weight', 'model.layers.8.mlp.experts.w2_weight', 'model.layers.2.mlp.experts.w13_weight', 'model.layers.20.mlp.experts.w2_weight', 'model.layers.7.mlp.experts.w13_weight', 'model.layers.19.mlp.experts.w2_weight', 'model.layers.14.mlp.experts.w2_weight'}
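For context, this error comes from the loader's final consistency check: after the model's load_weights returns the set of parameter names it actually loaded, the loader compares that set against the model's own parameter names. A minimal sketch of that check (illustrative only, not the exact default_loader.py code; the function name here is hypothetical):

```python
# Minimal sketch of the post-load consistency check (illustrative only;
# check_all_weights_loaded is a hypothetical name, not the real function).
def check_all_weights_loaded(model, loaded_params: set[str]) -> None:
    # Every parameter the model exposes must appear in the set of names
    # returned by model.load_weights(...); otherwise loading is rejected.
    expected = {name for name, _ in model.named_parameters()}
    missing = expected - loaded_params
    if missing:
        raise ValueError(
            f"Following weights were not initialized from checkpoint: {missing}")
```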

Test Plan

Test Result

(Optional) Documentation Update

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (bot, Contributor) left a comment

Summary of Changes

Hello @xuechendi, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a specific model loading issue for the DeepSeek v2 architecture. The core problem was that certain expert weights were not being recognized and initialized during the model loading phase, causing the entire process to fail. My change ensures that the correct weight names are used throughout the loading mechanism, enabling DeepSeek models to be loaded without errors.

Highlights

  • DeepSeek Model Loading Fix: I've addressed a critical bug that prevented the DeepSeek v2 model from loading successfully. Specifically, the issue stemmed from w13_weight and w2_weight expert weights not being properly initialized from the checkpoint, leading to a ValueError. The fix ensures that when a weight name is mapped during the loading process, the mapped name is correctly propagated for subsequent operations, allowing these weights to be found and loaded.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot, Contributor) left a comment

Code Review

The PR fixes an issue with uninitialized weights in DeepSeek models by updating the parameter name after checkpoint mapping. However, the current implementation incorrectly updates a loop variable, potentially skipping valid mappings. The suggestion ensures the name is updated only upon successful weight loading.

@xuechendi (Contributor, Author) commented

@abmfy @WoosukKwon, could you take a look at this quick fix?
I don't have full context on why 'name_mapped' is now used instead of name for all w13_weight and w2_weight entries, but from my local test, the load_weights function returns the param names that were loaded, and the original name rather than name_mapped was being recorded in loaded_params, which caused the error above.
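To make the mismatch concrete, here is an illustrative example (the names below are examples patterned on the error message, not an exhaustive or exact listing):

```python
# Illustrative only: names are patterned on the error message above.
checkpoint_name = "model.layers.1.mlp.experts.0.gate_proj.weight"  # name as stored on disk
name_mapped = "model.layers.1.mlp.experts.w13_weight"              # fused model parameter

loaded_params: set[str] = set()
# Before the fix the checkpoint-side name was recorded, so the post-load
# check never saw the fused w13_weight/w2_weight names and raised the error:
#     loaded_params.add(checkpoint_name)
# The fix records the mapped, model-side name instead:
loaded_params.add(name_mapped)
```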

@xuechendi force-pushed the fix_deepseek_load_weights branch from caef4c2 to ca363c5 on June 27, 2025 at 21:25
@abmfy (Member) commented Jun 27, 2025

> @abmfy @WoosukKwon, could you take a look at this quick fix? I don't have full context on why 'name_mapped' is now used instead of name for all w13_weight and w2_weight entries, but from my local test, the load_weights function returns the param names that were loaded, and the original name rather than name_mapped was being recorded in loaded_params, which caused the error above.

Hi @xuechendi, yes, that was an oversight on my part in #18343; we should add name_mapped to loaded_params.

The current fix LGTM. Details to come later.

@xuechendi (Contributor, Author) commented

@abmfy, thanks for the quick reply. Could you help approve this PR and trigger CI, so we can get this quick fix in to unblock the DeepSeek model? Thanks.

@abmfy (Member) commented Jun 27, 2025

> @abmfy, thanks for the quick reply. Could you help approve this PR and trigger CI, so we can get this quick fix in to unblock the DeepSeek model? Thanks.

Hi @xuechendi, I’m not sure if I have permission to trigger the CI.

Here’s a detailed explanation of the issue:

TL;DR It was a bug, and the proposed fix is correct.

I made this change because, under EPLB settings, there are cases where a logical expert is replicated multiple times, with each replica corresponding to an entry in expert_params_mapping.

Previously, when weight_name in name was true, it meant either we needed to load this weight or discard it if the expert was not local. Therefore, the logic inside the for loop would run only once, and it was safe to modify name directly since we’d never enter the loop again.

However, under EPLB, weight_name in name now simply means we need to load this logical expert. But inside the loop, the current entry in expert_params_mapping might still refer to a physical expert that doesn’t belong to this rank. We must continue checking, since each logical expert can map to multiple physical experts. Thus, we can’t simply break out of the loop; instead, the weight loader must determine whether the weight belongs to this rank. If we modified name directly, subsequent iterations of the loop would fail to find the correct parameter name. That’s why we introduced a temporary variable to hold the mapped name instead.

Finally, it was my oversight that I didn't update the logic for appending to loaded_params to use name_mapped. I missed this because, in the EPLB PR, I was testing with FP8 quantization, and default_loader.py#L278 only applies in the non-quantized case.
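Putting the pieces together, here is a condensed sketch of the flow described above. This is an assumed shape based on this discussion, not the exact deepseek_v2.py code; in particular, the expert_params_mapping entries and the weight_loader(..., return_success=True) signature are taken from the explanation, and non-expert weights and quantized paths are omitted:

```python
from typing import Any, Iterable

def load_expert_weights(weights: Iterable[tuple[str, Any]],
                        params_dict: dict[str, Any],
                        expert_params_mapping: list[tuple[str, str, int, str]]
                        ) -> set[str]:
    """Condensed sketch; non-expert weights and quantized paths are omitted."""
    loaded_params: set[str] = set()
    for name, loaded_weight in weights:
        for param_name, weight_name, expert_id, shard_id in expert_params_mapping:
            if weight_name not in name:
                continue
            # Map the checkpoint-side name to the fused model parameter name in
            # a temporary; `name` itself must stay untouched, because later
            # mapping entries (other physical replicas of this logical expert)
            # still need to match against it.
            name_mapped = name.replace(weight_name, param_name)
            param = params_dict[name_mapped]
            # The weight loader decides whether this physical expert lives on
            # this rank and reports success accordingly (assumed signature).
            if param.weight_loader(param, loaded_weight, name_mapped,
                                   shard_id=shard_id, expert_id=expert_id,
                                   return_success=True):
                # The fix: record the mapped (model-side) name, which is what
                # the post-load check compares against named_parameters().
                loaded_params.add(name_mapped)
    return loaded_params
```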

@houseroad (Collaborator) left a comment

Looks good. Could you add the test command and results to the PR description?

@houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Jun 28, 2025
@houseroad enabled auto-merge (squash) on June 28, 2025 at 01:42
@vllm-bot merged commit 5a52f38 into vllm-project:main on Jun 30, 2025
78 of 82 checks passed