[BUGFIX][DEEPSEEK][MODEL_LOAD] fix w13, w2 weight not initialized assert #20202
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR.
Summary of Changes
Hello @xuechendi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a specific model loading issue for the DeepSeek v2 architecture. The core problem was that certain expert weights were not being recognized and initialized during the model loading phase, causing the entire process to fail. My change ensures that the correct weight names are used throughout the loading mechanism, enabling DeepSeek models to be loaded without errors.
Highlights
- DeepSeek Model Loading Fix: I've addressed a critical bug that prevented the DeepSeek v2 model from loading successfully. Specifically, the issue stemmed from `w13_weight` and `w2_weight` expert weights not being properly initialized from the checkpoint, leading to a `ValueError`. The fix ensures that when a weight name is mapped during the loading process, the mapped name is correctly propagated for subsequent operations, allowing these weights to be found and loaded.
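To make the failure mode concrete, here is a minimal, self-contained sketch of the loop as it behaved before this PR (the names and the stubbed loader are assumptions for illustration, not the actual vLLM code): once the first candidate entry rewrites `name`, a later entry for the same checkpoint weight can no longer match.

```python
# Hypothetical mapping entries: under EPLB, one logical expert can be
# replicated, so the SAME checkpoint weight name may appear in several
# entries with different physical expert ids.
expert_params_mapping = [
    # (param_name, weight_name, physical_expert_id)
    ("mlp.experts.w13_weight", "mlp.experts.5.gate_proj", 0),
    ("mlp.experts.w13_weight", "mlp.experts.5.gate_proj", 1),
]

def try_load(mapped_name: str, physical_id: int) -> bool:
    """Stub loader: pretend only physical expert 1 lives on this rank."""
    return physical_id == 1

def load_buggy(name: str):
    for param_name, weight_name, physical_id in expert_params_mapping:
        if weight_name not in name:
            continue
        # BUG: the loop variable is rewritten, so after the first (failed)
        # replica, the second entry's weight_name no longer matches `name`.
        name = name.replace(weight_name, param_name)
        if try_load(name, physical_id):
            return name
    return None

print(load_buggy("mlp.experts.5.gate_proj"))
# -> None: the weight is never loaded, which later trips the
#    "w13_weight ... not initialized" assertion.
```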
Code Review
The PR fixes an issue with uninitialized weights in DeepSeek models by updating the parameter name after checkpoint mapping. However, the current implementation incorrectly updates a loop variable, potentially skipping valid mappings. The suggestion ensures the name is updated only upon successful weight loading.
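Continuing the sketch above (same assumed names; still an illustration, not the exact vLLM code), this is the shape of the fix the review describes: map into a fresh variable and adopt the mapped name only once a load succeeds.

```python
def load_fixed(name: str):
    for param_name, weight_name, physical_id in expert_params_mapping:
        if weight_name not in name:
            continue
        # FIX: map into a fresh variable; `name` stays intact, so the
        # next replica entry can still match if this load fails.
        name_mapped = name.replace(weight_name, param_name)
        if try_load(name_mapped, physical_id):
            # Only now is the mapped name propagated (e.g. recorded in
            # loaded_params) for subsequent operations.
            return name_mapped
    return None

print(load_fixed("mlp.experts.5.gate_proj"))
# -> "mlp.experts.w13_weight": the second replica entry is tried
#    against the original name and succeeds.
```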
|
Signed-off-by: Chendi Xue <[email protected]>
xuechendi force-pushed from caef4c2 to ca363c5
Hi @xuechendi, yes, that's an oversight on my part in #18343. The current fix LGTM. Details to come later.
@abmfy, thanks for the quick reply. Could you help approve this PR and trigger CI?
Hi @xuechendi, I'm not sure if I have permission to trigger the CI. Here's a detailed explanation of the issue:

TL;DR: It was a bug, and the proposed fix is correct.

I made this change because, under EPLB settings, there are cases where a logical expert is replicated multiple times, with each replica corresponding to an entry in `expert_params_mapping`.

Previously, when a weight name matched a mapping entry, the name was rewritten in place and loading proceeded under the mapped name. However, under EPLB, loading can fail for a given replica (for example, when that physical expert is not hosted on the current rank), so the loop has to fall through and try the remaining entries against the original name; rewriting the loop variable broke that fall-through.

Finally, it was my oversight that I didn't update the logic for appending to `loaded_params`.
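As a hypothetical illustration of the replication point (invented names; the real table also carries shard and expert ids), this is how one logical expert fans out into multiple `expert_params_mapping` entries under EPLB:

```python
# Logical expert 5 is replicated onto physical experts 3 and 9, so the
# same checkpoint weight name yields one mapping entry per replica.
logical_to_physical = {5: [3, 9]}

expert_params_mapping = [
    ("mlp.experts.w13_weight", f"mlp.experts.{logical}.gate_proj", physical)
    for logical, replicas in logical_to_physical.items()
    for physical in replicas
]
# Loading "mlp.experts.5.gate_proj" must be allowed to fail on physical
# expert 3 (not on this rank) and still succeed on physical expert 9,
# which is why the loop must not rewrite `name` on the first match.
```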
houseroad left a comment:
Looks good. Could you add the test command and results to the PR description?
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
Fix an issue when loading the DeepSeek model: loading fails with the `w13_weight`/`w2_weight` "not initialized" assertion.
The failure was introduced by #18343.
Test Plan
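The thread does not reproduce the actual test command. As a hypothetical minimal check (the model name and settings below are assumptions), loading any DeepSeek-V2 MoE checkpoint exercises the fixed path:

```python
from vllm import LLM

# Before this fix, constructing the engine failed during weight loading
# with the w13_weight/w2_weight "not initialized" assertion.
llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite", trust_remote_code=True)
print(llm.generate("Hello")[0].outputs[0].text)
```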
Test Result
(Optional) Documentation Update