[Bugfix] Fix output tensor shape in vanilla_chunked_prefill and update import paths for model_loader #773
Conversation
Signed-off-by: Yizhou Liu <[email protected]>
…oader in NPUModelRunnerBase Signed-off-by: Yizhou Liu <[email protected]>
jianzs left a comment:
We should make sure the vllm-ascend main branch works with vllm versions 0.8.5 and 0.8.5.post1.
Force-pushed from 3e53cb9 to 87badbf.
Looks like the CI run is incomplete; can you trigger it again with a new commit?
Force-pushed from 10a64bc to ef7fb88.
Force-pushed from 046deab to eb8ccd2.
Signed-off-by: Yizhou Liu <[email protected]>
) -> None:
    from vllm.model_executor.model_loader.loader import ShardedStateLoader
    if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
        from vllm.model_executor.model_loader.loader import ShardedStateLoader  # type: ignore[import]  # isort: skip  # noqa
I think this is a reasonable way to skip static code checks. We cannot install two versions of vllm in the same Python environment, so I agree with skipping the check for v0.8.5.
Should we ensure the vllm-ascend main branch is compatible with both vllm versions 0.8.5 and 0.8.5.post1?
Yeah, but this only skips the static code-format check for 0.8.5 in the main branch CI. It has no impact on functionality.
Without this check for v0.8.5, running against vLLM v0.8.5 would fall into the else branch and cause a problem.
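For readers following along, here is a minimal sketch of the version-gated import pattern being discussed. Only the 0.8.5/0.8.5.post1 branch is taken from the diff above; the location of `vllm_version_is` and the newer import path in the else branch are assumptions for illustration, not necessarily the exact code in this PR.

```python
# Sketch only: illustrates the version-gated import pattern under discussion.
# The vllm_version_is location and the else-branch import path are assumptions.
from vllm_ascend.utils import vllm_version_is  # assumed helper location

if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
    # vllm 0.8.5 / 0.8.5.post1 keep ShardedStateLoader in model_loader.loader;
    # the lint suppressions mirror the snippet in the diff above.
    from vllm.model_executor.model_loader.loader import ShardedStateLoader  # type: ignore[import]  # isort: skip  # noqa
else:
    # Assumed newer location after the model_loader refactor.
    from vllm.model_executor.model_loader import ShardedStateLoader
```

This is the pattern the comment above refers to: without the version check, an import against the newer path would fail on 0.8.5.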
Please merge this pull request ASAP; many other PRs are blocked on it. @MengqingCao @Yikun
What this PR does / why we need it?
Fix the output tensor shape in the vanilla_chunked_prefill function.
Does this PR introduce any user-facing change?
None.
How was this patch tested?
Ran offline inference on DeepSeek models.
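For illustration only, below is the kind of shape mismatch such a fix typically addresses; the tensor names and dimensions are assumptions and this is not the actual vanilla_chunked_prefill code.

```python
# Hypothetical illustration of an output-shape fix; not the PR's actual code.
import torch

num_tokens, num_heads, head_dim = 16, 8, 64
attn_out = torch.randn(num_tokens, num_heads, head_dim)  # per-head attention output

# A caller expecting a flat hidden dimension would reject the 3-D tensor,
# so the output is reshaped to [num_tokens, num_heads * head_dim] before returning.
output = attn_out.reshape(num_tokens, num_heads * head_dim)
assert output.shape == (num_tokens, num_heads * head_dim)
```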