[NPU] Upgrade to v0.14.0 #820
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Please paste the performance comparison showing the gain.

Also, could you please help optimize the metrics log? Currently it's very hard to understand.
@hsliuustc0106 I updated the benchmark results with #780 to compare before and after. But it's weird that enabling ACL graph causes a performance regression... It needs more investigation, so I will revert to eager mode by default. This PR also fixes the broken main branch on NPU, so it's better to get it merged ASAP. Thanks!
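Reporting the requested "comparison gain" boils down to a relative change per metric between the before and after runs. The sketch below is a minimal one and assumes throughput-style metrics where higher is better. `relative_change` and `is_regression` are hypothetical helpers that are not part of vLLM-Omni, and the demo numbers are placeholders, not results from this PR.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (positive means the value grew)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    # Multiply before dividing to keep round numbers exact in float arithmetic.
    return (after - before) * 100.0 / before


def is_regression(before: float, after: float, higher_is_better: bool = True) -> bool:
    """True if `after` is worse than `before` for the given metric direction."""
    return after < before if higher_is_better else after > before


if __name__ == "__main__":
    # Hypothetical placeholder numbers (e.g. tokens/s), not measured results.
    eager_tps, graph_tps = 100.0, 90.0
    print(f"{relative_change(eager_tps, graph_tps):+.1f}%")  # -10.0%
    print(is_regression(eager_tps, graph_tps))  # True
```

For latency-style metrics, pass `higher_is_better=False` so that an increase is flagged as the regression.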
I think that once the benchmark is done, the metrics log seems unnecessary.
Could you fix this PR and test it again? We expect this PR to be merged by next week.
After vLLM-Ascend v0.14.0rc1 is released, I will also upgrade NPUModelRunner in this PR to align with v0.14.0rc1.
@hsliuustc0106 @david6666666 Could you please take a look? This PR is ready. Docs and other follow-ups will be handled in #671. Thanks!
```python
self._shared_expert = shared_expert
self._shared_expert_gate = shared_expert_gate

@property
```
Why do we need the modification in the model file?
We added those properties so the wrapper presents the same interface as a normal Qwen MLP. vLLM-Ascend’s SharedFusedMoE validation splits the shared expert into gate_up_proj → act_fn → down_proj and expects these attributes to exist. The wrapper only holds the real MLP, so we forward them to keep validation and split execution working on NPU. It doesn't affect GPU behaviour. When we upgrade to vLLM v0.15.0 or later, we will remove this hacky wrapper, because upstream has implemented the shared expert in Qwen3-MoE, which we can reuse directly.
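The forwarding described above can be illustrated with a framework-free sketch. It assumes the wrapper stores the real MLP in `_shared_expert` (as in the diff) and that the consumer looks up `gate_up_proj`, `act_fn`, and `down_proj` before running them in sequence. `ToyMLP`, `SharedExpertWrapper`, and `split_forward` are hypothetical stand-ins written in plain Python (no torch); they are not the vLLM or vLLM-Ascend classes, and the shared-expert gate is stored but not modeled.

```python
from typing import Callable


class ToyMLP:
    """Hypothetical stand-in for a Qwen MLP exposing the three sub-modules."""

    def __init__(self) -> None:
        self.gate_up_proj: Callable[[float], float] = lambda x: 2.0 * x
        self.act_fn: Callable[[float], float] = lambda x: max(x, 0.0)
        self.down_proj: Callable[[float], float] = lambda x: x + 1.0

    def __call__(self, x: float) -> float:
        return self.down_proj(self.act_fn(self.gate_up_proj(x)))


class SharedExpertWrapper:
    """Hypothetical wrapper that holds the real MLP and forwards its sub-modules."""

    def __init__(self, shared_expert: ToyMLP, shared_expert_gate=None) -> None:
        self._shared_expert = shared_expert
        # Stored for parity with the diff; this sketch does not apply the gate.
        self._shared_expert_gate = shared_expert_gate

    # Read-only properties delegate to the wrapped MLP, so callers that
    # inspect the MLP interface find the same objects.
    @property
    def gate_up_proj(self):
        return self._shared_expert.gate_up_proj

    @property
    def act_fn(self):
        return self._shared_expert.act_fn

    @property
    def down_proj(self):
        return self._shared_expert.down_proj

    def __call__(self, x: float) -> float:
        return self._shared_expert(x)


def split_forward(expert, x: float) -> float:
    """Validate the interface, then run gate_up_proj -> act_fn -> down_proj."""
    for name in ("gate_up_proj", "act_fn", "down_proj"):
        if not hasattr(expert, name):
            raise AttributeError(f"shared expert is missing {name!r}")
    return expert.down_proj(expert.act_fn(expert.gate_up_proj(x)))


wrapper = SharedExpertWrapper(ToyMLP())
print(split_forward(wrapper, 3.0))  # 7.0
print(wrapper(3.0))  # 7.0
```

Delegating through properties, rather than copying the attributes in `__init__`, keeps the wrapper in sync with the wrapped MLP if its sub-modules are replaced later.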
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: jzz <e1583181@u.nus.edu>
Purpose
This PR:
Test Plan
NPU:
GPU:
Test Result
use_audio
use_mixed_modalities