Commit 8bcc0cc
[bugfix] fix shared expert dp with hybrid kvcache (#2964)
### What this PR does / why we need it?
#2849 moved the
implementation of `shared_expert_dp` to the torchair `deepseek_modeling`.
However, when `set_forward_context` is called with `enforce_eager` and
`shared_expert_dp` enabled, execution falls back to the implementation in
`model_runner_v1.py`, which sets the global `attn_metadata` as a dictionary.
This leads to a `RuntimeError` when `attn_metadata` is retrieved from the
forward context and used in `torchair_deepseek_v2.py`. This PR fixes the
problem by transforming `attn_metadata` into the expected form in that file.
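To make the failure mode concrete, here is a minimal sketch of the kind of normalization such a fix performs. It assumes the forward context holds either a single metadata object or a dict keyed by layer name. `AttnMetadata`, `resolve_attn_metadata`, and the layer-name lookup are hypothetical names for illustration only; this is not the two-line change in this commit.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class AttnMetadata:
    """Hypothetical stand-in for a per-layer attention metadata object."""
    num_actual_tokens: int


# Assumption: the forward context holds None, one metadata object,
# or a dict mapping layer names to metadata objects.
MaybePerLayer = Union[None, AttnMetadata, Mapping[str, AttnMetadata]]


def resolve_attn_metadata(
    attn_metadata: MaybePerLayer, layer_name: Optional[str] = None
) -> Optional[AttnMetadata]:
    """Return a single metadata object regardless of how it was stored.

    Hypothetical helper: a dict (as on the model_runner_v1.py fallback
    path) is reduced to this layer's entry, or to the first entry when
    no matching layer name is given.
    """
    if isinstance(attn_metadata, Mapping):
        if not attn_metadata:
            return None
        if layer_name is not None and layer_name in attn_metadata:
            return attn_metadata[layer_name]
        # Fall back to the first entry (dicts preserve insertion order).
        return next(iter(attn_metadata.values()))
    # Already a single object (or None): pass it through unchanged.
    return attn_metadata


if __name__ == "__main__":
    meta = AttnMetadata(num_actual_tokens=8)
    per_layer = {"model.layers.0.self_attn.attn": meta}
    print(resolve_attn_metadata(meta) is meta)       # single object
    print(resolve_attn_metadata(per_layer) is meta)  # dict form
```

Without such a step, code that expects a metadata object and instead receives a dict fails on its first attribute access, which matches the kind of runtime error described above.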
Note that the current E2E tests do not cover DeepSeek with
`shared_expert_dp`. An ST with `shared_expert_dp` needs to be added to the
testing workflow.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
End-to-end vLLM serving with `enable_shared_expert_dp: true` passed.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@de3e53a
Signed-off-by: linfeng-yuan <[email protected]>

1 parent 1f6465c, commit 8bcc0cc
1 file changed: 2 lines added, 0 removed (new lines 816 and 817, inserted after original line 815).