Commit 8bcc0cc

[bugfix] fix shared expert dp with hybrid kvcache (#2964)
### What this PR does / why we need it?

#2849 moved the implementation of `shared_expert_dp` into the torchair deepseek modeling. However, the call to `set_forward_context` with `enforce_eager` and `shared_expert_dp` falls back to the implementation in model_runner_v1.py, which sets the global attn_metadata as a dictionary. This leads to a RuntimeError when attn_metadata is retrieved from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by converting attn_metadata in this file. Note that the current E2E testing lacks a case for deepseek with `shared_expert_dp`; we need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

E2E vLLM serving with `enable_shared_expert_dp: true` passed.

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@de3e53a

Signed-off-by: linfeng-yuan <[email protected]>
1 parent 1f6465c commit 8bcc0cc

File tree: 1 file changed, +2 −0 lines

vllm_ascend/torchair/models/torchair_deepseek_v2.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -813,6 +813,8 @@ def forward(
             residual = get_tp_group().all_gather(residual, 0)

         attn_metadata = get_forward_context().attn_metadata
+        if attn_metadata is not None and isinstance(attn_metadata, dict):
+            attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
         if attn_metadata is not None:
             num_tokens = attn_metadata.num_actual_tokens
         else:
```
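The pattern behind the fix can be sketched in isolation. This is a minimal, self-contained illustration (the `AttnMetadata` dataclass and `resolve_attn_metadata` helper are hypothetical stand-ins, not vLLM APIs): the forward context may hold attn_metadata either as a single metadata object or as a dict keyed by attention-layer name, so the model code normalizes it to one object before reading its fields.

```python
from dataclasses import dataclass

@dataclass
class AttnMetadata:
    # Stand-in for vLLM's real attention metadata; only the field the
    # patched code reads is modeled here.
    num_actual_tokens: int

def resolve_attn_metadata(attn_metadata,
                          layer_name="model.layers.0.self_attn.attn"):
    """Return per-layer metadata whether the context stored an object or a
    dict of per-layer objects (the dict form is what model_runner_v1.py
    produces under enforce_eager + shared_expert_dp)."""
    if attn_metadata is not None and isinstance(attn_metadata, dict):
        attn_metadata = attn_metadata[layer_name]
    return attn_metadata

# Dict form: pick out the named layer's metadata.
meta = resolve_attn_metadata({"model.layers.0.self_attn.attn": AttnMetadata(4)})
print(meta.num_actual_tokens)  # 4

# Plain-object form passes through unchanged; None stays None.
print(resolve_attn_metadata(AttnMetadata(7)).num_actual_tokens)  # 7
```

Keying on a fixed layer name works here because all layers share the same metadata in this configuration; the sketch mirrors the hard-coded `'model.layers.0.self_attn.attn'` key in the patch.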
