Commit 8bcc0cc

[bugfix] fix shared expert dp with hybrid kvcache (#2964)
### What this PR does / why we need it?

#2849 moved the implementation of `shared_expert_dp` into the torchair deepseek modeling. However, the call to `set_forward_context` with `enforce_eager` and `shared_expert_dp` falls back to the implementation in model_runner_v1.py, which sets the global attn_metadata as a dictionary. This leads to a RuntimeError when attn_metadata is retrieved from the forward context and used in torchair_deepseek_v2.py. This PR fixes the problem by converting attn_metadata in this file. Note that the current E2E testing lacks a case for deepseek with `shared_expert_dp`; we need to add an ST with `shared_expert_dp` to the testing workflow.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

E2E vLLM serving with `enable_shared_expert_dp: true` passed.

- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@de3e53a

Signed-off-by: linfeng-yuan <[email protected]>
1 parent 1f6465c commit 8bcc0cc

File tree: 1 file changed, +2 −0 lines

vllm_ascend/torchair/models/torchair_deepseek_v2.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -813,6 +813,8 @@ def forward(
             residual = get_tp_group().all_gather(residual, 0)

         attn_metadata = get_forward_context().attn_metadata
+        if attn_metadata is not None and isinstance(attn_metadata, dict):
+            attn_metadata = attn_metadata['model.layers.0.self_attn.attn']
         if attn_metadata is not None:
             num_tokens = attn_metadata.num_actual_tokens
         else:
```
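The pattern behind the fix can be sketched in isolation. This is a minimal, self-contained illustration (the `AttnMetadata` dataclass and `resolve_attn_metadata` helper are hypothetical stand-ins, not vLLM APIs): the forward context may hold attn_metadata either as a single metadata object or as a dict keyed by attention-layer name, so the model code normalizes it to one object before reading its fields.

```python
from dataclasses import dataclass

@dataclass
class AttnMetadata:
    # Stand-in for vLLM's real attention metadata; only the field the
    # patched code reads is modeled here.
    num_actual_tokens: int

def resolve_attn_metadata(attn_metadata,
                          layer_name="model.layers.0.self_attn.attn"):
    """Return per-layer metadata whether the context stored an object or a
    dict of per-layer objects (the dict form is what model_runner_v1.py
    produces under enforce_eager + shared_expert_dp)."""
    if attn_metadata is not None and isinstance(attn_metadata, dict):
        attn_metadata = attn_metadata[layer_name]
    return attn_metadata

# Dict form: pick out the named layer's metadata.
meta = resolve_attn_metadata({"model.layers.0.self_attn.attn": AttnMetadata(4)})
print(meta.num_actual_tokens)  # 4

# Plain-object form passes through unchanged; None stays None.
print(resolve_attn_metadata(AttnMetadata(7)).num_actual_tokens)  # 7
```

Keying on a fixed layer name works here because all layers share the same metadata in this configuration; the sketch mirrors the hard-coded `'model.layers.0.self_attn.attn'` key in the patch.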
