[BugFix] Fix aclgraph accu problem in A2.#3163
Signed-off-by: whx-sjtu <2952154980@qq.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request aims to fix an accuracy issue with aclgraph by refactoring the AscendSharedFusedMoE layer to shield some logic from torch.dynamo. The approach is to move the implementation from the forward method to forward_impl. However, the new forward method contains a critical bug that will cause a ValueError at runtime due to incorrect tuple unpacking. I've provided a comment with a suggested fix.
shared_out, fused_out = AscendFusedMoE.forward(
    self,
    hidden_states=hidden_states,
    router_logits=router_logits,
)
return shared_out, fused_out
The call to AscendFusedMoE.forward is incorrect. The MRO for AscendSharedFusedMoE will cause this to resolve to vllm.model_executor.layers.fused_moe.layer.FusedMoE.forward, which returns a single tensor. Attempting to unpack this single tensor into two variables, shared_out and fused_out, will raise a ValueError at runtime.
Given that the implementation logic has been moved into forward_impl, which correctly returns a tuple of two tensors, the forward method should likely just call self.forward_impl.
return self.forward_impl(
    hidden_states=hidden_states,
    router_logits=router_logits,
)
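The failure mode described above can be reproduced without vLLM. The following is a minimal sketch, not the project's code: the three classes only borrow the names from this PR, plain Python lists stand in for tensors, and `forward_buggy` is a hypothetical name used to keep the broken and fixed variants side by side. It assumes, as the comment states, that the intermediate class does not override `forward`, so the lookup falls through to the base class.

```python
class FusedMoE:
    """Stand-in for the upstream base layer: forward returns ONE value."""

    def forward(self, hidden_states, router_logits):
        # A single "tensor" (here: a list of three floats), not a tuple of two.
        return [h + r for h, r in zip(hidden_states, router_logits)]


class AscendFusedMoE(FusedMoE):
    """Assumed not to override forward, so attribute lookup reaches FusedMoE."""

    def forward_impl(self, hidden_states, router_logits):
        return FusedMoE.forward(self, hidden_states, router_logits)


class AscendSharedFusedMoE(AscendFusedMoE):
    def forward_impl(self, hidden_states, router_logits):
        # Illustrative shared-expert computation; returns a 2-tuple.
        shared_out = [2 * h for h in hidden_states]
        fused_out = AscendFusedMoE.forward_impl(self, hidden_states, router_logits)
        return shared_out, fused_out

    def forward_buggy(self, hidden_states, router_logits):
        # Resolves to FusedMoE.forward, which returns one value of length 3,
        # so unpacking into two names fails.
        shared_out, fused_out = AscendFusedMoE.forward(
            self,
            hidden_states=hidden_states,
            router_logits=router_logits,
        )
        return shared_out, fused_out

    def forward(self, hidden_states, router_logits):
        # Suggested fix: delegate to forward_impl, which returns the tuple.
        return self.forward_impl(
            hidden_states=hidden_states,
            router_logits=router_logits,
        )


layer = AscendSharedFusedMoE()
hidden = [1.0, 2.0, 3.0]
logits = [0.5, 0.5, 0.5]

try:
    layer.forward_buggy(hidden, logits)
except ValueError as exc:
    print("buggy forward:", exc)

shared_out, fused_out = layer.forward(hidden, logits)
print("fixed forward:", shared_out, fused_out)
```

With a real tensor the unpacking fails in the same way whenever the first dimension is not exactly 2; when it happens to be 2, the unpack silently succeeds with the wrong values, which is arguably worse.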
LGTM.
LGTM
Signed-off-by: whx-sjtu <2952154980@qq.com>
5fdaa50 to f8aa50b
f8aa50b to 02da01a
Signed-off-by: whx-sjtu <2952154980@qq.com>
This reverts commit 14d4ed5. Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR vllm-project#2980, which exposes the `all_reduce` of shared_experts to torch dynamo. This PR moves all of the code into forward_impl to shield it from torch dynamo.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@17b4c66
Signed-off-by: whx-sjtu <2952154980@qq.com>
This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR vllm-project#2980, which exposes the `all_reduce` of shared_experts to torch dynamo. This PR moves all of the code into forward_impl to shield it from torch dynamo.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@17b4c66
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR vllm-project#2980, which exposes the `all_reduce` of shared_experts to torch dynamo. This PR moves all of the code into forward_impl to shield it from torch dynamo.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@17b4c66
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: hwhaokun <haokun0405@163.com>
This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR vllm-project#2980, which exposes the `all_reduce` of shared_experts to torch dynamo. This PR moves all of the code into forward_impl to shield it from torch dynamo.
- vLLM version: v0.10.2
- vLLM main: vllm-project/vllm@17b4c66
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: nsdie <yeyifan@huawei.com>
This PR fixes an accuracy problem of aclgraph on A2. The problem was introduced by PR #2980, which exposes the `all_reduce` of shared_experts to torch dynamo. This PR moves all of the code into forward_impl to shield it from torch dynamo.