Reapply "[Refactor] Unify full-graph parameter update logic (#6041)" (#6227) #6231
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request refactors the full-graph parameter update logic by moving implementation-specific details into their respective attention backend classes and using a unified dispatch function. This is a good architectural improvement that enhances modularity. However, I've identified a critical issue where draft_attn_metadatas is not correctly propagated in the new implementation, which could lead to incorrect behavior for draft models in graph mode. I've provided a suggestion to address this.
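The refactoring described above can be sketched roughly as follows. This is a hypothetical minimal illustration, not vLLM Ascend's actual API: the class and method names (`AttnBackend`, `update_graph_params`, the `Ctx` stand-in for `forward_context`) are assumptions chosen only to show the dispatch pattern, where each backend owns its update logic and the call site goes through one generic entry point instead of branching on backend type.

```python
class AttnBackend:
    """Base class: each attention backend owns its own graph-update logic."""
    def update_graph_params(self, forward_context, num_tokens):
        raise NotImplementedError

class AscendBackend(AttnBackend):
    def update_graph_params(self, forward_context, num_tokens):
        # Backend-specific details live here, including reading draft-model
        # metadata off the forward context (the point of the review comment:
        # if the caller never sets this attribute, it is silently lost).
        draft = getattr(forward_context, "draft_attn_metadatas", None)
        return ("ascend", num_tokens, draft)

def update_full_graph_params(backend, forward_context, num_tokens):
    # Unified dispatch: no isinstance() chain at the call site.
    return backend.update_graph_params(forward_context, num_tokens)

class Ctx:
    """Stand-in for forward_context."""

ctx = Ctx()
ctx.draft_attn_metadatas = ["meta"]
print(update_full_graph_params(AscendBackend(), ctx, 8))
# prints ('ascend', 8, ['meta'])
```

Because the backend reads `draft_attn_metadatas` from the context rather than taking it as a parameter, the caller must remember to attach it, which is exactly the gap the reviewer flags below.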
    update_full_graph_params(
        self.runner.attn_backend, self.update_stream, forward_context, num_tokens,
        self.vllm_config, self.vllm_config.speculative_config)
The draft_attn_metadatas parameter is passed into _update_full_graph_params but is not used. The new update_full_graph_params function does not accept it as an argument, and the underlying implementation in AscendAttentionBackendImpl.update_graph_params expects this data to be in forward_context.draft_attn_metadatas. By not setting it on the context, this crucial metadata for the draft model is lost, which will likely cause incorrect behavior or errors when running in graph mode. Please set draft_attn_metadatas on the forward_context before calling update_full_graph_params.
Suggested change:

    forward_context.draft_attn_metadatas = draft_attn_metadatas
    update_full_graph_params(
        self.runner.attn_backend, self.update_stream, forward_context, num_tokens,
        self.vllm_config, self.vllm_config.speculative_config)
…ject#6041)" (vllm-project#6227) (vllm-project#6231)

This reverts commit 9564934. The CI failure isn't related to this change. Let's reapply it.

- vLLM version: v0.14.0
- vLLM main: vllm-project/vllm@d682094

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
This reverts commit 9564934.
The CI failure isn't related to this change. Let's reapply it.