[Refactor] Unify full-graph parameter update logic #6041
weijinqian0 merged 1 commit into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces two significant improvements: a refactoring of the attention parameter update logic into a unified function update_full_graph_params, and the addition of full-graph support for Qwen3-Next-MTP, which should improve performance. The refactoring simplifies the codebase by removing duplicated logic, which is a great enhancement for maintainability. The changes to enable full-graph support for MTP are also well-implemented. I've found one critical issue where .contiguous() calls were omitted when inlining a helper function, which could lead to runtime errors. My feedback includes a specific code suggestion to fix this.
```python
positions = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
    positions, True)
previous_hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
    previous_hidden_states, True)
```
The .contiguous() calls are missing for positions and previous_hidden_states before passing them to torch.ops.vllm.maybe_all_gather_and_maybe_unpad. Since these tensors are created by slicing larger tensors (self.positions and self.hidden_states), they may not be contiguous. This could lead to runtime errors or incorrect behavior in the custom op, as the original maybe_all_gather_and_unpad helper function included these calls to ensure contiguity.
Suggested change:

```diff
-positions = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
-    positions, True)
-previous_hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
-    previous_hidden_states, True)
+positions = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
+    positions.contiguous(), True)
+previous_hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
+    previous_hidden_states.contiguous(), True)
```
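To see why slicing a larger tensor can yield a non-contiguous view, here is a minimal NumPy sketch of the same strided-view behavior (torch tensors follow the same rules; `np.ascontiguousarray` plays the role of torch's `.contiguous()`). The array names are illustrative only:

```python
import numpy as np

# A slice of a larger backing buffer is a strided view, not a copy.
buf = np.arange(12).reshape(3, 4)   # stand-in for the larger tensor
view = buf[:, :2]                   # column slice: rows are 2 wide but
                                    # still strided 4 elements apart
print(view.flags['C_CONTIGUOUS'])   # False: the view skips elements

dense = np.ascontiguousarray(view)  # repack into a dense copy, like
                                    # torch's tensor.contiguous()
print(dense.flags['C_CONTIGUOUS'])  # True: same values, dense layout
```

Ops that assume a dense memory layout (as a custom op may) can misread or reject such a strided view, which is why the original helper forced contiguity before the all-gather.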
This pull request has conflicts; please resolve them before we can evaluate the pull request.
CI is broken after this PR was merged: https://github.com/vllm-project/vllm-ascend/actions/runs/21317195670/job/61361650122?pr=6226
…6227) This reverts commit 8966a99. It breaks the test `tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py::test_deepseek_mtp_correctness[True-FULL_DECODE_ONLY-2-wemaster/deepseek_mtp_main_random_bf16]` - vLLM version: v0.14.0 - vLLM main: vllm-project/vllm@d682094
…ject#6041)" (vllm-project#6227) This reverts commit 9564934.
### What this PR does / why we need it?
**Refactor: Unify full-graph parameter update logic**
This PR consolidates the scattered full-graph parameter update logic
into a unified approach, improving code architecture and eliminating
duplication.
**Key improvements:**
1. **Unified interface**
- Create `update_full_graph_params` as the single entry point for all
full-graph updates
- Replace multiple scattered update calls with one unified function
- Remove ~50 lines of duplicated if-else logic across
`model_runner_v1.py` and `eagle_proposer.py`
2. **Better architecture**
- Move update logic to respective Backend classes
(`AscendAttentionBackend`, `AscendMLABackend`)
- Each Backend manages its own parameter update logic internally
- Simplify caller code to just dispatch to the appropriate Backend
3. **Cleaner parameter handling**
- Remove unnecessary `pcp_size` and `dcp_size` parameter passing
- Get parallel configuration directly from distributed groups
- Consistent with how other parts of the codebase obtain these values
4. **Bug fix**
- Add missing `draft_attn_metadatas` parameter to fix MTP test failures
- The original refactor incorrectly accessed `forward_context.draft_attn_metadatas`
which doesn't exist; now properly passed as function parameter
**Why we need it:**
- **Maintainability**: Future changes only need to be made in one place
per Backend
- **Code quality**: Follows DRY principle and Single Responsibility
Principle
- **Readability**: Cleaner, more intuitive code structure
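The dispatch pattern described in points 1 and 2 can be sketched as follows. This is a hypothetical illustration of the structure, not the actual vllm-ascend API; class names, the metadata dict, and the returned params are all made up for the example:

```python
# Hypothetical sketch of "each Backend owns its update logic, callers
# just dispatch". Names are illustrative, not the real vllm-ascend API.
class AttentionBackendBase:
    def update_full_graph_params(self, attn_metadata):
        raise NotImplementedError


class AscendAttentionLikeBackend(AttentionBackendBase):
    def update_full_graph_params(self, attn_metadata):
        # backend-specific details live here, not in the caller
        return {"seq_lens": attn_metadata["seq_lens"]}


class AscendMLALikeBackend(AttentionBackendBase):
    def update_full_graph_params(self, attn_metadata):
        return {"block_tables": attn_metadata["block_tables"]}


def update_full_graph_params(backend, attn_metadata):
    # single entry point: no per-backend if-else in model_runner_v1.py
    # or eagle_proposer.py, just a dispatch to the backend instance
    return backend.update_full_graph_params(attn_metadata)
```

With this shape, adding a new backend means implementing one method on the new class; the callers never change, which is the maintainability win the PR describes.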
### Does this PR introduce _any_ user-facing change?
**No.** This is a pure refactoring with no functional changes - same
behavior, cleaner code.
### How was this patch tested?
- All existing unit tests pass with updated mocks
- No new tests needed (pure refactoring, no behavior changes)
- CI validates correctness
---
- vLLM version: v0.13.0
Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
…6227) (#6231) This reverts commit 9564934. The CI failure isn't related to this change. Let's reapply it. - vLLM version: v0.14.0 - vLLM main: vllm-project/vllm@d682094