
[Refactor] Unify full-graph parameter update logic #6041

Merged
weijinqian0 merged 1 commit into vllm-project:main from LICO1314:drslark-refactor on Jan 24, 2026

Conversation


@LICO1314 (Contributor) commented on Jan 20, 2026

What this PR does / why we need it?

Refactor: Unify full-graph parameter update logic

This PR consolidates the scattered full-graph parameter update logic into a unified approach, improving code architecture and eliminating duplication.

Key improvements:

  1. Unified interface

    • Create update_full_graph_params as the single entry point for all full-graph updates (see the sketch after this list)
    • Replace multiple scattered update calls with one unified function
    • Remove ~50 lines of duplicated if-else logic across model_runner_v1.py and eagle_proposer.py
  2. Better architecture

    • Move update logic to respective Backend classes (AscendAttentionBackend, AscendMLABackend)
    • Each Backend manages its own parameter update logic internally
    • Simplify caller code to just dispatch to the appropriate Backend
  3. Cleaner parameter handling

    • Remove unnecessary pcp_size and dcp_size parameter passing
    • Get parallel configuration directly from distributed groups
    • Consistent with how other parts of the codebase obtain these values
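To make the intended shape concrete, here is a minimal sketch of the unified dispatch. Only update_full_graph_params, AscendAttentionBackend, and AscendMLABackend are named in this PR; the method names, signatures, and bodies below are hypothetical illustration, not the actual patch.

```python
# Minimal sketch (hypothetical signatures): each backend owns its own
# full-graph parameter update, and callers use a single entry point
# instead of scattered per-backend if-else branches.

class AscendAttentionBackend:
    def update_graph_params(self, attn_metadata):
        # Backend-specific refresh of tensors captured by the full graph.
        # Parallel sizes (pcp/dcp) would be read from the distributed
        # groups here rather than threaded through caller arguments.
        ...

class AscendMLABackend:
    def update_graph_params(self, attn_metadata):
        # MLA-specific variant of the same update.
        ...

def update_full_graph_params(backend, attn_metadata):
    """Single entry point: dispatch to whichever backend is active."""
    backend.update_graph_params(attn_metadata)
```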

Why we need it:

  • Maintainability: Future changes only need to be made in one place per Backend
  • Code quality: Follows DRY principle and Single Responsibility Principle
  • Readability: Cleaner, more intuitive code structure

Does this PR introduce any user-facing change?

No. This is a pure refactoring with no functional changes - same behavior, cleaner code.

How was this patch tested?

  • All existing unit tests pass with updated mocks
  • No new tests needed (pure refactoring, no behavior changes)
  • CI validates correctness

  • vLLM version: v0.13.0

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces two significant improvements: a refactoring of the attention parameter update logic into a unified function update_full_graph_params, and the addition of full-graph support for Qwen3-Next-MTP, which should improve performance. The refactoring simplifies the codebase by removing duplicated logic, which is a great enhancement for maintainability. The changes to enable full-graph support for MTP are also well-implemented. I've found one critical issue where .contiguous() calls were omitted when inlining a helper function, which could lead to runtime errors. My feedback includes a specific code suggestion to fix this.

Comment thread: vllm_ascend/spec_decode/mtp_proposer.py (outdated), lines +195 to +198

```python
positions = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
    positions, True)
previous_hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
    previous_hidden_states, True)
```

Severity: critical

The .contiguous() calls are missing for positions and previous_hidden_states before passing them to torch.ops.vllm.maybe_all_gather_and_maybe_unpad. Since these tensors are created by slicing larger tensors (self.positions and self.hidden_states), they may not be contiguous. This could lead to runtime errors or incorrect behavior in the custom op, as the original maybe_all_gather_and_unpad helper function included these calls to ensure contiguity.

Suggested change:

```diff
 positions = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
-    positions, True)
+    positions.contiguous(), True)
 previous_hidden_states = torch.ops.vllm.maybe_all_gather_and_maybe_unpad(
-    previous_hidden_states, True)
+    previous_hidden_states.contiguous(), True)
```
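As a standalone illustration of why the reviewer asks for .contiguous() (generic PyTorch behavior, not code from this PR): slicing a larger tensor can return a strided view, and .contiguous() copies it into dense memory only when needed.

```python
import torch

buf = torch.arange(12).reshape(3, 4)  # a contiguous parent tensor
col = buf[:, :2]                      # a column slice: a strided view

print(col.is_contiguous())            # False: rows are 4 elements apart in memory
fixed = col.contiguous()              # copies into dense memory (no-op if already dense)
print(fixed.is_contiguous())          # True
```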

@LICO1314 force-pushed the drslark-refactor branch 5 times, most recently from 8ef7d10 to 37caebd on January 20, 2026 at 11:03
@LICO1314 changed the title from "[Refactor] Refactor update_attn_params logic and support full-graph for Qwen3-Next-MTP" to "[Refactor] Unify full-graph parameter update logic to eliminate code duplication" on Jan 21, 2026
@LICO1314 requested a review from weijinqian0 as a code owner on January 21, 2026 at 03:43
@LICO1314 changed the title from "[Refactor] Unify full-graph parameter update logic to eliminate code duplication" to "[Refactor] Unify full-graph parameter update logic" on Jan 21, 2026
@LICO1314 force-pushed the drslark-refactor branch 3 times, most recently from 10d6495 to d40f8c3 on January 21, 2026 at 07:57
@github-actions

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@LICO1314 force-pushed the drslark-refactor branch 2 times, most recently from f9eaa8d to 02cc848 on January 21, 2026 at 08:28
@LICO1314 force-pushed the drslark-refactor branch 5 times, most recently from a223f0b to d82177e on January 22, 2026 at 01:59
@LICO1314 requested a review from LCAIZJ as a code owner on January 22, 2026 at 01:59
@weijinqian0 merged commit 8966a99 into vllm-project:main on Jan 24, 2026
20 checks passed

wangxiyuan added a commit to wangxiyuan/vllm-ascend that referenced this pull request Jan 25, 2026
wangxiyuan added a commit that referenced this pull request Jan 25, 2026
…6227)

This reverts commit 8966a99.

It breaks the test
`tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py::test_deepseek_mtp_correctness[True-FULL_DECODE_ONLY-2-wemaster/deepseek_mtp_main_random_bf16]`

- vLLM version: v0.14.0
- vLLM main:
vllm-project/vllm@d682094
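For reference, a hypothetical way to run just that parametrized case locally (assuming a working vllm-ascend test environment; only the test ID comes from the message above):

```python
import pytest

# Target the single failing parametrized case named in the revert message.
pytest.main([
    "tests/e2e/singlecard/spec_decode/test_mtp_eagle_correctness.py"
    "::test_deepseek_mtp_correctness"
    "[True-FULL_DECODE_ONLY-2-wemaster/deepseek_mtp_main_random_bf16]",
])
```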
wangxiyuan added a commit to wangxiyuan/vllm-ascend that referenced this pull request Jan 25, 2026
LICO1314 added a commit to LICO1314/vllm-ascend that referenced this pull request Jan 25, 2026
This commit reapplies the refactor described in this PR, with one additional fix beyond the description above:

4. **Bug fix**
   - Add the missing `draft_attn_metadatas` parameter to fix MTP test failures
   - The original refactor incorrectly accessed `forward_context.draft_attn_metadatas`,
     which doesn't exist; it is now passed explicitly as a function parameter

- vLLM version: v0.13.0

Signed-off-by: lico67373 <918688502@qq.com>
Co-authored-by: drslark <slarksblood@qq.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
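A minimal sketch of the fix described in item 4, again with hypothetical signatures (only the draft_attn_metadatas name and the non-existent forward_context.draft_attn_metadatas access come from the commit message):

```python
# Before (broken, per the commit message): read from an attribute that
# does not exist on the forward context.
#     draft_metadatas = forward_context.draft_attn_metadatas

# After (sketch): callers pass the draft metadata in explicitly.
def update_full_graph_params(backend, attn_metadata, draft_attn_metadatas=None):
    backend.update_graph_params(attn_metadata)
    for draft_metadata in draft_attn_metadatas or ():
        backend.update_graph_params(draft_metadata)
```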
wangxiyuan added a commit that referenced this pull request Jan 26, 2026
…6227) (#6231)

This reverts commit 9564934.

The CI failure isn't related to this change. Let's reapply it.

- vLLM version: v0.14.0
- vLLM main:
vllm-project/vllm@d682094
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026
jiangyunfan1 pushed a commit to jiangyunfan1/vllm-ascend that referenced this pull request Apr 9, 2026
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request May 6, 2026
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request May 6, 2026
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request May 6, 2026

Labels

  • module:tests
  • ready (ready for review)
  • ready-for-test (start test by label for PR)

Projects

None yet


3 participants