[refactor] refactor model runner capture model #5230
weijinqian0 merged 2 commits into vllm-project:main
Conversation
Code Review
This pull request refactors the model capturing logic to inherit from the community GPUModelRunner, which simplifies the code and improves maintainability. It also refactors how CUDA graph support is determined for attention backends, moving from a static attribute to a dynamic method. These changes align the Ascend-specific codebase more closely with the upstream vLLM project. My review has identified a critical bug in a new context manager that fails to restore a patched function, and a performance issue in initialization logic where a function is called redundantly in a loop.
vllm_ascend/worker/model_runner_v1.py (3201-3207)
The context manager _replace_gpu_model_runner_function_wrapper does not correctly restore the original graph_capture function. The finally block re-assigns the patched function instead of restoring the original one. This will lead to a permanent patch, which can cause unexpected behavior in other parts of the codebase that might rely on the original graph_capture from the parent module.
Suggested fix (save the original and restore it in `finally`):

    @contextmanager
    def _replace_gpu_model_runner_function_wrapper(target_module_name):
        target_module = sys.modules[target_module_name]
        original_graph_capture = target_module.graph_capture
        try:
            target_module.graph_capture = graph_capture
            yield
        finally:
            target_module.graph_capture = original_graph_capture
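The correct patch-and-restore pattern can be sketched in isolation. The module and function names below are illustrative stand-ins, not the actual vLLM symbols; the point is that the `finally` block must assign the *saved original* back, not the patched function:

```python
import sys
import types
from contextlib import contextmanager

# Stand-in module to patch; the real code looks up the parent
# GPUModelRunner module via sys.modules instead.
demo = types.ModuleType("demo_module")
demo.graph_capture = lambda: "original"
sys.modules["demo_module"] = demo

def patched_graph_capture():
    return "patched"

@contextmanager
def replace_function(target_module_name):
    target_module = sys.modules[target_module_name]
    original = target_module.graph_capture  # save before patching
    try:
        target_module.graph_capture = patched_graph_capture
        yield
    finally:
        # Restore the saved original, NOT the patched function;
        # re-assigning the patch here was the reported bug.
        target_module.graph_capture = original

with replace_function("demo_module"):
    assert demo.graph_capture() == "patched"
assert demo.graph_capture() == "original"  # patch is fully undone
```

With the bug, the second assertion would fail: the module would stay permanently patched after the `with` block exits.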
vllm_ascend/worker/model_runner_v1.py (2709-2713)
The function get_attn_backends_for_group is called twice for each kv_cache_group_spec. It's called once in the first loop to populate attention_backend_maps and attention_backend_list, and then again in this second loop. This is inefficient. The results from the first loop should be reused here.
    for i, attn_backends_map in enumerate(attention_backend_maps):
        self.attn_groups.append(create_attn_groups(attn_backends_map, i))
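The suggested fix amounts to caching the lookup result from the first loop and reusing it in the second. A hypothetical sketch (all names here are illustrative stand-ins, not the real vLLM API):

```python
# Count how often the "expensive" lookup runs.
calls = 0

def get_attn_backends_for_group(spec):
    global calls
    calls += 1  # track invocations of the costly lookup
    return {"group": spec}

kv_cache_groups = ["full_attention", "sliding_window"]

# First loop: exactly one lookup per KV-cache group, results cached.
attention_backend_maps = [
    get_attn_backends_for_group(spec) for spec in kv_cache_groups
]

# Second loop: reuse the cached maps -- no redundant lookups.
attn_groups = [
    (i, backends_map)
    for i, backends_map in enumerate(attention_backend_maps)
]

assert calls == len(kv_cache_groups)  # once per group, not twice
```

Calling the lookup again inside the second loop would double `calls` for no benefit, which is exactly the inefficiency flagged above.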
This pull request has conflicts, please resolve those before we can evaluate the pull request.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
     for i, kv_cache_group_spec in enumerate(
             kv_cache_config.kv_cache_groups):
         attn_backends = get_attn_backends_for_group(  # type: ignore
             kv_cache_group_spec)
-        self.attn_groups.append(create_attn_groups(attn_backends, i))
+        self.attn_groups.append(create_attn_groups(attn_backends[0], i))
We should iterate through attention_backend_maps now; please fix this in another PR.
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (88 commits)
  [1/N] Refactor nightly test structure (vllm-project#5479)
  Docs: Remove deprecated --task parameter for embedding models (vllm-project#5257)
  Revert "moe_gating_top_k" (vllm-project#5512)
  [Doc] Fix issue link for 0.12.0 (vllm-project#5500)
  [CI]update triton ascend version (vllm-project#5392)
  moe_gating_top_k (vllm-project#5271)
  [refactor] refactor model runner capture model (vllm-project#5230)
  Update corresponding vllm commit ID to 12 29 (vllm-project#5475)
  [Kernel]update csrc cmakelist for open-source cann (vllm-project#5458)
  [OP] add custom op aclnnMoeInitRoutingCustom (vllm-project#5251)
  [Refactor][EAGLE] 1/N delete __init__ in mtp_proposer (vllm-project#5176)
  [Refactor][Triton] Move reject sample triton kernels into ops/triton (vllm-project#5324)
  [Feature] support eager mode in model runner v2 (vllm-project#5210)
  [feature] fia support sliding windows (vllm-project#5239)
  Optimize some rejectsampler functions to make npu op launch non-blocking (vllm-project#4587)
  [Feature] Support to use fullgraph with eagle (vllm-project#5118)
  [EPLB][refactor] Modification of the initialization logic for expert_map and log2phy(depend on pr5285) (vllm-project#5311)
  [Refactor]6/N Extract common code of class AscendMLAImpl (vllm-project#5314)
  [Refactor] cache cos/sin in mla & remove parameter model in builder. (vllm-project#5277)
  update vllm pin to 12.27 (vllm-project#5412)
  ...
### What this PR does / why we need it?
Refactor the `capture_model` method in model_runner to directly reuse the method from vLLM. Currently, most of the logic in the `capture_model` method is similar to that in the vLLM code; directly using the vLLM method reduces the maintenance cost of the vllm-ascend code. Changes:
1. Refactor the `capture_model` function to directly inherit the community method.
2. Refactor the `initialize_aclgraph_capture` function, moving it into `initialize_attn_backend`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
Co-authored-by: weijinqian0 <1184188277@qq.com>
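The idea of the refactor — inherit the community `capture_model` and override only the device-specific pieces — can be sketched with hypothetical class and hook names (not the actual vLLM class hierarchy):

```python
# Illustrative sketch: the subclass does NOT re-implement the capture
# driver; it inherits it and supplies only a device-specific hook.
class GPUModelRunner:
    def capture_model(self):
        # community implementation: drives capture via a hook method
        return ["begin_capture", self._device_capture(), "end_capture"]

    def _device_capture(self):
        return "cuda-graph"

class NPUModelRunner(GPUModelRunner):
    # capture_model is inherited unchanged; only the hook differs,
    # so upstream fixes to capture_model apply here automatically.
    def _device_capture(self):
        return "acl-graph"

result = NPUModelRunner().capture_model()
assert result == ["begin_capture", "acl-graph", "end_capture"]
```

The maintenance win is that changes to the shared driver no longer need to be mirrored in a near-identical Ascend copy.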
### What this PR does / why we need it?
PR #5230 introduced a problem: when both MTP and full_decode_only are enabled for the DSV32 model, the operators cannot be compiled into the graph. This PR fixes that issue.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: cookieyyds <126683903+cookieyyds@users.noreply.github.com>