[Feature] support aclgraph for model runner v2#7110

Merged
wangxiyuan merged 73 commits into vllm-project:main from Ronald1995:acl_graph
Mar 13, 2026
Conversation

@Ronald1995
Contributor

@Ronald1995 Ronald1995 commented Mar 10, 2026

What this PR does / why we need it?

This PR adds aclgraph support for model runner v2; see RFC #5208. It contains these modifications:

  • Adapt to the newest commit of the vLLM main branch.
  • Provide a unified interface for extra forward context shared by model runner v1 and model runner v2.
  • Implement graph mode for the main model.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Comment thread vllm_ascend/worker/v2/aclgraph_utils.py Outdated
@realliujiaxu
Collaborator

Suggestion: Simplify _ExtraForwardContextProxy

To reduce the boilerplate of the ExtraForwardContext.xxx() wrappers, we can use a minimal proxy with attribute-style access.

# ascend_forward_context.py - simplified proxy
# Imports assumed available in this module:
from typing import Any

import vllm.envs as envs_vllm
from vllm.forward_context import get_forward_context

class _ExtraForwardContextProxy:
    """Unified forward-context access for v1/v2 model runners."""

    @staticmethod
    def _ctx():
        return get_forward_context()

    def __getattr__(self, name: str) -> Any:
        ctx = self._ctx()
        if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
            return ctx.additional_kwargs[name]
        return getattr(ctx, name)

    def __setattr__(self, name: str, value: Any) -> None:
        ctx = self._ctx()
        if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
            ctx.additional_kwargs[name] = value
        else:
            setattr(ctx, name, value)


# usage: from vllm_ascend.ascend_forward_context import extra_ctx
extra_ctx = _ExtraForwardContextProxy()

Call comparison

| Current (this PR)                        | With proxy                 |
| ---------------------------------------- | -------------------------- |
| ExtraForwardContext.num_tokens()         | extra_ctx.num_tokens       |
| ExtraForwardContext.moe_comm_method()    | extra_ctx.moe_comm_method  |
| ExtraForwardContext.set_num_tokens(val)  | extra_ctx.num_tokens = val |
| ExtraForwardContext.is_draft_model()     | extra_ctx.is_draft_model   |

@yiz-liu added the ready and ready-for-test labels Mar 12, 2026
@Ronald1995
Contributor Author

> Suggestion: Simplify _ExtraForwardContextProxy

Good idea, I have refined the code per your suggestion.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan wangxiyuan merged commit c980e68 into vllm-project:main Mar 13, 2026
38 checks passed
Nagisa125 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Mar 17, 2026
### What this PR does / why we need it?
This PR aims to support aclgraph for model runner v2, please see RFC
vllm-project#5208. The PR contains these modifications:
- adapt to newest commit of vllm main branch.
- supply a unified interface of extra forward context for both model
runner v1 and model runner v2.
- implement graph mode for main model. 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@4034c3d

---------

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
winson-00178005 added a commit to winson-00178005/vllm-ascend that referenced this pull request Mar 26, 2026
- Remove is_skipped flag from tests/e2e/singlecard/model_runner_v2/test_basic.py
- Test was originally skipped due to get_cuda_view_from_cpu_tensor error (vllm-project#5752)
- Recent model_runner_v2 improvements may have resolved the issue:
  - vllm-project#7110: Added aclgraph support
  - vllm-project#7496: Optimized post_update performance
  - vllm-project#7221: Optimized _topk_log_softmax_kernel performance
- CI will verify if the test now passes successfully

Signed-off-by: hejianping <hejianping7@huawei.com>
Labels

documentation, module:core, module:ops, module:quantization, ready, ready-for-test

5 participants