[Feature] support aclgraph for model runner v2#7110

Merged
wangxiyuan merged 73 commits into vllm-project:main from Ronald1995:acl_graph
Mar 13, 2026
Conversation

@Ronald1995
Contributor

@Ronald1995 Ronald1995 commented Mar 10, 2026

What this PR does / why we need it?

This PR adds aclgraph support for model runner v2; see RFC #5208. It contains these modifications:

  • Adapt to the newest commit of the vLLM main branch.
  • Provide a unified interface for extra forward context shared by model runner v1 and model runner v2.
  • Implement graph mode for the main model.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Comment thread vllm_ascend/worker/v2/aclgraph_utils.py Outdated
@realliujiaxu
Collaborator

Suggestion: Simplify _ExtraForwardContextProxy

To reduce the boilerplate of the ExtraForwardContext.xxx() wrappers, we can use a minimal proxy with attribute-style access.

# ascend_forward_context.py - simplified proxy
# Imports assumed available in this module:
from typing import Any

import vllm.envs as envs_vllm
from vllm.forward_context import get_forward_context

class _ExtraForwardContextProxy:
    """Unified forward-context access for v1/v2 model runners."""

    @staticmethod
    def _ctx():
        return get_forward_context()

    def __getattr__(self, name: str) -> Any:
        ctx = self._ctx()
        if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
            return ctx.additional_kwargs[name]
        return getattr(ctx, name)

    def __setattr__(self, name: str, value: Any) -> None:
        ctx = self._ctx()
        if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
            ctx.additional_kwargs[name] = value
        else:
            setattr(ctx, name, value)


# usage: from vllm_ascend.ascend_forward_context import extra_ctx
extra_ctx = _ExtraForwardContextProxy()

Call comparison

| Current (this PR)                        | With proxy                 |
| ---------------------------------------- | -------------------------- |
| ExtraForwardContext.num_tokens()         | extra_ctx.num_tokens       |
| ExtraForwardContext.moe_comm_method()    | extra_ctx.moe_comm_method  |
| ExtraForwardContext.set_num_tokens(val)  | extra_ctx.num_tokens = val |
| ExtraForwardContext.is_draft_model()     | extra_ctx.is_draft_model   |

@yiz-liu added the ready and ready-for-test labels Mar 12, 2026
@Ronald1995
Contributor Author

> Suggestion: Simplify _ExtraForwardContextProxy

Good idea, I have refined the code per your suggestion.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan wangxiyuan merged commit c980e68 into vllm-project:main Mar 13, 2026
38 checks passed
Nagisa125 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Mar 17, 2026
### What this PR does / why we need it?
This PR aims to support aclgraph for model runner v2, please see RFC
vllm-project#5208. The PR contains these modifications:
- adapt to newest commit of vllm main branch.
- supply a unified interface of extra forward context for both model
runner v1 and model runner v2.
- implement graph mode for main model. 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@4034c3d

---------

Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
winson-00178005 added a commit to winson-00178005/vllm-ascend that referenced this pull request Mar 26, 2026
- Remove is_skipped flag from tests/e2e/singlecard/model_runner_v2/test_basic.py
- Test was originally skipped due to get_cuda_view_from_cpu_tensor error (vllm-project#5752)
- Recent model_runner_v2 improvements may have resolved the issue:
  - vllm-project#7110: Added aclgraph support
  - vllm-project#7496: Optimized post_update performance
  - vllm-project#7221: Optimized _topk_log_softmax_kernel performance
- CI will verify if the test now passes successfully

Signed-off-by: hejianping <hejianping7@huawei.com>
Labels

documentation, module:core, module:ops, module:quantization, ready, ready-for-test

5 participants