[Feature] support aclgraph for model runner v2 #7110
wangxiyuan merged 73 commits into vllm-project:main
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Suggestion: simplify. To reduce boilerplate, use a simplified proxy:

    # ascend_forward_context.py - simplified proxy
    from typing import Any

    import vllm.envs as envs_vllm
    from vllm.forward_context import get_forward_context


    class _ExtraForwardContextProxy:
        """Unified forward-context access for v1/v2 model runners."""

        @staticmethod
        def _ctx():
            return get_forward_context()

        def __getattr__(self, name: str) -> Any:
            ctx = self._ctx()
            # v2 keeps the extra fields in a dict; v1 keeps them as attributes.
            if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
                return ctx.additional_kwargs[name]
            return getattr(ctx, name)

        def __setattr__(self, name: str, value: Any) -> None:
            ctx = self._ctx()
            if envs_vllm.VLLM_USE_V2_MODEL_RUNNER:
                ctx.additional_kwargs[name] = value
            else:
                setattr(ctx, name, value)


    # usage: from vllm_ascend.ascend_forward_context import extra_ctx
    extra_ctx = _ExtraForwardContextProxy()
Good idea, I have refined the code following your suggestion.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
### What this PR does / why we need it?
This PR aims to support aclgraph for model runner v2, please see RFC vllm-project#5208. The PR contains these modifications:
- adapt to newest commit of vllm main branch.
- supply a unified interface of extra forward context for both model runner v1 and model runner v2.
- implement graph mode for main model.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@4034c3d

---------
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
- Remove is_skipped flag from tests/e2e/singlecard/model_runner_v2/test_basic.py
- Test was originally skipped due to get_cuda_view_from_cpu_tensor error (vllm-project#5752)
- Recent model_runner_v2 improvements may have resolved the issue:
  - vllm-project#7110: Added aclgraph support
  - vllm-project#7496: Optimized post_update performance
  - vllm-project#7221: Optimized _topk_log_softmax_kernel performance
- CI will verify if the test now passes successfully

Signed-off-by: hejianping <hejianping7@huawei.com>