
[Refactor] Provide a framework to accommodate operators for different hardware devices#5735

Merged
weijinqian0 merged 12 commits into vllm-project:main from weijinqian0:main_for_device_adaptor
Jan 13, 2026

Conversation

weijinqian0 (Collaborator) commented Jan 8, 2026

Originates from: #5463

Reason:

As hardware versions iterate, their operators may change frequently, which leads to short-term compatibility differences between devices. This PR therefore provides an intermediate adaptation layer that absorbs those short-term operator differences.

weijinqian_v1 added 6 commits January 8, 2026 13:39
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian0 changed the title from "[Refactor]" to "[Refactor] Provide a framework to accommodate operators for different hardware devices" on Jan 8, 2026
gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a good refactoring by creating a DeviceOperator abstraction to handle hardware-specific operations, improving code structure and maintainability. The implementation uses a factory pattern to select the correct operator class based on the device type, which is a solid approach.

However, I've identified a critical issue and a high-severity issue in the new vllm_ascend/device/device_op.py file. The critical issue is the removal of a .contiguous() call, which is likely to cause runtime errors on A5 devices. The high-severity issue relates to an incorrect type hint that undermines static analysis and future maintenance. I have provided specific comments and suggestions to address these points.
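The factory-based selection described in the review summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the class name `A5DeviceOperator`, the `_REGISTRY` mapping, the `device_type` argument, and the list-based `reshape_and_cache` signature are all hypothetical. The snippet quoted in the review calls `get_device_operator()` with no arguments, so the real function presumably detects the device itself.

```python
from typing import Dict, List, Type


class CommonDeviceOperator:
    """Default operator implementations shared by all devices (hypothetical)."""

    @classmethod
    def reshape_and_cache(cls, key: List[int], cache: List[int]) -> str:
        # Stand-in for the real cache-write operator: append the key to the cache.
        cache.extend(key)
        return "common"


class A5DeviceOperator(CommonDeviceOperator):
    """Overrides only the operators that differ on one device generation."""

    @classmethod
    def reshape_and_cache(cls, key: List[int], cache: List[int]) -> str:
        # Stand-in for device-specific preparation before the cache write.
        cache.extend(list(key))
        return "a5"


# Hypothetical registry from a device-type string to its operator class.
_REGISTRY: Dict[str, Type[CommonDeviceOperator]] = {"a5": A5DeviceOperator}


def get_device_operator(device_type: str) -> Type[CommonDeviceOperator]:
    """Return the operator class for a device, falling back to the common one."""
    return _REGISTRY.get(device_type, CommonDeviceOperator)
```

With this shape, call sites depend only on the selected class, so a device-specific difference stays inside one subclass and can be deleted once the operators converge.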

Comment thread vllm_ascend/device/device_op.py Outdated
Comment thread vllm_ascend/device/device_op.py Outdated
Comment on lines +56 to +57
DeviceOperator: Optional[
    CommonDeviceOperator.__class__] = get_device_operator()
Severity: high

The type hint for DeviceOperator is incorrect and misleading for two reasons:

  1. get_device_operator() never returns None, so Optional is incorrect.
  2. CommonDeviceOperator.__class__ evaluates to the metaclass type, so the annotation means "any class". Static type checkers therefore cannot verify calls to methods like reshape_and_cache, which can allow typos or signature mismatches to become runtime errors.

I suggest removing the incorrect type hint. For proper type safety, add from typing import Type and annotate the variable as DeviceOperator: Type[CommonDeviceOperator] = get_device_operator().

DeviceOperator = get_device_operator()
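The difference between the two annotations can be checked with a small, self-contained sketch. The class body and the zero-argument factory here are simplified stand-ins, assumed only for illustration; only the names `CommonDeviceOperator`, `get_device_operator`, `reshape_and_cache`, and `DeviceOperator` come from the review.

```python
from typing import Type


class CommonDeviceOperator:
    """Simplified stand-in for the operator base class."""

    @staticmethod
    def reshape_and_cache() -> str:
        return "cached"


def get_device_operator() -> Type[CommonDeviceOperator]:
    # Always returns a class, never None, so Optional is unnecessary.
    return CommonDeviceOperator


# The inner expression of the original annotation is just the metaclass `type`.
metaclass = CommonDeviceOperator.__class__

# Suggested annotation: "CommonDeviceOperator or one of its subclasses".
DeviceOperator: Type[CommonDeviceOperator] = get_device_operator()
result = DeviceOperator.reshape_and_cache()
```

With the `Type[CommonDeviceOperator]` annotation, a static checker can flag a misspelled call such as `DeviceOperator.reshape_and_cach()`; with an annotation that reduces to `type`, it has no class to check the attribute against.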

github-actions bot (Contributor) commented Jan 8, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian0 added the labels ready (ready for review) and ready-for-test (start test by label for PR) on Jan 9, 2026
Comment thread vllm_ascend/device/device_op.py Outdated
weijinqian_v1 added 3 commits January 10, 2026 09:39
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
github-actions bot (Contributor) commented

This pull request has conflicts. Please resolve them before we can evaluate the pull request.

Signed-off-by: weijinqian0 <1184188277@qq.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
weijinqian0 merged commit 1ccb9ac into vllm-project:main on Jan 13, 2026
16 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 13, 2026
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [CI] Unblock 4-cards test (vllm-project#5831)
  [Refactor] Provide a framework to accommodate operators for different hardware devices (vllm-project#5735)
  [Refactor] Modify the binding logic to allocate CPU cores for each NPU card (vllm-project#5555)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#5519)
  support triton of mrope (vllm-project#5664)
  [bugfix] A2 Environment Pooling for Memcache Compatibility (vllm-project#5601)
  [Doc] Update community contributors and versioning naming to follow vLLM (vllm-project#5820)
  [Refactor] Add comments for Metadata classes in attention module (vllm-project#5789)
  [Bugfix] bugfix for the order of dummy run pad and sync (vllm-project#5777)
  [CI] Move nightly-a2 test to hk (vllm-project#5807)
  [CI] Show disk usage for CI shared volume (vllm-project#5821)
  Bump actions/checkout from 4 to 6 (vllm-project#5795)
  Bump actions/github-script from 7 to 8 (vllm-project#5796)
  [bugfix](cp) align max_context_chunk to cp_virtual_block_size (vllm-project#5767)
  [bugfix]limit graph replay sync (vllm-project#5761)
  [CI]Add Kimi k2 nightly test (vllm-project#5682)
  [Doc] add tls check to pd disaggregation readme  (vllm-project#5638)
  [CI] adpat v0.13.0 change (vllm-project#5793)
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
… hardware devices (vllm-project#5735)

come from: vllm-project#5463

Reason:

During the iteration process of the hardware version, there may be a
large number of iterations for the operators, which can lead to
short-term compatibility differences. Therefore, an intermediate
adaptation layer is provided to accommodate the short-term differences
in operators.


- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

---------

Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
Signed-off-by: weijinqian0 <1184188277@qq.com>
Co-authored-by: weijinqian_v1 <weijinqian@huawei.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
… hardware devices (vllm-project#5735)

weijinqian0 (Collaborator, Author) commented Jan 27, 2026

epoch3

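Chinese test dataset

| draft vocab_size | first-token acceptance rate | acceptance length |
|---|---|---|
| 37984 | 0.27 | 1.33 |
| 75968 | 0.36 | 1.45 |
| 113952 | 0.36 | 1.46 |

English test dataset

| draft vocab_size | first-token acceptance rate | acceptance length |
|---|---|---|
| 37984 | 0.39 | 1.51 |
| 75968 | 0.51 | 1.77 |
| 113952 | 0.51 | 1.80 |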

starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
… hardware devices (vllm-project#5735)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
… hardware devices (vllm-project#5735)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
… hardware devices (vllm-project#5735)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
… hardware devices (vllm-project#5735)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
… hardware devices (vllm-project#5735)

