from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
I think we should check that use_xxx_communicator (any of them is fine, since they all stay in sync) and world_size > 1 are true before creating the communicator.
https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L167-L169
Besides the model parallel groups, there is also a world group, which never uses device communication. Adding this check saves time when creating the world group.
I added a world_size check in the new patchset. There is no use_xxx_communicator in vllm.
I mean use_tpu_communicator, use_xpu_communicator, or use_hpu_communicator; any one of them is fine.
They are checked in super().__init__, right?
For example, the use_tpu_communicator check in super().__init__ only applies to the TPU communicator. We would reuse it here for the NPU communicator, because there is no boolean flag for NPU to control this check.
I think we could just use use_tpu_communicator, since all the use_xxx_communicator flags stay in sync in vLLM.
Got it, thanks. I'll update.
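The guard discussed above could look roughly like this. A minimal sketch only: `maybe_create_communicator` and its parameter names are hypothetical, not the actual vllm-ascend code.

```python
# Hypothetical sketch of the guard discussed above; not the real
# vllm-ascend implementation.
def maybe_create_communicator(world_size, use_device_communicator,
                              comm_cls, device_group):
    """Skip communicator creation for single-rank groups (e.g. the
    world group), which never perform device communication."""
    if world_size > 1 and use_device_communicator:
        return comm_cls(group=device_group)
    return None

# The single-rank world group skips communicator creation entirely,
# so initializing it stays cheap.
world_comm = maybe_create_communicator(
    1, True, lambda group: f"comm on {group}", "world_group")
tp_comm = maybe_create_communicator(
    2, True, lambda group: f"comm on {group}", "tp_group")
```

Here `world_comm` is `None` while `tp_comm` is created, which is exactly the time saving mentioned for the world group.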
from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
Does this still depend on the vllm-project/vllm CommunicatorBase? It seems CommunicatorBase should also move to vllm-ascend.
Removed CommunicatorBase in the new patchset.
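For context, resolve_obj_by_qualname in the quoted diff follows the usual dotted-path import pattern: the platform names its communicator class as a plain string, and the string is resolved to the class at group-creation time. A minimal re-implementation of that pattern (a sketch, not vLLM's actual code) looks like:

```python
import importlib


def resolve_obj_by_qualname(qualname: str):
    """Resolve 'package.module.Name' to the named object, so a platform
    plugin can point at its communicator class with a plain string."""
    module_name, _, obj_name = qualname.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, obj_name)


# Example: resolve a stdlib class by its dotted path.
cls = resolve_obj_by_qualname("collections.OrderedDict")
```

This indirection is what lets vllm-ascend supply its own communicator class without vLLM importing it directly.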
# Remove this file once vllm supports this via
# https://github.com/vllm-project/vllm/pull/11324.

from vllm.distributed.parallel_state import GroupCoordinator
Unrelated, but just curious: should vllm be a dependency of vllm-ascend, as one line in requirements and pyproject?
Hmm, let's give it a try; we can add it.
IMO it may raise an error, though, because there is no CPU version of PyTorch on PyPI.
Once it's added, the install steps would look like this:
- install the CPU version of PyTorch by hand (torch==2.5.1+cpu)
- pip install vllm-ascend
No worries, we can do it in a follow-up.
dist.all_reduce(x, group=self.group)
return x

def gather(self, input_: torch.Tensor, dst: int = 0, dim: int = -1):
Do we have any UT to check the functionality?
Communicator tests need more than one NPU card, which the current CI doesn't support. We're working on multi-card support for the CI system.
Given that, we need to test this PR by hand locally and be careful when merging it.
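Until multi-card CI lands, single-card unit tests can at least verify the call wiring by mocking the collective ops. A hedged sketch of that style of test: FakeCommunicator here is an illustrative stand-in, not the actual vllm-ascend class, which wraps torch.distributed collectives on NPU process groups.

```python
from unittest.mock import MagicMock


class FakeCommunicator:
    """Toy stand-in for the communicator under review; the real one
    dispatches to torch.distributed with an NPU process group."""

    def __init__(self, dist, group):
        self.dist = dist
        self.group = group

    def all_reduce(self, x):
        # Mirrors the quoted diff: reduce in place, return the tensor.
        self.dist.all_reduce(x, group=self.group)
        return x


# Mock the dist module so the test runs on a single card (or no card).
dist = MagicMock()
comm = FakeCommunicator(dist, group="tp_group")
out = comm.all_reduce([1, 2, 3])

# Verify the collective was invoked once with the right group.
dist.all_reduce.assert_called_once_with([1, 2, 3], group="tp_group")
```

This kind of test doesn't validate numerics across ranks, only plumbing, so local multi-card runs are still needed before merging, as noted above.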
output_tensor = None
return output_tensor

def all_gather(self, input_: torch.Tensor, dim: int = -1) -> torch.Tensor:
Do not merge until it's fully tested locally. Thanks.

Lines 12 to 14 in 7006835: this should also be removed.
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Yikun left a comment:
LGTM if it passes in a multi-card env.
See #30
### What this PR does / why we need it?
- Remove mypy on communicator to address: #24 (comment)
- Add mypy.ini to trigger list

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
…q_pr_new fix pre-commit + fix comment
mooncake store connector support ipv6 Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
* [Feature][EPD] Supports EPD (vllm-project#13)
* [Bugfix] Change the input param for func maybe_save_ec_to_connector
* [Bugfix] Change the wrong input param name for func maybe_save_ec_to_connector (vllm-project#15)
* [Feature] Mooncake store ECConnector: wait for ec save (vllm-project#16)
* Support kv mooncake store connector (vllm-project#22): wait for ec save; KV mooncake store connector
* Mooncake store connector: support IPv6
* Adapt EC connector
* kv mooncake store connector: support IPv6 (vllm-project#24)
* Implement yuanrong-datasystem connector
* Adapt to ascend v0.11.0rc2
* Fix pre-commit

Signed-off-by: wuhang <wuhang6@huawei.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Co-authored-by: amy-why-3459 <wuhaiyan17@huawei.com>
Co-authored-by: wuhang <wuhang6@huawei.com>
Co-authored-by: Zeng Chuang <zengchuang3@huawei.com>
Co-authored-by: yangsonglin <yangsonglin0821@163.com>
* Adapt chunked prefill
* Adapt MLA BNSD
* Adapt MLA 2048 mask
…atch sunset plan
- Move _c8_kv_scale_weight_loader and AscendC8KVCacheAttentionMethod from w8a8_dynamic.py into kv_c8.py so all KV-cache C8 quantization methods (both MLA/FAKQuant and dense-attention/QuaRot) live in one place
- Update import paths in modelslim_config.py, attention_v1.py, and tests
- Clean up patch_qwen3_c8.py: remove redundant module docstring to match the style of other patch files (license header only)
- Add entry vllm-project#24 for patch_qwen3_c8.py in patch/__init__.py with Why/How/Related PR/Future Plan following the established sunset-plan format

Signed-off-by: lico67373 <918688502@qq.com>
Made-with: Cursor
Adds C8 (INT8) KV cache quantization support for standard GQA attention models (e.g., Qwen3-32B W8A8C8). C8 uses static per-channel scales to store the KV cache in INT8, reducing KV cache memory by ~50% compared to BF16 and enabling higher batch concurrency and longer context on the same hardware.

Key changes:
- attention_v1.py: new AscendC8AttentionBackendImpl subclass with _prepare_c8_scales, _quantize_kv_to_int8, _forward_c8_decode, _forward_c8_chunked_prefill, and _forward_c8_fused_infer_attention.
- kv_c8.py: AscendC8KVCacheAttentionMethod creates k/v_cache_scale/offset parameters and upgrades the attention impl via class surgery; _c8_kv_scale_weight_loader handles per-channel scale shapes.
- modelslim_config.py: activates the C8 branch when kv_cache_type == "C8" in quant_model_description.json.
- patch_qwen3_c8.py: intercepts C8 scale/offset weights before AutoWeightsLoader discards them.
- patch/__init__.py: documents patch vllm-project#24 with a sunset plan.
- tests/ut/quantization/test_kv_c8.py: unit tests for all C8 helpers.
- tests/ut/quantization/test_modelslim_config.py: C8 branch coverage.

Signed-off-by: lico67373 <918688502@qq.com>
Made-with: Cursor
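The static per-channel scheme described above can be illustrated with a toy sketch (pure Python, not the Ascend kernels): each channel c carries a fixed, offline-calibrated scale s[c], values are stored as q = clamp(round(v / s[c]), -128, 127), and dequantization is simply q * s[c].

```python
def quantize_per_channel(row, scales):
    """Quantize one KV row to INT8 with a fixed scale per channel.

    Toy illustration of C8-style static quantization, not the real
    Ascend implementation.
    """
    q = []
    for v, s in zip(row, scales):
        x = round(v / s)
        q.append(max(-128, min(127, x)))  # saturate to INT8 range
    return q


def dequantize_per_channel(q_row, scales):
    return [q * s for q, s in zip(q_row, scales)]


# Power-of-two scales keep this toy example exact.
scales = [0.5, 0.25, 0.5]          # static, calibrated offline per channel
row = [1.0, -3.25, 100.0]
q = quantize_per_channel(row, scales)       # [2, -13, 127]  (200 saturates)
approx = dequantize_per_channel(q, scales)  # [1.0, -3.25, 63.5]
```

The saturated third channel shows the trade-off: values outside the calibrated range clip, which is why the scales are derived from calibration data rather than chosen arbitrarily.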
Some PRs for plugin support have not been merged by vllm yet. This PR adds monkey patches to vllm-ascend so that vllm-ascend works with vllm directly.
The patch code should be removed once the related functionality is supported by vllm upstream.
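The monkey-patch approach amounts to replacing attributes on the imported vLLM modules when the plugin loads. A generic sketch of the pattern, using toy classes rather than the actual vLLM symbols:

```python
class UpstreamCoordinator:
    """Toy stand-in for an upstream class the plugin needs to patch
    (imagine something like vllm's GroupCoordinator)."""

    def all_gather(self, x):
        raise NotImplementedError("no NPU support upstream yet")


def patched_all_gather(self, x):
    """Plugin-side replacement; the real patch would call NPU ops."""
    return ["npu-gathered", x]


# Applied once at plugin import time; deleted once upstream gains
# native support, per the sunset plan above.
UpstreamCoordinator.all_gather = patched_all_gather

coord = UpstreamCoordinator()
result = coord.all_gather(42)
```

Because the patch rebinds a method on the upstream class itself, every existing and future instance picks it up, which is what lets the plugin work without forking vLLM, at the cost of needing the sunset plan tracked above.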