Skip to content

Revert "[Feat] enable hierarchical mc2 ops on A2 by default (#5545)"#5611

Merged
wangxiyuan merged 1 commit intovllm-project:mainfrom
Toneymiller:revert_mc2
Jan 5, 2026
Merged

Revert "[Feat] enable hierarchical mc2 ops on A2 by default (#5545)"#5611
wangxiyuan merged 1 commit intovllm-project:mainfrom
Toneymiller:revert_mc2

Conversation

@Toneymiller
Copy link
Copy Markdown
Contributor

@Toneymiller Toneymiller commented Jan 5, 2026

This reverts commit fb9fdcd.

What this PR does / why we need it?

this pr breaks the smoke test because of that leads the error of aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
launch failed for NotEqual, errno:361001
A6C9453D-4F0B-4256-DD80-A9C181DAB2D9

Does this PR introduce any user-facing change?

How was this patch tested?

…ject#5545)"

This reverts commit fb9fdcd.

Signed-off-by: zxwang <1476209578@qq.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reverts a previous commit that enabled hierarchical mc2 ops on A2 devices by default. The changes reintroduce environment variable checks to conditionally enable this feature, making it non-default. My review focuses on ensuring the reverted code is robust. I've identified a potential issue where the feature could be accidentally enabled on non-A2 devices, and I've suggested a fix to make the check more specific.

Comment thread vllm_ascend/utils.py
Comment on lines +989 to +991
def is_hierarchical_communication_enabled():
return (os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0"
and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The comments in both this file and token_dispatcher.py strongly suggest that this hierarchical communication feature is specifically for A2 devices. To prevent it from being accidentally enabled on other device types if the environment variables are set, which could lead to unexpected behavior or performance issues, it's safer to add an explicit device type check here.

Suggested change
def is_hierarchical_communication_enabled():
return (os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0"
and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")
def is_hierarchical_communication_enabled():
return (get_ascend_device_type() == AscendDeviceType.A2 and
os.getenv("HCCL_INTRA_ROCE_ENABLE", "") == "0" and os.getenv("HCCL_INTRA_PCIE_ENABLE", "") == "1")

@wangxiyuan
Copy link
Copy Markdown
Collaborator

Please add revert reason in commit message

@Toneymiller
Copy link
Copy Markdown
Contributor Author

Please add revert reason in commit message

done

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 5, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

@wangxiyuan wangxiyuan merged commit 8eae949 into vllm-project:main Jan 5, 2026
19 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 6, 2026
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (58 commits)
  [Main2Main] Upgrade vllm commit to 0106 (vllm-project#5617)
  [CI]update bisheng version (vllm-project#5621)
  [UT][PCP&DCP] UT for block_table.py (vllm-project#5032)
  [Main2Main] Upgrade vllm commit to 0105 (vllm-project#5595)
  [CI] mv ops to correct path (vllm-project#5615)
  [BugFix] Fix Smoke Testing Bug for DSR1 longseq (vllm-project#5613)
  Revert "[Feat] enable hierarchical mc2 ops on A2 by default (vllm-project#5545)" (vllm-project#5611)
  [TRITON][TEST]Add nightly test for triton split_qkv_rmsnorm_rope (vllm-project#5267)
  [perf] Fix MLAPO weight disposal for KV-consumer MLA in PD-mix deploy... (vllm-project#5192)
  [docs] Correct image about prefill phase of PCP (vllm-project#5598)
  [CI] update triton-ascend version (vllm-project#5584)
  [P/D]Remove mooncake kvpool unused parameter `local_hostname` (vllm-project#5574)
  [Bugfix] record cos and sin cache in AscendRotaryEmbedding (vllm-project#5516)
  [bugfix] fix test_camem failed with triton-ascend (vllm-project#5492)
  [UT]add triton ops ut :  test_fused_qkvzba_split_reshape_cat (vllm-project#5474)
  [CI] Download models from ms (vllm-project#5405)
  Docs: Add A3 Docker image guidance for Atlas A3 machines (vllm-project#5256)
  [Doc] Add NNAL installation guide and requirements (vllm-project#5235)
  Add the requirement of arctic-inference which  speculative decoding with suffix_decode  (vllm-project#5045)
  [BugFix][Fusion] Fix graph fusion failure problem (vllm-project#5253)
  ...
Rozwel-dx pushed a commit to Rozwel-dx/vllm-ascend that referenced this pull request Jan 8, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…ject#5545)" (vllm-project#5611)

This reverts commit fb9fdcd.

### What this PR does / why we need it?
this pr breaks the smoke test because of that leads the error of
aclnnNeScalar:Kernel Run failed. opType: 25, NotEqual
        launch failed for NotEqual, errno:361001
<img width="1149" height="166"
alt="A6C9453D-4F0B-4256-DD80-A9C181DAB2D9"
src="https://github.com/user-attachments/assets/cab9c4b8-3fd1-4c6b-b424-474b46042726"
/>

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

Signed-off-by: zxwang <1476209578@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants