Skip to content

[bugfix] A2 Environment Pooling for Memcache Compatibility#5601

Merged
wangxiyuan merged 9 commits intovllm-project:mainfrom
DreamerLeader:memcache_A2
Jan 13, 2026
Merged

[bugfix] A2 Environment Pooling for Memcache Compatibility#5601
wangxiyuan merged 9 commits intovllm-project:mainfrom
DreamerLeader:memcache_A2

Conversation

@DreamerLeader
Copy link
Copy Markdown
Contributor

@DreamerLeader DreamerLeader commented Jan 5, 2026

What this PR does / why we need it?

When running memcache in the A2 environment, the logic for registering memory needs to be added. Additionally, there is a link establishment conflict between memcache and HCCS during initialization in A2, so the link should be established in advance.

Does this PR introduce any user-facing change?

How was this patch tested?

fangjianwei added 2 commits January 5, 2026 11:49
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a bugfix for Memcache compatibility on A2 environments by adding device-specific logic. My review identifies two key areas for improvement: refactoring duplicated code to enhance maintainability and strengthening error handling to prevent masking potential bugs. These changes will make the code cleaner and more robust.

Comment on lines +64 to +70
def register_buffer(self, ptrs: list[int], sizes: list[int]):
pass
try:
for ptr, size in zip(ptrs, sizes):
self.store.register_buffer(ptr, size)
except Exception as exc:
logger.info(
"A3 environment does not require cache registration.: %s", exc)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The try...except Exception block is too broad and can mask unexpected errors. The log message suggests this is intended for a specific case on the "A3 environment", but it will catch any exception on any environment and log it as an info message. This could hide critical bugs during buffer registration on non-A3 environments, leading to a broken state without a clear failure signal.

To make the error handling more robust, you should check the device type and only suppress the exception if it's running on an A3 device. For all other devices, any exception should be logged as an error and re-raised.

Additionally, the parameter sizes in this method's signature overrides lengths from the base Backend class. It's good practice to maintain consistency with the base class signature by using lengths.

    def register_buffer(self, ptrs: list[int], lengths: list[int]):
        soc_version = get_ascend_device_type()
        try:
            for ptr, length in zip(ptrs, lengths):
                self.store.register_buffer(ptr, length)
        except Exception as exc:
            if soc_version in {AscendDeviceType.A3}:
                logger.info(
                    "A3 environment does not require cache registration; "
                    "suppressing exception: %s", exc)
            else:
                logger.error("An unexpected error occurred during buffer registration: %s", exc)
                raise

Comment on lines +30 to +51
soc_version = get_ascend_device_type()
if soc_version in {AscendDeviceType.A2}:
import torch
from vllm.distributed import get_world_group
tmp_tensor = torch.zeros(1, device="npu")
output_tensor_list = [
torch.empty_like(tmp_tensor)
for _ in range(torch.distributed.get_world_size())
]
torch.distributed.all_gather(
output_tensor_list,
tmp_tensor,
group=get_world_group().device_group)
self.rank = parallel_config.rank
self.store = DistributedObjectStore()
res = self.store.init(self.rank)
assert res == 0
else:
self.rank = parallel_config.rank
self.store = DistributedObjectStore()
res = self.store.init(self.rank)
assert res == 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is significant code duplication between the if and else blocks. The lines for initializing self.rank and self.store are identical in both branches. This can be refactored to reduce redundancy and improve maintainability, which is crucial for preventing bugs where a fix is applied to one copy but not the other.

Additionally, torch is already imported at the top of the file, so the local import torch inside the if block is redundant and should be removed.

            soc_version = get_ascend_device_type()
            if soc_version in {AscendDeviceType.A2}:
                from vllm.distributed import get_world_group
                tmp_tensor = torch.zeros(1, device="npu")
                output_tensor_list = [
                    torch.empty_like(tmp_tensor)
                    for _ in range(torch.distributed.get_world_size())
                ]
                torch.distributed.all_gather(
                    output_tensor_list,
                    tmp_tensor,
                    group=get_world_group().device_group)
            self.rank = parallel_config.rank
            self.store = DistributedObjectStore()
            res = self.store.init(self.rank)
            assert res == 0

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jan 5, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

fangjianwei added 3 commits January 5, 2026 16:06
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
@LCAIZJ LCAIZJ added the ready-for-test start test by label for PR label Jan 8, 2026
@wangxiyuan wangxiyuan added the ready read for review label Jan 8, 2026
@wangxiyuan wangxiyuan merged commit db7cf9b into vllm-project:main Jan 13, 2026
20 of 23 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 13, 2026
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [CI] Unblock 4-cards test (vllm-project#5831)
  [Refactor] Provide a framework to accommodate operators for different hardware devices (vllm-project#5735)
  [Refactor] Modify the binding logic to allocate CPU cores for each NPU card (vllm-project#5555)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#5519)
  support triton of mrope (vllm-project#5664)
  [bugfix] A2 Environment Pooling for Memcache Compatibility (vllm-project#5601)
  [Doc] Update community contributors and versioning naming to follow vLLM (vllm-project#5820)
  [Refactor] Add comments for Metadata classes in attention module (vllm-project#5789)
  [Bugfix] bugfix for the order of dummy run pad and sync (vllm-project#5777)
  [CI] Move nightly-a2 test to hk (vllm-project#5807)
  [CI] Show disk usage for CI shared volume (vllm-project#5821)
  Bump actions/checkout from 4 to 6 (vllm-project#5795)
  Bump actions/github-script from 7 to 8 (vllm-project#5796)
  [bugfix](cp) align max_context_chunk to cp_virtual_block_size (vllm-project#5767)
  [bugfix]limit graph replay sync (vllm-project#5761)
  [CI]Add Kimi k2 nightly test (vllm-project#5682)
  [Doc] add tls check to pd disaggregation readme  (vllm-project#5638)
  [CI] adpat v0.13.0 change (vllm-project#5793)
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
wangxiyuan pushed a commit that referenced this pull request Jan 14, 2026
…5842)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

pick-from: #5601

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…ect#5601)

### What this PR does / why we need it?
When running memcache in the A2 environment, the logic for registering
memory needs to be added. Additionally, there is a link establishment
conflict between memcache and HCCS during initialization in A2, so the
link should be established in advance.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@7157596

---------

Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready read for review ready-for-test start test by label for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants