[Refactor] Modify the binding logic to allocate CPU cores for each NPU card #5555

Merged: wangxiyuan merged 1 commit into vllm-project:main from Rozwel-dx:main on Jan 13, 2026
Rozwel-dx commented Dec 31, 2025

[Refactor] Modify the binding logic to allocate CPU cores for each NPU card

What this PR does / why we need it?

Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference.
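
As a rough illustration of the idea (not the code in this PR), the sketch below splits the cores of each NUMA node evenly among the NPUs attached to that node and sets aside the last cores of every NPU's share for helper threads such as acl_thread and release_thread. The function name `allocate_cores`, the shape of the input dictionaries, and the two-core reservation are all assumptions made for the example.

```python
from collections import defaultdict


def allocate_cores(npu_to_numa, numa_to_cores, reserved=2):
    """Hypothetical helper: split each NUMA node's cores among its NPUs.

    npu_to_numa:   {npu_id: numa_node}
    numa_to_cores: {numa_node: [cpu ids]}
    reserved:      cores per NPU set aside for helper threads (assumed value)
    """
    # Group NPUs by the NUMA node they are attached to.
    numa_to_npus = defaultdict(list)
    for npu, numa in sorted(npu_to_numa.items()):
        numa_to_npus[numa].append(npu)

    plan = {}
    for numa, npus in numa_to_npus.items():
        cores = sorted(numa_to_cores[numa])
        share = len(cores) // len(npus)
        if share <= reserved:
            raise ValueError(
                f"NUMA node {numa} has too few cores for {len(npus)} NPUs")
        for i, npu in enumerate(npus):
            chunk = cores[i * share:(i + 1) * share]
            cut = len(chunk) - reserved
            plan[npu] = {
                "main": chunk[:cut],     # cores for the worker process
                "helpers": chunk[cut:],  # e.g. acl_thread / release_thread
            }
    return plan


# Four NPUs, two per NUMA node, eight cores per node (made-up topology).
plan = allocate_cores({0: 0, 1: 0, 2: 1, 3: 1},
                      {0: list(range(0, 8)), 1: list(range(8, 16))})
```

On Linux, a plan of this shape could then be applied with `os.sched_setaffinity(pid, plan[npu]["main"])`; keeping the helper cores disjoint from the main cores is what prevents the two groups from competing for the same CPU.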

Does this PR introduce any user-facing change?

No

How was this patch tested?

Rozwel-dx@c85cc04

Signed-off-by: rowzwel_dx <1392851715@qq.com>

gemini-code-assist bot left a comment

Code Review

This pull request refactors the CPU binding logic to be more sophisticated, allocating CPU cores for each NPU card based on NUMA affinity and isolating specific threads. The implementation has been moved into the `CpuAlloc` and `DeviceInfo` classes, which is a good structural improvement.

My review has identified a few critical issues that need to be addressed:

  1. A `TypeError` will occur when calling the refactored `bind_cpus` function from `worker.py` due to a mismatched function signature.
  2. The CPU binding may be non-deterministic and incorrect because a list of process IDs is not sorted before being used for mapping.
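
The second point can be illustrated with a small sketch, assuming the running processes arrive as an unordered collection of PIDs and are paired positionally with NPU IDs. `map_pids_to_npus` is a hypothetical name, not a function from this PR. Sorting both sides first makes the pairing independent of the order in which the PIDs were discovered.

```python
def map_pids_to_npus(pids, npu_ids):
    """Hypothetical helper: pair PIDs with NPU IDs in a stable order."""
    if len(pids) != len(npu_ids):
        raise ValueError("expected one process per NPU")
    # sorted() removes any dependence on discovery order
    # (set iteration, the order of rows in command output, and so on).
    return dict(zip(sorted(pids), sorted(npu_ids)))


# The same inputs in a different order yield the same mapping.
a = map_pids_to_npus([4021, 3987, 4100], [2, 0, 1])
b = map_pids_to_npus([4100, 4021, 3987], [0, 1, 2])
```

Whether ascending PID order actually corresponds to ascending NPU order depends on how the processes were launched, so this only addresses the determinism part of the concern.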

Additionally, there are opportunities for improving correctness and performance:

  • A potential `TypeError` in `get_running_npus` if a chip logic ID is not found.
  • Inefficient repeated calls to `ps -Te` when binding threads.

Please see the detailed comments for suggestions on how to fix these issues.
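
One way to address the last two points is sketched below, under assumptions: the `ps -Te` output is captured once and parsed into a dictionary that every later lookup reuses, and a lookup that finds nothing returns `None`, which the caller checks before use. The helper names (`parse_ps_threads`, `snapshot_threads`, `find_thread`) and the sample text are invented for the example and do not come from the PR.

```python
import subprocess
from collections import defaultdict


def parse_ps_threads(ps_output):
    """Parse `ps -Te` text (columns: PID SPID TTY TIME CMD) into {pid: [(tid, name)]}."""
    threads = defaultdict(list)
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # ignore malformed rows
        pid, tid, _tty, _time, name = fields
        threads[int(pid)].append((int(tid), name.strip()))
    return threads


def snapshot_threads():
    """Run `ps -Te` once; callers reuse the result for every thread lookup."""
    out = subprocess.run(["ps", "-Te"], capture_output=True, text=True, check=True)
    return parse_ps_threads(out.stdout)


def find_thread(threads, pid, name):
    """Return the TID of the named thread in `pid`, or None if it is absent."""
    for tid, thread_name in threads.get(pid, []):
        if thread_name == name:
            return tid
    return None


# Made-up sample in the shape `ps -Te` prints on Linux.
sample = """\
    PID    SPID TTY          TIME CMD
   4021    4021 ?        00:00:01 python
   4021    4035 ?        00:00:00 acl_thread
   4021    4036 ?        00:00:00 release_thread
"""
threads = parse_ps_threads(sample)
tid = find_thread(threads, 4021, "acl_thread")
if tid is not None:  # guard the lookup instead of assuming a hit
    print(f"acl_thread has TID {tid}")
```

The same check-before-use pattern applies to any lookup that may come back empty, such as resolving a chip logic ID.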

Comment thread vllm_ascend/cpu_binding.py Outdated
Comment thread vllm_ascend/worker/worker.py Outdated
Comment thread vllm_ascend/cpu_binding.py Outdated
Comment thread vllm_ascend/cpu_binding.py Outdated
github-actions bot commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

Rozwel-dx force-pushed the main branch 2 times, most recently from 496d07b to d69f7a9, on January 5, 2026 02:40
Comment thread vllm_ascend/worker/worker.py Outdated
Rozwel-dx force-pushed the main branch 6 times, most recently from 62fbf5c to 9788812, on January 8, 2026 06:03
github-actions bot commented Jan 8, 2026

This pull request has conflicts; please resolve them before we can evaluate the pull request.

…d on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference.

Signed-off-by: Rozwel-dx <1392851715@qq.com>

Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference.

Signed-off-by: Rozwel-dx <1392851715@qq.com>

Add cpu_binding UT

Signed-off-by: Rozwel-dx <1392851715@qq.com>

wangxiyuan merged commit 8d57128 into vllm-project:main on Jan 13, 2026
16 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 13, 2026
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend:
  [CI] Unblock 4-cards test (vllm-project#5831)
  [Refactor] Provide a framework to accommodate operators for different hardware devices (vllm-project#5735)
  [Refactor] Modify the binding logic to allocate CPU cores for each NPU card (vllm-project#5555)
  [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#5519)
  support triton of mrope (vllm-project#5664)
  [bugfix] A2 Environment Pooling for Memcache Compatibility (vllm-project#5601)
  [Doc] Update community contributors and versioning naming to follow vLLM (vllm-project#5820)
  [Refactor] Add comments for Metadata classes in attention module (vllm-project#5789)
  [Bugfix] bugfix for the order of dummy run pad and sync (vllm-project#5777)
  [CI] Move nightly-a2 test to hk (vllm-project#5807)
  [CI] Show disk usage for CI shared volume (vllm-project#5821)
  Bump actions/checkout from 4 to 6 (vllm-project#5795)
  Bump actions/github-script from 7 to 8 (vllm-project#5796)
  [bugfix](cp) align max_context_chunk to cp_virtual_block_size (vllm-project#5767)
  [bugfix]limit graph replay sync (vllm-project#5761)
  [CI]Add Kimi k2 nightly test (vllm-project#5682)
  [Doc] add tls check to pd disaggregation readme  (vllm-project#5638)
  [CI] adpat v0.13.0 change (vllm-project#5793)
guanguan0308 pushed a commit to guanguan0308/vllm-ascend that referenced this pull request Jan 13, 2026
…U card (vllm-project#5555)

[Refactor] Modify the binding logic to allocate CPU cores for each NPU
card

### What this PR does / why we need it?
Modify the binding logic to allocate CPU cores for each NPU card based
on NUMA affinity, while isolating acl_thread/release_thread and other
processes to prevent mutual interference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Rozwel-dx@c85cc04

Signed-off-by: rowzwel_dx <1392851715@qq.com>
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@7157596

Signed-off-by: Rozwel-dx <1392851715@qq.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
…U card (vllm-project#5555)

starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…U card (vllm-project#5555)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…U card (vllm-project#5555)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…U card (vllm-project#5555)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…U card (vllm-project#5555)

Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…U card (vllm-project#5555)


Labels

module:core, ready (read for review), ready-for-test (start test by label for PR)

3 participants