[Refactor] Modify the binding logic to allocate CPU cores for each NPU card #5555
wangxiyuan merged 1 commit into vllm-project:main from
Code Review
This pull request refactors the CPU binding logic to be more sophisticated, allocating CPU cores for each NPU card based on NUMA affinity and isolating specific threads. The implementation has been moved into CpuAlloc and DeviceInfo classes, which is a good structural improvement.
My review has identified a few critical issues that need to be addressed:
- A TypeError will occur when worker.py calls the refactored bind_cpus function, because the call does not match the new function signature.
- The CPU binding may be non-deterministic and incorrect because a list of process IDs is not sorted before it is used for the mapping.
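To illustrate the second point, here is a minimal sketch of a deterministic PID-to-core mapping. It assumes the binding assigns one CPU set per discovered process in list order. The helper name map_pids_to_cpu_sets and its inputs are hypothetical and do not come from the PR. Sorting the PIDs first makes the result independent of the order in which the processes were discovered.

```python
from typing import Dict, List, Sequence


def map_pids_to_cpu_sets(
    pids: Sequence[int], cpu_sets: Sequence[List[int]]
) -> Dict[int, List[int]]:
    """Assign one CPU set per PID, deterministically (hypothetical helper).

    Sorting the PIDs first means the same set of processes always receives
    the same assignment, whatever order they were discovered in.
    """
    if len(pids) > len(cpu_sets):
        raise ValueError(f"{len(pids)} processes but only {len(cpu_sets)} CPU sets")
    ordered = sorted(pids)
    return {pid: list(cpu_sets[i]) for i, pid in enumerate(ordered)}


if __name__ == "__main__":
    sets = [[0, 1, 2, 3], [4, 5, 6, 7]]
    # Two discovery orders for the same processes yield the same mapping.
    a = map_pids_to_cpu_sets([4012, 3987], sets)
    b = map_pids_to_cpu_sets([3987, 4012], sets)
    assert a == b
    print(a)  # {3987: [0, 1, 2, 3], 4012: [4, 5, 6, 7]}
```

Without the sort, two workers could swap core sets between runs purely because of the order in which their PIDs were listed.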
Additionally, there are opportunities for improving correctness and performance:
- A potential TypeError in get_running_npus if a chip logic ID is not found.
- Inefficient repeated calls to ps -Te when binding threads.
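Both suggestions can be sketched in plain Python. The names below (parse_ps_threads, snapshot_threads, running_logic_ids) are hypothetical, not the PR's code. The sketch also assumes that the thread list comes from ps -Te with pid, spid and comm columns, and that the chip-to-logic-ID lookup is a dictionary keyed by (card, chip). The idea is to run ps once, index the threads by name, and look up every thread to bind (for example acl_thread or release_thread) in that index. Any logic-ID lookup is guarded so that a missing chip never turns into None inside later arithmetic.

```python
import subprocess
from collections import defaultdict
from typing import Dict, List, Tuple

ThreadIndex = Dict[str, List[Tuple[int, int]]]


def parse_ps_threads(output: str) -> ThreadIndex:
    """Group (pid, tid) pairs by thread name.

    Expects one thread per line as "<pid> <tid> <name>", i.e. the output of
    `ps -Te -o pid= -o spid= -o comm=` (empty headers suppress the header line).
    """
    index: ThreadIndex = defaultdict(list)
    for line in output.splitlines():
        parts = line.split(None, 2)  # the name may itself contain spaces
        if len(parts) != 3:
            continue
        pid, tid, name = parts
        index[name.strip()].append((int(pid), int(tid)))
    return dict(index)


def snapshot_threads() -> ThreadIndex:
    """Run ps a single time; callers then look up every thread name they need."""
    out = subprocess.run(
        ["ps", "-Te", "-o", "pid=", "-o", "spid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps_threads(out)


def running_logic_ids(
    running_chips: List[Tuple[int, int]],
    chip_to_logic: Dict[Tuple[int, int], int],
) -> List[int]:
    """Translate (card, chip) pairs to logic IDs, skipping unknown chips.

    dict.get() plus an explicit None check keeps a missing entry from
    propagating as None into later arithmetic or comparisons (a TypeError).
    """
    ids: List[int] = []
    for chip in running_chips:
        logic_id = chip_to_logic.get(chip)
        if logic_id is None:
            continue  # hypothetical policy; raising a descriptive error also works
        ids.append(logic_id)
    return sorted(ids)
```

Separating parsing from the subprocess call keeps the parser testable without a live system. The caller pays for one ps invocation per binding pass instead of one per thread.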
Please see the detailed comments for suggestions on how to fix these issues.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
…d on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference. Signed-off-by: Rozwel-dx <1392851715@qq.com> Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference. Signed-off-by: Rozwel-dx <1392851715@qq.com> Add cpu_binding UT Signed-off-by: Rozwel-dx <1392851715@qq.com>
…to eplb_refactor * 'main' of https://github.com/vllm-project/vllm-ascend: [CI] Unblock 4-cards test (vllm-project#5831) [Refactor] Provide a framework to accommodate operators for different hardware devices (vllm-project#5735) [Refactor] Modify the binding logic to allocate CPU cores for each NPU card (vllm-project#5555) [BugFix] Support setting tp=1 for the Eagle draft model to take effect (vllm-project#5519) support triton of mrope (vllm-project#5664) [bugfix] A2 Environment Pooling for Memcache Compatibility (vllm-project#5601) [Doc] Update community contributors and versioning naming to follow vLLM (vllm-project#5820) [Refactor] Add comments for Metadata classes in attention module (vllm-project#5789) [Bugfix] bugfix for the order of dummy run pad and sync (vllm-project#5777) [CI] Move nightly-a2 test to hk (vllm-project#5807) [CI] Show disk usage for CI shared volume (vllm-project#5821) Bump actions/checkout from 4 to 6 (vllm-project#5795) Bump actions/github-script from 7 to 8 (vllm-project#5796) [bugfix](cp) align max_context_chunk to cp_virtual_block_size (vllm-project#5767) [bugfix]limit graph replay sync (vllm-project#5761) [CI]Add Kimi k2 nightly test (vllm-project#5682) [Doc] add tls check to pd disaggregation readme (vllm-project#5638) [CI] adpat v0.13.0 change (vllm-project#5793)
…U card (vllm-project#5555) [Refactor] Modify the binding logic to allocate CPU cores for each NPU card ### What this PR does / why we need it? Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Rozwel-dx@c85cc04 Signed-off-by: rowzwel_dx <1392851715@qq.com> - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@7157596 Signed-off-by: Rozwel-dx <1392851715@qq.com>
…U card (vllm-project#5555) [Refactor] Modify the binding logic to allocate CPU cores for each NPU card ### What this PR does / why we need it? Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Rozwel-dx@c85cc04 Signed-off-by: rowzwel_dx <1392851715@qq.com> - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@7157596 Signed-off-by: Rozwel-dx <1392851715@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
[Refactor] Modify the binding logic to allocate CPU cores for each NPU card
What this PR does / why we need it?
Modify the binding logic to allocate CPU cores for each NPU card based on NUMA affinity, while isolating acl_thread/release_thread and other processes to prevent mutual interference.
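As a rough illustration of allocation by NUMA affinity, the sketch below splits each NUMA node's cores evenly among the NPUs attached to that node. Within each NPU's slice, one core is reserved for acl_thread, one for release_thread, and the rest are left for the worker's other threads. The inputs (a NUMA-node-to-CPU map and an NPU-to-NUMA-node map), the names plan_cpu_binding and NpuCpuPlan, and the choice to give the first two cores of each slice to the isolated threads are all assumptions for illustration. The PR's CpuAlloc and DeviceInfo classes may divide cores differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NpuCpuPlan:
    main: List[int]       # cores for the worker process's remaining threads
    acl_thread: int       # core reserved for acl_thread (hypothetical layout)
    release_thread: int   # core reserved for release_thread (hypothetical layout)


def plan_cpu_binding(
    numa_cpus: Dict[int, List[int]], npu_numa: Dict[int, int]
) -> Dict[int, NpuCpuPlan]:
    """Split each NUMA node's cores evenly among the NPUs attached to it."""
    npus_by_node: Dict[int, List[int]] = defaultdict(list)
    for npu, node in sorted(npu_numa.items()):  # sorted -> deterministic order
        npus_by_node[node].append(npu)

    plans: Dict[int, NpuCpuPlan] = {}
    for node, npus in npus_by_node.items():
        cpus = sorted(numa_cpus[node])
        per_npu = len(cpus) // len(npus)
        if per_npu < 3:  # need at least main + acl_thread + release_thread
            raise ValueError(f"NUMA node {node}: too few cores for {len(npus)} NPUs")
        for i, npu in enumerate(npus):
            chunk = cpus[i * per_npu:(i + 1) * per_npu]
            plans[npu] = NpuCpuPlan(
                main=chunk[2:], acl_thread=chunk[0], release_thread=chunk[1]
            )
    return plans
```

Giving each NPU a disjoint slice from its own NUMA node keeps host-side work local to the device's memory and stops the isolated threads from competing with the main process. To apply a plan on Linux, os.sched_setaffinity can be called with a thread ID as well as a process ID, for example os.sched_setaffinity(tid, {plan.acl_thread}).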
Does this PR introduce any user-facing change?
No
How was this patch tested?
Rozwel-dx@c85cc04
Signed-off-by: rowzwel_dx <1392851715@qq.com>