[Platform] Fix CPU binding logic #6889
Conversation
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the CPU and memory binding mechanisms to enhance performance and stability, especially in NUMA-balanced environments. The changes ensure more accurate and balanced CPU allocation to NPUs and improve the reliability of memory binding by correctly identifying and utilizing NUMA nodes, thereby preventing potential performance bottlenecks and incorrect resource assignments. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the NUMA-balanced CPU binding logic. The changes in handle_no_affinity improve how CPUs are distributed among NPUs when affinity information is missing, making it more balanced. The bind_memory function is also made more robust by deriving the NUMA node from the NPU's CPU pool and adding validation checks. My review identifies some leftover debugging code, including a print statement and an unused variable, that should be removed.
| # Infer "my_npu" from local rank + visible running_npu_list, assuming local rank is index into running_npu_list. | ||
| if 0 <= self.rank_id < len(running): | ||
| my_npu = running[self.rank_id] | ||
| else: | ||
| # Fallback: modulo in case rank range is larger than visible list length. | ||
| my_npu = running[self.rank_id % len(running)] | ||
|
|
||
| print( | ||
| f"[no_affinity_fine] rank:{self.rank_id} -> my_npu:{my_npu}; " | ||
| f"running_npu_list:{running}; num_available_nodes:{num_nodes}" | ||
| ) |
There was a problem hiding this comment.
This block of code appears to be for debugging purposes. The my_npu variable is calculated but only used in the following print statement, and not used anywhere else in the function. print statements should be avoided in library code as they write to standard output and can be noisy. It's better to use the existing logger for such information. Since this seems to be purely for debugging, this entire block can be removed to improve code clarity and remove dead code.
c0a2eec to
b56c763
Compare
Signed-off-by: chenchuw886 <chenchuw@huawei.com>
3103d53 to
f591563
Compare
### What this PR does / why we need it? - Rework CpuAlloc.handle_no_affinity() to build available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes via round‑robin, and split CPUs per NPU with disjoint slices for better balance. - Improve bind_memory() robustness by deriving the target NUMA from each NPU’s CPU pool, validating NUMA existence, and skipping binding when data is missing. - bind_memory() now only bind the single NUMA node that corresponds to NPU id, instead of 2 NUMA nodes. - Fix the issue that all NPUs bind to 0th NUMA node when DP16 due to global NPU id is not visible across DP domain. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added/updated unit tests: test_cpu_binding.py 1. test_binding_mode_table covers A2 vs A3 binding mode mapping. 2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing. 3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection. 4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus. test_worker_v1.py 1. Updated mocks for enable_cpu_binding default True to align with new config default. - vLLM version: v0.16.0 - vLLM main: vllm-project/vllm@15d76f7 Signed-off-by: chenchuw886 <chenchuw@huawei.com> Co-authored-by: chenchuw886 <chenchuw@huawei.com>
### What this PR does / why we need it? - Rework CpuAlloc.handle_no_affinity() to build available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes via round‑robin, and split CPUs per NPU with disjoint slices for better balance. - Improve bind_memory() robustness by deriving the target NUMA from each NPU’s CPU pool, validating NUMA existence, and skipping binding when data is missing. - bind_memory() now only bind the single NUMA node that corresponds to NPU id, instead of 2 NUMA nodes. - Fix the issue that all NPUs bind to 0th NUMA node when DP16 due to global NPU id is not visible across DP domain. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added/updated unit tests: test_cpu_binding.py 1. test_binding_mode_table covers A2 vs A3 binding mode mapping. 2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing. 3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection. 4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus. test_worker_v1.py 1. Updated mocks for enable_cpu_binding default True to align with new config default. - vLLM version: v0.16.0 - vLLM main: vllm-project/vllm@15d76f7 Signed-off-by: chenchuw886 <chenchuw@huawei.com> Co-authored-by: chenchuw886 <chenchuw@huawei.com>
### What this PR does / why we need it? - Rework CpuAlloc.handle_no_affinity() to build available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes via round‑robin, and split CPUs per NPU with disjoint slices for better balance. - Improve bind_memory() robustness by deriving the target NUMA from each NPU’s CPU pool, validating NUMA existence, and skipping binding when data is missing. - bind_memory() now only bind the single NUMA node that corresponds to NPU id, instead of 2 NUMA nodes. - Fix the issue that all NPUs bind to 0th NUMA node when DP16 due to global NPU id is not visible across DP domain. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added/updated unit tests: test_cpu_binding.py 1. test_binding_mode_table covers A2 vs A3 binding mode mapping. 2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing. 3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection. 4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus. test_worker_v1.py 1. Updated mocks for enable_cpu_binding default True to align with new config default. - vLLM version: v0.16.0 - vLLM main: vllm-project/vllm@15d76f7 Signed-off-by: chenchuw886 <chenchuw@huawei.com> Co-authored-by: chenchuw886 <chenchuw@huawei.com>
### What this PR does / why we need it? - Rework CpuAlloc.handle_no_affinity() to build available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes via round‑robin, and split CPUs per NPU with disjoint slices for better balance. - Improve bind_memory() robustness by deriving the target NUMA from each NPU’s CPU pool, validating NUMA existence, and skipping binding when data is missing. - bind_memory() now only bind the single NUMA node that corresponds to NPU id, instead of 2 NUMA nodes. - Fix the issue that all NPUs bind to 0th NUMA node when DP16 due to global NPU id is not visible across DP domain. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added/updated unit tests: test_cpu_binding.py 1. test_binding_mode_table covers A2 vs A3 binding mode mapping. 2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing. 3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection. 4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus. test_worker_v1.py 1. Updated mocks for enable_cpu_binding default True to align with new config default. - vLLM version: v0.16.0 - vLLM main: vllm-project/vllm@15d76f7 Signed-off-by: chenchuw886 <chenchuw@huawei.com> Co-authored-by: chenchuw886 <chenchuw@huawei.com> Signed-off-by: nanxing <1014662416@qq.com>
What this PR does / why we need it?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added/updated unit tests:
test_cpu_binding.py
test_worker_v1.py