[Platform] Fix CPU binding logic by chenchuw886 · Pull Request #6889 · vllm-project/vllm-ascend

chenchuw886 · 2026-03-01T11:26:25Z

What this PR does / why we need it?

Rework CpuAlloc.handle_no_affinity() to build available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes via round‑robin, and split CPUs per NPU with disjoint slices for better balance.
Improve bind_memory() robustness by deriving the target NUMA from each NPU’s CPU pool, validating NUMA existence, and skipping binding when data is missing.
bind_memory() now binds only the single NUMA node that corresponds to the NPU id, instead of 2 NUMA nodes.
Fix the issue where all NPUs bind to the 0th NUMA node under DP16 because the global NPU id is not visible across DP domains.
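
The allocation strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `assign_cpus_round_robin`, its parameters, and the contiguous slicing rule are assumptions made for the example.

```python
from collections import defaultdict


def assign_cpus_round_robin(numa_to_cpus, allowed_cpus, npu_ids):
    """Hypothetical helper: map each NPU id to a disjoint list of CPUs."""
    # Keep only NUMA nodes that still have CPUs after the allowed_cpus filter.
    allowed = set(allowed_cpus)
    available = {}
    for node in sorted(numa_to_cpus):
        cpus = [c for c in numa_to_cpus[node] if c in allowed]
        if cpus:
            available[node] = cpus
    if not available:
        return {}

    # Round-robin: the i-th NPU goes to the (i mod N)-th available node.
    nodes = list(available)
    node_to_npus = defaultdict(list)
    for i, npu in enumerate(npu_ids):
        node_to_npus[nodes[i % len(nodes)]].append(npu)

    # Split each node's CPUs into contiguous, non-overlapping slices,
    # giving the first `extra` NPUs one additional CPU each.
    pools = {}
    for node, npus in node_to_npus.items():
        cpus = available[node]
        base, extra = divmod(len(cpus), len(npus))
        start = 0
        for k, npu in enumerate(npus):
            size = base + (1 if k < extra else 0)
            pools[npu] = cpus[start:start + size]
            start += size
    return pools
```

Filtering before building the node list means a NUMA node with no allowed CPUs never receives an NPU. In this sketch an NPU can still end up with an empty slice if a node has fewer CPUs than NPUs assigned to it, so a caller would need to handle that case.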

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added/updated unit tests:

test_cpu_binding.py

test_binding_mode_table covers A2 vs A3 binding mode mapping.
test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing.
TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection.
test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus.

test_worker_v1.py

Updated mocks for enable_cpu_binding default True to align with new config default.
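
For readers unfamiliar with how an architecture check like the one exercised by test_is_arm_cpu can be tested, here is a small sketch. It assumes detection is based on `platform.machine()` and that an unknown architecture is treated as non-ARM; the `is_arm_cpu` and `check_arch_detection` functions below are hypothetical stand-ins, not the project's implementation.

```python
import platform
from unittest import mock


def is_arm_cpu():
    """Hypothetical check: treat aarch64/arm* machine strings as ARM."""
    machine = platform.machine().lower()
    return machine.startswith(("aarch64", "arm"))


def check_arch_detection():
    """Patch platform.machine() to simulate ARM, x86 and unknown hosts."""
    cases = {"aarch64": True, "arm64": True, "x86_64": False, "": False}
    results = {}
    for machine, _expected in cases.items():
        with mock.patch("platform.machine", return_value=machine):
            results[machine] = is_arm_cpu()
    return results == cases
```

Patching `platform.machine` lets the same test run on any CI host, which matters here because the non-ARM path in bind_cpus is expected to skip binding.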

vLLM version: v0.16.0
vLLM main: vllm-project/vllm@15d76f7

github-actions · 2026-03-01T11:26:40Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Fill in the PR description so that the commit message helps reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist · 2026-03-01T11:26:40Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the CPU and memory binding mechanisms to enhance performance and stability, especially in NUMA-balanced environments. The changes ensure more accurate and balanced CPU allocation to NPUs and improve the reliability of memory binding by correctly identifying and utilizing NUMA nodes, thereby preventing potential performance bottlenecks and incorrect resource assignments.

Highlights

Reworked CPU Binding Logic: The handle_no_affinity function has been significantly refactored to first filter available NUMA nodes based on allowed CPUs, then assign NPUs to these nodes using a round-robin strategy, and finally split the CPUs within each NUMA node into disjoint slices for better balance across NPUs.
Improved Memory Binding Robustness: The bind_memory function now robustly determines the target NUMA node from the NPU's CPU pool, validates its existence, and skips binding if necessary data is missing. It also correctly binds memory to only a single NUMA node, preventing issues with incorrect NUMA assignments.
Addressed NPU Binding Issue: Fixed a bug where all NPUs would incorrectly bind to the 0th NUMA node, particularly in DP16 scenarios, by ensuring global NPU IDs are properly utilized for NUMA assignment.
Expanded Unit Test Coverage: New and updated unit tests have been added to cover various aspects of CPU binding, including binding mode mapping, fallback mechanisms, architecture detection, and non-ARM skip paths, ensuring the new logic is thoroughly validated.

Changelog

vllm_ascend/cpu_binding.py
- Reworked handle_no_affinity to filter NUMA nodes, assign NPUs round-robin, and split CPUs into disjoint slices.
- Introduced _get_npu_numa_node helper to derive target NUMA from NPU's CPU pool.
- Added validation checks in bind_memory for missing CPU pools or invalid NUMA nodes.
- Modified bind_memory to bind to a single NUMA node instead of two, reducing cross-NUMA traffic.
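
The validation flow listed in this changelog might look roughly like the sketch below. The names `get_npu_numa_node` and `plan_memory_binding`, the dictionary shapes, and the choice to look up the node of the first CPU in the pool are all assumptions for illustration; the actual `_get_npu_numa_node` and the binding call itself are not shown on this page.

```python
def get_npu_numa_node(npu_id, npu_cpu_pool, numa_to_cpus):
    """Hypothetical stand-in: derive the NUMA node from the NPU's CPU pool."""
    pool = npu_cpu_pool.get(npu_id)
    if not pool:
        # No CPU pool recorded for this NPU: nothing to derive from.
        return None
    first_cpu = pool[0]  # Assumption: the pool lies within one NUMA node.
    for node, cpus in numa_to_cpus.items():
        if first_cpu in cpus:
            return node
    # The CPU does not belong to any known NUMA node.
    return None


def plan_memory_binding(npu_id, npu_cpu_pool, numa_to_cpus):
    """Return the single NUMA node to bind to, or None to skip binding."""
    node = get_npu_numa_node(npu_id, npu_cpu_pool, numa_to_cpus)
    if node is None:
        return None
    return [node]
```

Returning `None` rather than raising keeps a missing pool or an unknown node from aborting startup, which matches the "skip binding when data is missing" behaviour described in the PR.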

Activity

Added/updated unit tests in test_cpu_binding.py covering A2 vs A3 binding mode mapping, fallback for missing affinity, ARM/x86/unknown arch detection, and non-ARM skip paths.
Updated mocks in test_worker_v1.py for enable_cpu_binding default to align with new config defaults.


gemini-code-assist

Code Review

This pull request refactors the NUMA-balanced CPU binding logic. The changes in handle_no_affinity improve how CPUs are distributed among NPUs when affinity information is missing, making it more balanced. The bind_memory function is also made more robust by deriving the NUMA node from the NPU's CPU pool and adding validation checks. My review identifies some leftover debugging code, including a print statement and an unused variable, that should be removed.

gemini-code-assist · 2026-03-01T11:27:23Z

+        # Infer "my_npu" from local rank + visible running_npu_list, assuming local rank is index into running_npu_list.
+        if 0 <= self.rank_id < len(running):
+            my_npu = running[self.rank_id]
+        else:
+            # Fallback: modulo in case rank range is larger than visible list length.
+            my_npu = running[self.rank_id % len(running)]
+
+        print(
+            f"[no_affinity_fine] rank:{self.rank_id} -> my_npu:{my_npu}; "
+            f"running_npu_list:{running}; num_available_nodes:{num_nodes}"
+        )


This block of code appears to be for debugging purposes. The my_npu variable is calculated but only used in the following print statement, and not used anywhere else in the function. print statements should be avoided in library code as they write to standard output and can be noisy. It's better to use the existing logger for such information. Since this seems to be purely for debugging, this entire block can be removed to improve code clarity and remove dead code.
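
If the rank-to-NPU information were worth keeping rather than deleting, one way to follow the reviewer's advice is to emit it at debug level through a logger. The sketch below uses the standard library `logging` module and a hypothetical `resolve_local_npu` helper; the project's own logger and function layout may differ.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_local_npu(rank_id, running_npu_list):
    """Hypothetical helper mirroring the indexing in the diff above."""
    if not running_npu_list:
        raise ValueError("running_npu_list is empty")
    # For 0 <= rank_id < len(list) the modulo is a no-op, so one
    # expression covers both branches of the original if/else.
    my_npu = running_npu_list[rank_id % len(running_npu_list)]
    # Lazy %-style arguments avoid formatting when debug logging is off.
    logger.debug(
        "[no_affinity_fine] rank:%s -> my_npu:%s; running_npu_list:%s",
        rank_id, my_npu, running_npu_list,
    )
    return my_npu
```

Unlike `print`, the debug call produces no output unless the application configures logging at that level.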

chenchuw886 requested a review from wangxiyuan as a code owner March 1, 2026 11:26

github-actions Bot added the module:core label Mar 1, 2026

chenchuw886 changed the title ~~Fix NUMA-balanced CPU binding logic~~ [Platform] Fix CPU binding logic Mar 1, 2026

chenchuw886 changed the title ~~[Platform] Fix CPU binding logic~~ [Platform] Fix CPU binding logic Mar 1, 2026

gemini-code-assist Bot reviewed Mar 1, 2026

chenchuw886 force-pushed the cpu_bind_0301 branch 2 times, most recently from c0a2eec to b56c763 Compare March 1, 2026 11:36

Fix NUMA-balanced logic

f591563

Signed-off-by: chenchuw886 <chenchuw@huawei.com>

chenchuw886 force-pushed the cpu_bind_0301 branch from 3103d53 to f591563 Compare March 1, 2026 11:53

wangxiyuan approved these changes Mar 1, 2026

zzzzwwjj approved these changes Mar 1, 2026

zzzzwwjj merged commit a77fe93 into vllm-project:main Mar 1, 2026
27 checks passed

chenchuw886 mentioned this pull request Apr 24, 2026

[Platform] Enable ARM-only CPU binding with NUMA-balanced A3 policy and update docs/tests #6686

Merged

zzzzwwjj merged 1 commit into vllm-project:main from chenchuw886:cpu_bind_0301
