
[Platform] Fix CPU binding logic #6889

Merged
zzzzwwjj merged 1 commit into vllm-project:main from chenchuw886:cpu_bind_0301
Mar 1, 2026

Conversation

@chenchuw886 chenchuw886 commented Mar 1, 2026

What this PR does / why we need it?

  • Rework CpuAlloc.handle_no_affinity() to build the list of available NUMA nodes after allowed_cpus filtering, assign NPUs to NUMA nodes round-robin, and split each node's CPUs into disjoint per-NPU slices for better balance.
  • Improve bind_memory() robustness by deriving the target NUMA node from each NPU’s CPU pool, validating that the NUMA node exists, and skipping binding when data is missing.
  • bind_memory() now binds only the single NUMA node that corresponds to the NPU ID, instead of two NUMA nodes.
  • Fix an issue where all NPUs bound to NUMA node 0 under DP16 because the global NPU ID is not visible across DP domains.
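
The three steps described for handle_no_affinity() (filter, round-robin, disjoint split) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `split_cpus_round_robin` and its parameters are hypothetical, and the NUMA topology is passed in as a plain dict rather than read from the system.

```python
from typing import Dict, List


def split_cpus_round_robin(
    numa_to_cpus: Dict[int, List[int]],
    allowed_cpus: List[int],
    npu_ids: List[int],
) -> Dict[int, List[int]]:
    """Return a disjoint CPU pool per NPU (hypothetical helper, illustrative only)."""
    allowed = set(allowed_cpus)

    # 1. Keep only NUMA nodes that still own CPUs after allowed_cpus filtering.
    available: Dict[int, List[int]] = {}
    for node in sorted(numa_to_cpus):
        cpus = sorted(c for c in numa_to_cpus[node] if c in allowed)
        if cpus:
            available[node] = cpus
    if not available or not npu_ids:
        return {}

    # 2. Assign NPUs to the available nodes round-robin.
    nodes = list(available)
    node_to_npus: Dict[int, List[int]] = {node: [] for node in nodes}
    for i, npu in enumerate(npu_ids):
        node_to_npus[nodes[i % len(nodes)]].append(npu)

    # 3. Cut each node's CPU list into contiguous, non-overlapping slices.
    pools: Dict[int, List[int]] = {}
    for node, npus in node_to_npus.items():
        if not npus:
            continue
        cpus = available[node]
        base, extra = divmod(len(cpus), len(npus))
        start = 0
        for j, npu in enumerate(npus):
            size = base + (1 if j < extra else 0)
            pools[npu] = cpus[start:start + size]
            start += size
    return pools
```

In this sketch, a node with fewer CPUs than assigned NPUs leaves some NPUs with an empty pool; how the real implementation handles that case is not stated in this PR description.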

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added/updated unit tests:

test_cpu_binding.py

  1. test_binding_mode_table covers A2 vs A3 binding mode mapping.
  2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when affinity info is missing.
  3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch detection.
  4. test_bind_cpus_skip_non_arm covers non‑ARM skip path in bind_cpus.

test_worker_v1.py

  1. Updated mocks for enable_cpu_binding default True to align with new config default.
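
For readers unfamiliar with how an architecture check like the one covered by test_is_arm_cpu can be exercised, here is a rough sketch. The `is_arm_cpu` body below is a hypothetical stand-in built on `platform.machine()`; the actual detection logic in vllm_ascend is not shown in this PR and may differ.

```python
import platform
from unittest import mock


def is_arm_cpu() -> bool:
    """Hypothetical stand-in: True when the machine string looks like ARM."""
    machine = platform.machine().lower()
    return machine.startswith(("aarch64", "arm"))


def check_is_arm_cpu() -> None:
    """Exercise ARM, x86, and unknown (empty) machine strings via mocking."""
    cases = {"aarch64": True, "arm64": True, "x86_64": False, "": False}
    for machine, expected in cases.items():
        # Patch platform.machine so the check does not depend on the host.
        with mock.patch("platform.machine", return_value=machine):
            assert is_arm_cpu() is expected
```

Patching `platform.machine` keeps the test independent of the machine it runs on, which matters when CI runs on x86 hosts but the code targets ARM.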


github-actions Bot commented Mar 1, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the CPU and memory binding mechanisms to enhance performance and stability, especially in NUMA-balanced environments. The changes ensure more accurate and balanced CPU allocation to NPUs and improve the reliability of memory binding by correctly identifying and utilizing NUMA nodes, thereby preventing potential performance bottlenecks and incorrect resource assignments.

Highlights

  • Reworked CPU Binding Logic: The handle_no_affinity function has been significantly refactored to first filter available NUMA nodes based on allowed CPUs, then assign NPUs to these nodes using a round-robin strategy, and finally split the CPUs within each NUMA node into disjoint slices for better balance across NPUs.
  • Improved Memory Binding Robustness: The bind_memory function now robustly determines the target NUMA node from the NPU's CPU pool, validates its existence, and skips binding if necessary data is missing. It also correctly binds memory to only a single NUMA node, preventing issues with incorrect NUMA assignments.
  • Addressed NPU Binding Issue: Fixed a bug where all NPUs would incorrectly bind to the 0th NUMA node, particularly in DP16 scenarios, by ensuring global NPU IDs are properly utilized for NUMA assignment.
  • Expanded Unit Test Coverage: New and updated unit tests have been added to cover various aspects of CPU binding, including binding mode mapping, fallback mechanisms, architecture detection, and non-ARM skip paths, ensuring the new logic is thoroughly validated.
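
The memory-binding behavior summarized above (derive one NUMA node from the NPU's CPU pool, validate it, skip when data is missing) might look roughly like the sketch below. It is an assumption-laden illustration: `pick_numa_node` is a hypothetical name, it uses the first CPU of the pool to choose the node, and it takes the topology as a dict. The PR's actual helper, `_get_npu_numa_node`, is not reproduced here.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


def pick_numa_node(
    cpu_pool: Optional[List[int]],
    numa_to_cpus: Dict[int, List[int]],
) -> Optional[int]:
    """Return the NUMA node owning the pool's first CPU, or None to skip binding.

    Hypothetical helper; choosing by the first CPU is an assumption.
    """
    if not cpu_pool:
        logger.warning("No CPU pool for this NPU; skipping memory binding.")
        return None
    first_cpu = cpu_pool[0]
    for node, cpus in numa_to_cpus.items():
        if first_cpu in cpus:
            return node
    logger.warning("CPU %d is not in any known NUMA node; skipping.", first_cpu)
    return None
```

Returning `None` instead of raising lets the caller treat missing topology data as "do not bind", and returning a single node mirrors the change from binding two NUMA nodes to binding one.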

Changelog

  • vllm_ascend/cpu_binding.py
    • Reworked handle_no_affinity to filter NUMA nodes, assign NPUs round-robin, and split CPUs into disjoint slices.
    • Introduced _get_npu_numa_node helper to derive target NUMA from NPU's CPU pool.
    • Added validation checks in bind_memory for missing CPU pools or invalid NUMA nodes.
    • Modified bind_memory to bind to a single NUMA node instead of two, reducing cross-NUMA traffic.

Activity

  • Added/updated unit tests in test_cpu_binding.py covering A2 vs A3 binding mode mapping, fallback for missing affinity, ARM/x86/unknown arch detection, and non-ARM skip paths.
  • Updated mocks in test_worker_v1.py for enable_cpu_binding default to align with new config defaults.

@chenchuw886 chenchuw886 changed the title from "Fix NUMA-balanced CPU binding logic" to "[Platform] Fix CPU binding logic" Mar 1, 2026
@chenchuw886 chenchuw886 changed the title from "[Platform] Fix CPU binding logic" to "[Platform] Fix CPU binding logic" Mar 1, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the NUMA-balanced CPU binding logic. The changes in handle_no_affinity improve how CPUs are distributed among NPUs when affinity information is missing, making it more balanced. The bind_memory function is also made more robust by deriving the NUMA node from the NPU's CPU pool and adding validation checks. My review identifies some leftover debugging code, including a print statement and an unused variable, that should be removed.

Comment on lines +232 to +242
    # Infer "my_npu" from local rank + visible running_npu_list, assuming local rank is index into running_npu_list.
    if 0 <= self.rank_id < len(running):
        my_npu = running[self.rank_id]
    else:
        # Fallback: modulo in case rank range is larger than visible list length.
        my_npu = running[self.rank_id % len(running)]

    print(
        f"[no_affinity_fine] rank:{self.rank_id} -> my_npu:{my_npu}; "
        f"running_npu_list:{running}; num_available_nodes:{num_nodes}"
    )

Severity: high

This block of code appears to be for debugging purposes. The my_npu variable is calculated but only used in the following print statement, and not used anywhere else in the function. print statements should be avoided in library code as they write to standard output and can be noisy. It's better to use the existing logger for such information. Since this seems to be purely for debugging, this entire block can be removed to improve code clarity and remove dead code.
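
If the rank-to-NPU mapping were worth keeping for diagnostics rather than removed, the reviewer's suggestion to log instead of print could be sketched like this. The function name `resolve_local_npu` is hypothetical and the module-level logger is from the standard library; the project's own logger is not shown in this thread.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def resolve_local_npu(rank_id: int, running: List[int]) -> int:
    """Map a local rank onto the visible NPU list, wrapping if out of range."""
    if not running:
        raise ValueError("running NPU list is empty")
    # For 0 <= rank_id < len(running) the modulo is a no-op, so one
    # expression covers both the direct and the fallback case.
    my_npu = running[rank_id % len(running)]
    # Lazy %-formatting: the message is only built when DEBUG is enabled.
    logger.debug(
        "rank:%d -> my_npu:%d; running_npu_list:%s", rank_id, my_npu, running
    )
    return my_npu
```

Using `logger.debug` keeps standard output clean in library code and lets operators enable the message only when investigating binding problems.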

@chenchuw886 chenchuw886 force-pushed the cpu_bind_0301 branch 2 times, most recently from c0a2eec to b56c763 Compare March 1, 2026 11:36
Signed-off-by: chenchuw886 <chenchuw@huawei.com>
@zzzzwwjj zzzzwwjj merged commit a77fe93 into vllm-project:main Mar 1, 2026
27 checks passed
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
### What this PR does / why we need it?

- Rework CpuAlloc.handle_no_affinity() to build the list of available NUMA
nodes after allowed_cpus filtering, assign NPUs to NUMA nodes round-robin,
and split each node's CPUs into disjoint per-NPU slices for better balance.
- Improve bind_memory() robustness by deriving the target NUMA node from
each NPU’s CPU pool, validating that the NUMA node exists, and skipping
binding when data is missing.
- bind_memory() now binds only the single NUMA node that corresponds to
the NPU ID, instead of two NUMA nodes.
- Fix an issue where all NPUs bound to NUMA node 0 under DP16 because the
global NPU ID is not visible across DP domains.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added/updated unit tests:

test_cpu_binding.py
1. test_binding_mode_table covers A2 vs A3 binding mode mapping.
2. test_build_cpu_pools_fallback_to_numa_balanced covers fallback when
affinity info is missing.
3. TestBindingSwitch.test_is_arm_cpu covers ARM/x86/unknown arch
detection.
4. test_bind_cpus_skip_non_arm covers non-ARM skip path in bind_cpus.

test_worker_v1.py
1. Updated mocks for enable_cpu_binding default True to align with new
config default.

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@15d76f7

Signed-off-by: chenchuw886 <chenchuw@huawei.com>
Co-authored-by: chenchuw886 <chenchuw@huawei.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
yangzhe-2026 pushed a commit to yangzhe-2026/vllm-ascend that referenced this pull request May 6, 2026
nanxingMy pushed a commit to nanxingMy/vllm-ascend that referenced this pull request May 15, 2026
Signed-off-by: nanxing <1014662416@qq.com>
3 participants