
[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs #6945

Merged
wangxiyuan merged 4 commits into vllm-project:main from
chenchuw886:cpu_bind_global_slice
Mar 3, 2026

Conversation

@chenchuw886
Contributor

@chenchuw886 chenchuw886 commented Mar 3, 2026

What this PR does / why we need it?

This PR introduces global CPU slicing for Ascend NPUs to ensure non-overlapping CPU partitions, addresses IRQ binding logical errors on A3, and enhances the logic for determining total NPUs in CPU allocation. These changes are necessary to optimize CPU resource management and improve system stability.

  • Global CPU Slicing: Introduced a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions across multiple processes or data parallel groups, preventing resource contention.
  • Improved IRQ Binding for A3 Devices: Refined the IRQ binding logic specifically for Ascend A3 devices, correctly mapping logical NPU IDs to physical card and chip IDs for accurate npu-smi queries and preventing multi-process overwrite of IRQ settings.
  • Enhanced NPU Count Determination: Improved the logic for determining the total number of logical NPUs, prioritizing NPU mapping information to ensure more accurate CPU allocation.
  • Minimum CPU Requirement: Established a minimum requirement of 5 CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL, and release operations to ensure stable operation.
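The partitioning scheme the bullets above describe can be sketched as follows. This is a hypothetical helper for illustration, not the code from `cpu_binding.py`; only the 5-CPUs-per-NPU minimum is taken from the PR.

```python
MIN_CPUS_PER_NPU = 5  # per the PR: cores reserved for IRQ, main, ACL, and release work

def global_slice(cpus: list[int], total_npus: int, logic_npu_id: int) -> list[int]:
    """Deterministic, non-overlapping slice of a shared CPU set.

    Every process derives the same partition from the same inputs, so
    data-parallel workers never contend for the same cores.
    """
    per_npu = len(cpus) // total_npus
    if per_npu < MIN_CPUS_PER_NPU:
        raise RuntimeError(
            f"need at least {MIN_CPUS_PER_NPU} CPUs per NPU, got {per_npu}")
    start = logic_npu_id * per_npu
    return cpus[start:start + per_npu]

# 32 CPUs shared by 4 NPUs: logical NPU 2 gets CPUs 16..23
print(global_slice(list(range(32)), total_npus=4, logic_npu_id=2))
```

Because each rank computes its slice purely from the global CPU list, the NPU count, and its own logical NPU ID, no cross-process coordination is needed to keep the partitions disjoint.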

Related RFC

#6966

Does this PR introduce any user-facing change?

No user-facing changes are introduced.

How was this patch tested?

CI passed with newly added and existing tests.

…non-overlapping CPU partitions

Signed-off-by: c00818886 <chenchuwei@huawei.com>
…cess loop in IRQ binding, clarify min required CPUs limitations

Signed-off-by: c00818886 <chenchuwei@huawei.com>
…ion; add descriptions

Signed-off-by: c00818886 <chenchuwei@huawei.com>
@chenchuw886 chenchuw886 requested a review from wangxiyuan as a code owner March 3, 2026 04:26
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the CPU binding mechanism for Ascend NPUs. It introduces a global CPU slicing strategy to prevent CPU resource overlap when multiple processes share the same CPU set, ensuring deterministic and isolated CPU partitions for each logical NPU. Additionally, it corrects logical errors in IRQ binding for A3 devices and improves the overall robustness of NPU CPU allocation by better determining the total number of logical NPUs. These changes aim to optimize CPU resource management and enhance system stability without introducing user-facing modifications.

Highlights

  • Global CPU Slicing: Introduced a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions across multiple processes or data parallel groups, preventing resource contention.
  • Improved IRQ Binding for A3 Devices: Refined the IRQ binding logic specifically for Ascend A3 devices, correctly mapping logical NPU IDs to physical card and chip IDs for accurate npu-smi queries and preventing multi-process overwrite of IRQ settings.
  • Enhanced NPU Count Determination: Improved the logic for determining the total number of logical NPUs, prioritizing NPU mapping information to ensure more accurate CPU allocation.
  • Minimum CPU Requirement: Established a minimum requirement of 5 CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL, and release operations to ensure stable operation.


Changelog
  • vllm_ascend/cpu_binding.py
    • Added MIN_CPUS_PER_NPU constant (5 CPUs) and defined TOPO_AFFINITY_MODE, GLOBAL_SLICE_MODE, and DEVICE_BINDING_MODE mapping device types to binding modes.
    • Introduced all_logic_npus and total_logic_npus attributes to _DeviceInfo to track all logical NPU IDs.
    • Implemented get_all_logic_npus to parse and collect all logical NPU IDs from npu_map_info.
    • Replaced the handle_no_affinity method with build_global_slice_cpu_pool, which implements the new global CPU slicing logic.
    • Updated the _binding_mode class method to use the new DEVICE_BINDING_MODE dictionary.
    • Modified build_cpu_pools to conditionally call build_global_slice_cpu_pool based on the binding mode or as a fallback when NPU affinity info is missing.
    • Adjusted the allocate method to enforce the MIN_CPUS_PER_NPU requirement and update the error message.
    • Revised bind_npu_irq to target IRQ binding only for the current rank's NPU, preventing multi-process overwrites, and to correctly derive PCI addresses for A3 devices using card_id and chip_id.
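A minimal sketch of the mode dispatch the changelog describes. The constant values and the device table below are illustrative assumptions, not the actual contents of `vllm_ascend/cpu_binding.py`; only the names `TOPO_AFFINITY_MODE`, `GLOBAL_SLICE_MODE`, and `DEVICE_BINDING_MODE` come from the PR.

```python
# Assumed values for illustration; the real constants live in cpu_binding.py.
TOPO_AFFINITY_MODE = "topo_affinity"
GLOBAL_SLICE_MODE = "global_slice"

DEVICE_BINDING_MODE = {
    "A2": TOPO_AFFINITY_MODE,
    "A3": GLOBAL_SLICE_MODE,
}

def binding_mode(device_type: str, has_affinity_info: bool) -> str:
    # Fall back to global slicing when NPU affinity info is missing,
    # mirroring the build_cpu_pools fallback described above.
    if not has_affinity_info:
        return GLOBAL_SLICE_MODE
    return DEVICE_BINDING_MODE.get(device_type, GLOBAL_SLICE_MODE)

print(binding_mode("A2", has_affinity_info=True))   # topo_affinity
print(binding_mode("A2", has_affinity_info=False))  # global_slice
```

A dictionary keyed by device type keeps the mode selection in one place, so adding a new device is a one-line table entry rather than another branch in `_binding_mode`.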
Activity
  • CI passed with newly added and existing tests, ensuring the stability and correctness of the changes.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.

@chenchuw886 chenchuw886 changed the title ### Implement global CPU slicing and improve IRQ binding for Ascend NPUs [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs Mar 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to CPU resource management for Ascend NPUs. The key change is the implementation of a 'global slice' CPU binding mode, which ensures non-overlapping CPU partitions across multiple processes, preventing resource contention. The PR also includes a critical fix for IRQ binding on A3 devices and enhances the logic for determining the total number of NPUs for allocation. These changes are well-implemented and will improve system stability.

However, there are two main concerns:

  1. Testing: The unit tests in tests/ut/device_allocator/test_cpu_binding.py do not appear to be updated to reflect the substantial changes in cpu_binding.py. Outdated tests for renamed functions and old logic remain. It is critical to add and update tests to cover the new global slicing mechanism and IRQ binding logic to ensure correctness and prevent future regressions. This is a high-severity issue that impacts the maintainability and reliability of the code.
  2. PR Format: The pull request title and description do not adhere to the format specified in the repository's style guide. For better consistency, please update them.

Suggested PR Title:

[Ops][Feature] Implement global CPU slicing and improve IRQ binding for Ascend NPUs

Suggested PR Summary:

### What this PR does / why we need it?
This PR introduces a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions in multi-process environments. It also corrects a logical error in IRQ binding on A3 devices and improves the logic for determining the total number of NPUs for CPU allocation. These changes are essential for optimizing CPU resource management, improving system stability, and preventing resource contention.

Fixes #<issue_number>

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed. Unit tests for `cpu_binding.py` should be updated to cover the new global slicing logic and IRQ binding fixes to validate the new functionality and prevent regressions.

…d improve logic for total logic NPUs

Signed-off-by: c00818886 <chenchuwei@huawei.com>
@chenchuw886 chenchuw886 force-pushed the cpu_bind_global_slice branch from 806c63e to 9f2a65f on March 3, 2026 05:33
@linfeng-yuan linfeng-yuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 3, 2026
@wangxiyuan wangxiyuan merged commit b771ca9 into vllm-project:main Mar 3, 2026
57 of 60 checks passed

@wqh17101 wqh17101 left a comment


Please consider the scenario where the operating system language is Chinese

f.write(self.cpu_to_mask(cq_cpu))

for line in info.splitlines():
    if "PCIe Bus Info" in line:


If the OS is set to Chinese, these strings probably won't match.
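One way to address the reviewer's locale concern is to pin the subprocess locale instead of matching translated strings. A sketch under stated assumptions: the `npu-smi` arguments below are illustrative, and the parsing helper is hypothetical, not the PR's code.

```python
import subprocess

# Forcing a C locale makes command output use English field names, so
# markers like "PCIe Bus Info" match even on a Chinese-language host.
C_LOCALE_ENV = {"LC_ALL": "C", "LANG": "C"}

def query_board_info(card_id: int, chip_id: int) -> str:
    # The exact npu-smi flags here are assumptions for illustration.
    out = subprocess.run(
        ["npu-smi", "info", "-t", "board", "-i", str(card_id), "-c", str(chip_id)],
        capture_output=True, text=True, env=C_LOCALE_ENV, check=True)
    return out.stdout

def parse_pcie_bus(info: str) -> str:
    """Pull the PCI address out of npu-smi board output."""
    for line in info.splitlines():
        if "PCIe Bus Info" in line:
            # Split only on the first colon; PCI addresses contain colons.
            return line.split(":", 1)[1].strip()
    raise ValueError("PCIe Bus Info not found in npu-smi output")
```

If `npu-smi` output cannot be forced to English on some firmware versions, matching a locale-independent token such as the PCI address pattern itself would be a more robust fallback.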

845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Mar 5, 2026
…to qwen3next_graph

* 'main' of https://github.com/vllm-project/vllm-ascend: (40 commits)
  [Feature] Add docs of batch invariance and make some extra operators patch (vllm-project#6910)
  [bugfix]Qwen2.5VL accurate question (vllm-project#6975)
  [CI] Add DeepSeek-V3.2 large EP nightly ci (vllm-project#6378)
  [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (vllm-project#6939)
  [bugfix]fix file not found error in nightly of single-node (vllm-project#6976)
  [Bugfix] Fix the acceptance rates dorp issue when applying eagle3 to QuaRot model (vllm-project#6914)
  [CI] Enable auto upgrade e2e estimated time for auto-partition suites (vllm-project#6840)
  [Doc][Misc] Fix msprobe_guide.md documentation issues (vllm-project#6965)
  [Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (vllm-project#6503)
  [BugFix] Improve GDN layer detection for multimodal models (vllm-project#6941)
  [feat]ds3.2 pcp support mtp and chunkprefill (vllm-project#6917)
  [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (vllm-project#6945)
  [Triton] Centralize Ascend extension op dispatch in triton_utils (vllm-project#6937)
  [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (vllm-project#6936)
  [300I][Bugfix] fix unquant model weight nd2nz error (vllm-project#6851)
  [doc] fix supported_models (vllm-project#6930)
  [CI] nightly test timeout (vllm-project#6912)
  [CI] Upgrade CANN to 8.5.1 (vllm-project#6897)
  [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (vllm-project#6828)
  [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (vllm-project#6898)
  ...
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…r Ascend NPUs (vllm-project#6945)

### What this PR does / why we need it?

This PR introduces global CPU slicing for Ascend NPUs to ensure
non-overlapping CPU partitions, addresses IRQ binding logical errors on
A3, and enhances the logic for determining total NPUs in CPU allocation.
These changes are necessary to optimize CPU resource management and
improve system stability.

- **Global CPU Slicing**: Introduced a global CPU slicing mechanism for
Ascend NPUs to ensure non-overlapping CPU partitions across multiple
processes or data parallel groups, preventing resource contention.
- **Improved IRQ Binding for A3 Devices**: Refined the IRQ binding logic
specifically for Ascend A3 devices, correctly mapping logical NPU IDs to
physical card and chip IDs for accurate npu-smi queries and preventing
multi-process overwrite of IRQ settings.
- **Enhanced NPU Count Determination**: Improved the logic for
determining the total number of logical NPUs, prioritizing NPU mapping
information to ensure more accurate CPU allocation.
- **Minimum CPU Requirement**: Established a minimum requirement of 5
CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL,
and release operations to ensure stable operation.

### Does this PR introduce _any_ user-facing change?

No user-facing changes are introduced.

### How was this patch tested?

CI passed with newly added and existing tests.

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@15d76f7

---------

Signed-off-by: c00818886 <chenchuwei@huawei.com>

Labels

module:core · ready (read for review) · ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants