
[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs #6945

Merged
wangxiyuan merged 4 commits into vllm-project:main from
chenchuw886:cpu_bind_global_slice
Mar 3, 2026

Conversation

@chenchuw886
Contributor

@chenchuw886 chenchuw886 commented Mar 3, 2026

What this PR does / why we need it?

This PR introduces global CPU slicing for Ascend NPUs to ensure non-overlapping CPU partitions, addresses IRQ binding logical errors on A3, and enhances the logic for determining total NPUs in CPU allocation. These changes are necessary to optimize CPU resource management and improve system stability.

  • Global CPU Slicing: Introduced a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions across multiple processes or data parallel groups, preventing resource contention.
  • Improved IRQ Binding for A3 Devices: Refined the IRQ binding logic specifically for Ascend A3 devices, correctly mapping logical NPU IDs to physical card and chip IDs for accurate npu-smi queries and preventing multi-process overwrite of IRQ settings.
  • Enhanced NPU Count Determination: Improved the logic for determining the total number of logical NPUs, prioritizing NPU mapping information to ensure more accurate CPU allocation.
  • Minimum CPU Requirement: Established a minimum requirement of 5 CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL, and release operations to ensure stable operation.
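The partitioning scheme the bullets above describe can be sketched as follows. This is a hypothetical helper for illustration, not the code from `cpu_binding.py`; only the 5-CPUs-per-NPU minimum is taken from the PR.

```python
MIN_CPUS_PER_NPU = 5  # per the PR: cores reserved for IRQ, main, ACL, and release work

def global_slice(cpus: list[int], total_npus: int, logic_npu_id: int) -> list[int]:
    """Deterministic, non-overlapping slice of a shared CPU set.

    Every process derives the same partition from the same inputs, so
    data-parallel workers never contend for the same cores.
    """
    per_npu = len(cpus) // total_npus
    if per_npu < MIN_CPUS_PER_NPU:
        raise RuntimeError(
            f"need at least {MIN_CPUS_PER_NPU} CPUs per NPU, got {per_npu}")
    start = logic_npu_id * per_npu
    return cpus[start:start + per_npu]

# 32 CPUs shared by 4 NPUs: logical NPU 2 gets CPUs 16..23
print(global_slice(list(range(32)), total_npus=4, logic_npu_id=2))
```

Because each rank computes its slice purely from the global CPU list, the NPU count, and its own logical NPU ID, no cross-process coordination is needed to keep the partitions disjoint.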

Related RFC

#6966

Does this PR introduce any user-facing change?

No user-facing changes are introduced.

How was this patch tested?

CI passed with newly added and existing tests.

…non-overlapping CPU partitions

Signed-off-by: c00818886 <chenchuwei@huawei.com>
…cess loop in IRQ binding, clarify min required CPUs limitations

Signed-off-by: c00818886 <chenchuwei@huawei.com>
…ion; add descriptions

Signed-off-by: c00818886 <chenchuwei@huawei.com>
@chenchuw886 chenchuw886 requested a review from wangxiyuan as a code owner March 3, 2026 04:26
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the CPU binding mechanism for Ascend NPUs. It introduces a global CPU slicing strategy to prevent CPU resource overlap when multiple processes share the same CPU set, ensuring deterministic and isolated CPU partitions for each logical NPU. Additionally, it corrects logical errors in IRQ binding for A3 devices and improves the overall robustness of NPU CPU allocation by better determining the total number of logical NPUs. These changes aim to optimize CPU resource management and enhance system stability without introducing user-facing modifications.

Highlights

  • Global CPU Slicing: Introduced a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions across multiple processes or data parallel groups, preventing resource contention.
  • Improved IRQ Binding for A3 Devices: Refined the IRQ binding logic specifically for Ascend A3 devices, correctly mapping logical NPU IDs to physical card and chip IDs for accurate npu-smi queries and preventing multi-process overwrite of IRQ settings.
  • Enhanced NPU Count Determination: Improved the logic for determining the total number of logical NPUs, prioritizing NPU mapping information to ensure more accurate CPU allocation.
  • Minimum CPU Requirement: Established a minimum requirement of 5 CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL, and release operations to ensure stable operation.


Changelog
  • vllm_ascend/cpu_binding.py
    • Added MIN_CPUS_PER_NPU constant (5 CPUs) and defined TOPO_AFFINITY_MODE, GLOBAL_SLICE_MODE, and DEVICE_BINDING_MODE mapping device types to binding modes.
    • Introduced all_logic_npus and total_logic_npus attributes to _DeviceInfo to track all logical NPU IDs.
    • Implemented get_all_logic_npus to parse and collect all logical NPU IDs from npu_map_info.
    • Replaced the handle_no_affinity method with build_global_slice_cpu_pool, which implements the new global CPU slicing logic.
    • Updated the _binding_mode class method to use the new DEVICE_BINDING_MODE dictionary.
    • Modified build_cpu_pools to conditionally call build_global_slice_cpu_pool based on the binding mode or as a fallback when NPU affinity info is missing.
    • Adjusted the allocate method to enforce the MIN_CPUS_PER_NPU requirement and update the error message.
    • Revised bind_npu_irq to target IRQ binding only for the current rank's NPU, preventing multi-process overwrites, and to correctly derive PCI addresses for A3 devices using card_id and chip_id.
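A minimal sketch of the mode dispatch the changelog describes. The constant values and the device table below are illustrative assumptions, not the actual contents of `vllm_ascend/cpu_binding.py`; only the names `TOPO_AFFINITY_MODE`, `GLOBAL_SLICE_MODE`, and `DEVICE_BINDING_MODE` come from the PR.

```python
# Assumed values for illustration; the real constants live in cpu_binding.py.
TOPO_AFFINITY_MODE = "topo_affinity"
GLOBAL_SLICE_MODE = "global_slice"

DEVICE_BINDING_MODE = {
    "A2": TOPO_AFFINITY_MODE,
    "A3": GLOBAL_SLICE_MODE,
}

def binding_mode(device_type: str, has_affinity_info: bool) -> str:
    # Fall back to global slicing when NPU affinity info is missing,
    # mirroring the build_cpu_pools fallback described above.
    if not has_affinity_info:
        return GLOBAL_SLICE_MODE
    return DEVICE_BINDING_MODE.get(device_type, GLOBAL_SLICE_MODE)

print(binding_mode("A2", has_affinity_info=True))   # topo_affinity
print(binding_mode("A2", has_affinity_info=False))  # global_slice
```

A dictionary keyed by device type keeps the mode selection in one place, so adding a new device is a one-line table entry rather than another branch in `_binding_mode`.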
Activity
  • CI passed with newly added and existing tests, ensuring the stability and correctness of the changes.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.

@chenchuw886 chenchuw886 changed the title ### Implement global CPU slicing and improve IRQ binding for Ascend NPUs [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs Mar 3, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to CPU resource management for Ascend NPUs. The key change is the implementation of a 'global slice' CPU binding mode, which ensures non-overlapping CPU partitions across multiple processes, preventing resource contention. The PR also includes a critical fix for IRQ binding on A3 devices and enhances the logic for determining the total number of NPUs for allocation. These changes are well-implemented and will improve system stability.

However, there are two main concerns:

  1. Testing: The unit tests in tests/ut/device_allocator/test_cpu_binding.py do not appear to be updated to reflect the substantial changes in cpu_binding.py. Outdated tests for renamed functions and old logic remain. It is critical to add and update tests to cover the new global slicing mechanism and IRQ binding logic to ensure correctness and prevent future regressions. This is a high-severity issue that impacts the maintainability and reliability of the code.
  2. PR Format: The pull request title and description do not adhere to the format specified in the repository's style guide. For better consistency, please update them.

Suggested PR Title:

[Ops][Feature] Implement global CPU slicing and improve IRQ binding for Ascend NPUs

Suggested PR Summary:

### What this PR does / why we need it?
This PR introduces a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions in multi-process environments. It also corrects a logical error in IRQ binding on A3 devices and improves the logic for determining the total number of NPUs for CPU allocation. These changes are essential for optimizing CPU resource management, improving system stability, and preventing resource contention.

Fixes #<issue_number>

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
CI passed. Unit tests for `cpu_binding.py` should be updated to cover the new global slicing logic and IRQ binding fixes to validate the new functionality and prevent regressions.

…d improve logic for total logic NPUs

Signed-off-by: c00818886 <chenchuwei@huawei.com>
@chenchuw886 chenchuw886 force-pushed the cpu_bind_global_slice branch from 806c63e to 9f2a65f on March 3, 2026 05:33
@linfeng-yuan linfeng-yuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Mar 3, 2026
@wangxiyuan wangxiyuan merged commit b771ca9 into vllm-project:main Mar 3, 2026
57 of 60 checks passed

@wqh17101 wqh17101 left a comment


Please consider the scenario where the operating system language is Chinese

f.write(self.cpu_to_mask(cq_cpu))

for line in info.splitlines():
    if "PCIe Bus Info" in line:


If the OS is set to Chinese, these strings probably won't match.
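One way to address the reviewer's locale concern is to pin the subprocess locale instead of matching translated strings. A sketch under stated assumptions: the `npu-smi` arguments below are illustrative, and the parsing helper is hypothetical, not the PR's code.

```python
import subprocess

# Forcing a C locale makes command output use English field names, so
# markers like "PCIe Bus Info" match even on a Chinese-language host.
C_LOCALE_ENV = {"LC_ALL": "C", "LANG": "C"}

def query_board_info(card_id: int, chip_id: int) -> str:
    # The exact npu-smi flags here are assumptions for illustration.
    out = subprocess.run(
        ["npu-smi", "info", "-t", "board", "-i", str(card_id), "-c", str(chip_id)],
        capture_output=True, text=True, env=C_LOCALE_ENV, check=True)
    return out.stdout

def parse_pcie_bus(info: str) -> str:
    """Pull the PCI address out of npu-smi board output."""
    for line in info.splitlines():
        if "PCIe Bus Info" in line:
            # Split only on the first colon; PCI addresses contain colons.
            return line.split(":", 1)[1].strip()
    raise ValueError("PCIe Bus Info not found in npu-smi output")
```

If `npu-smi` output cannot be forced to English on some firmware versions, matching a locale-independent token such as the PCI address pattern itself would be a more robust fallback.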

845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Mar 5, 2026
…to qwen3next_graph

* 'main' of https://github.com/vllm-project/vllm-ascend: (40 commits)
  [Feature] Add docs of batch invariance and make some extra operators patch (vllm-project#6910)
  [bugfix]Qwen2.5VL accurate question (vllm-project#6975)
  [CI] Add DeepSeek-V3.2 large EP nightly ci (vllm-project#6378)
  [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (vllm-project#6939)
  [bugfix]fix file not found error in nightly of single-node (vllm-project#6976)
  [Bugfix] Fix the acceptance rates dorp issue when applying eagle3 to QuaRot model (vllm-project#6914)
  [CI] Enable auto upgrade e2e estimated time for auto-partition suites (vllm-project#6840)
  [Doc][Misc] Fix msprobe_guide.md documentation issues (vllm-project#6965)
  [Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (vllm-project#6503)
  [BugFix] Improve GDN layer detection for multimodal models (vllm-project#6941)
  [feat]ds3.2 pcp support mtp and chunkprefill (vllm-project#6917)
  [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (vllm-project#6945)
  [Triton] Centralize Ascend extension op dispatch in triton_utils (vllm-project#6937)
  [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (vllm-project#6936)
  [300I][Bugfix] fix unquant model weight nd2nz error (vllm-project#6851)
  [doc] fix supported_models (vllm-project#6930)
  [CI] nightly test timeout (vllm-project#6912)
  [CI] Upgrade CANN to 8.5.1 (vllm-project#6897)
  [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (vllm-project#6828)
  [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (vllm-project#6898)
  ...
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
…r Ascend NPUs (vllm-project#6945)

### What this PR does / why we need it?

This PR introduces global CPU slicing for Ascend NPUs to ensure
non-overlapping CPU partitions, addresses IRQ binding logical errors on
A3, and enhances the logic for determining total NPUs in CPU allocation.
These changes are necessary to optimize CPU resource management and
improve system stability.

- **Global CPU Slicing**: Introduced a global CPU slicing mechanism for
Ascend NPUs to ensure non-overlapping CPU partitions across multiple
processes or data parallel groups, preventing resource contention.
- **Improved IRQ Binding for A3 Devices**: Refined the IRQ binding logic
specifically for Ascend A3 devices, correctly mapping logical NPU IDs to
physical card and chip IDs for accurate npu-smi queries and preventing
multi-process overwrite of IRQ settings.
- **Enhanced NPU Count Determination**: Improved the logic for
determining the total number of logical NPUs, prioritizing NPU mapping
information to ensure more accurate CPU allocation.
- **Minimum CPU Requirement**: Established a minimum requirement of 5
CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL,
and release operations to ensure stable operation.

### Does this PR introduce _any_ user-facing change?

No user-facing changes are introduced.

### How was this patch tested?

CI passed with newly added and existing tests.

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@15d76f7

---------

Signed-off-by: c00818886 <chenchuwei@huawei.com>

Labels

module:core · ready (read for review) · ready-for-test (start test by label for PR)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants