[CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs #6945
…non-overlapping CPU partitions Signed-off-by: c00818886 <chenchuwei@huawei.com>
…cess loop in IRQ binding, clarify min required CPUs limitations Signed-off-by: c00818886 <chenchuwei@huawei.com>
…ion; add descriptions Signed-off-by: c00818886 <chenchuwei@huawei.com>
Summary of Changes (Gemini Code Assist)
This pull request significantly overhauls the CPU binding mechanism for Ascend NPUs. It introduces a global CPU slicing strategy to prevent CPU resource overlap when multiple processes share the same CPU set, ensuring deterministic and isolated CPU partitions for each logical NPU. Additionally, it corrects logical errors in IRQ binding for A3 devices and improves the overall robustness of NPU CPU allocation by better determining the total number of logical NPUs. These changes aim to optimize CPU resource management and enhance system stability without introducing user-facing modifications.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces significant improvements to CPU resource management for Ascend NPUs. The key change is the implementation of a 'global slice' CPU binding mode, which ensures non-overlapping CPU partitions across multiple processes, preventing resource contention. The PR also includes a critical fix for IRQ binding on A3 devices and enhances the logic for determining the total number of NPUs for allocation. These changes are well-implemented and will improve system stability.
However, there are two main concerns:
- Testing: The unit tests in `tests/ut/device_allocator/test_cpu_binding.py` do not appear to be updated to reflect the substantial changes in `cpu_binding.py`. Outdated tests for renamed functions and old logic remain. It is critical to add and update tests to cover the new global slicing mechanism and IRQ binding logic to ensure correctness and prevent future regressions. This is a high-severity issue that impacts the maintainability and reliability of the code.
- PR Format: The pull request title and description do not adhere to the format specified in the repository's style guide. For better consistency, please update them.
Suggested PR Title:
[Ops][Feature] Implement global CPU slicing and improve IRQ binding for Ascend NPUs
Suggested PR Summary:
### What this PR does / why we need it?
This PR introduces a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions in multi-process environments. It also corrects a logical error in IRQ binding on A3 devices and improves the logic for determining the total number of NPUs for CPU allocation. These changes are essential for optimizing CPU resource management, improving system stability, and preventing resource contention.
Fixes #<issue_number>
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed. Unit tests for `cpu_binding.py` should be updated to cover the new global slicing logic and IRQ binding fixes to validate the new functionality and prevent regressions.
…d improve logic for total logic NPUs Signed-off-by: c00818886 <chenchuwei@huawei.com>
806c63e to 9f2a65f
wqh17101 left a comment
Please consider the scenario where the operating system language is Chinese
    f.write(self.cpu_to_mask(cq_cpu))

    for line in info.splitlines():
        if "PCIe Bus Info" in line:
If the OS is set to Chinese, these strings probably won't match.
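One possible way to address this concern is to force the C locale for the child process before matching on English field names. The sketch below is illustrative only: `c_locale_env`, `run_in_c_locale`, and `find_field` are hypothetical helper names, not functions from this PR, and it assumes (without confirmation from the thread) that the queried tool honors `LC_ALL`/`LANG` and prints `key : value` lines.

```python
import os
import subprocess
from typing import Dict, List, Optional


def c_locale_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the locale forced to C."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # overrides every other LC_* category
    env["LANG"] = "C"
    return env


def run_in_c_locale(cmd: List[str]) -> str:
    """Run cmd with a C locale and return its stdout as text."""
    result = subprocess.run(
        cmd, env=c_locale_env(), capture_output=True, text=True, check=True
    )
    return result.stdout


def find_field(info: str, key: str) -> Optional[str]:
    """Return the text after the first ':' on the first line containing key.

    The 'key : value' layout is an assumption about the tool's output.
    """
    for line in info.splitlines():
        if key in line:
            _, _, value = line.partition(":")
            return value.strip()
    return None
```

If the tool ignores locale variables, this has no effect, and matching on a locale-independent token (for example, the bus address pattern itself) may be the safer choice.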
…to qwen3next_graph

* 'main' of https://github.com/vllm-project/vllm-ascend: (40 commits)
- [Feature] Add docs of batch invariance and make some extra operators patch (vllm-project#6910)
- [bugfix]Qwen2.5VL accurate question (vllm-project#6975)
- [CI] Add DeepSeek-V3.2 large EP nightly ci (vllm-project#6378)
- [Ops][BugFix] Fix RoPE shape mismatch for mtp models with flashcomm v1 enabled (vllm-project#6939)
- [bugfix]fix file not found error in nightly of single-node (vllm-project#6976)
- [Bugfix] Fix the acceptance rates dorp issue when applying eagle3 to QuaRot model (vllm-project#6914)
- [CI] Enable auto upgrade e2e estimated time for auto-partition suites (vllm-project#6840)
- [Doc][Misc] Fix msprobe_guide.md documentation issues (vllm-project#6965)
- [Nightly][Refactor]Migrate nightly single-node model tests from `.py` to `.yaml` (vllm-project#6503)
- [BugFix] Improve GDN layer detection for multimodal models (vllm-project#6941)
- [feat]ds3.2 pcp support mtp and chunkprefill (vllm-project#6917)
- [CPU binding] Implement global CPU slicing and improve IRQ binding for Ascend NPUs (vllm-project#6945)
- [Triton] Centralize Ascend extension op dispatch in triton_utils (vllm-project#6937)
- [csrc][bugfix] Add compile-time Ascend950/910_95 compatibility for custom ops between CANN8.5 and 9.0 (vllm-project#6936)
- [300I][Bugfix] fix unquant model weight nd2nz error (vllm-project#6851)
- [doc] fix supported_models (vllm-project#6930)
- [CI] nightly test timeout (vllm-project#6912)
- [CI] Upgrade CANN to 8.5.1 (vllm-project#6897)
- [Model]Add Qwen3-Omni quantization Ascend NPU adaptation and optimization (vllm-project#6828)
- [P/D][v0.16.0]Adapt to RecomputeScheduler in vLLM 0.16.0 (vllm-project#6898)
- ...
…r Ascend NPUs (vllm-project#6945)
### What this PR does / why we need it?
This PR introduces global CPU slicing for Ascend NPUs to ensure non-overlapping CPU partitions, addresses IRQ binding logical errors on A3, and enhances the logic for determining total NPUs in CPU allocation. These changes are necessary to optimize CPU resource management and improve system stability.
- **Global CPU Slicing**: Introduced a global CPU slicing mechanism for Ascend NPUs to ensure non-overlapping CPU partitions across multiple processes or data parallel groups, preventing resource contention.
- **Improved IRQ Binding for A3 Devices**: Refined the IRQ binding logic specifically for Ascend A3 devices, correctly mapping logical NPU IDs to physical card and chip IDs for accurate npu-smi queries and preventing multi-process overwrite of IRQ settings.
- **Enhanced NPU Count Determination**: Improved the logic for determining the total number of logical NPUs, prioritizing NPU mapping information to ensure more accurate CPU allocation.
- **Minimum CPU Requirement**: Established a minimum requirement of 5 CPUs per NPU for binding, reserving specific cores for IRQ, main, ACL, and release operations to ensure stable operation.
### Does this PR introduce _any_ user-facing change?
No user-facing changes are introduced.
### How was this patch tested?
CI passed with newly added and existing tests.
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@15d76f7
---------
Signed-off-by: c00818886 <chenchuwei@huawei.com>
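The global slicing idea and the 5-CPU minimum described in this commit message can be illustrated with a small sketch. This is not the code from `cpu_binding.py`: `global_slices` and `assign_roles` are hypothetical names, the equal-sized contiguous split is one simple way to get disjoint partitions, and the order in which cores are reserved for IRQ, main, ACL, and release is an assumption made only for illustration.

```python
from typing import Dict, List

MIN_CPUS_PER_NPU = 5  # minimum stated in the PR description


def global_slices(cpus: List[int], total_npus: int) -> Dict[int, List[int]]:
    """Split a shared CPU list into one disjoint, contiguous slice per logical NPU."""
    if total_npus <= 0:
        raise ValueError("total_npus must be positive")
    ordered = sorted(set(cpus))  # deterministic order in every process
    per_npu = len(ordered) // total_npus
    if per_npu < MIN_CPUS_PER_NPU:
        raise ValueError(
            f"need at least {MIN_CPUS_PER_NPU} CPUs per NPU, got {per_npu}"
        )
    return {
        npu: ordered[npu * per_npu:(npu + 1) * per_npu]
        for npu in range(total_npus)
    }


def assign_roles(cpu_slice: List[int]) -> Dict[str, List[int]]:
    """Reserve cores inside one slice (hypothetical layout, not the PR's)."""
    irq, main, acl, release, *rest = cpu_slice
    return {
        "irq": [irq],
        "main": [main],
        "acl": [acl],
        "release": [release],
        "other": rest,  # whatever remains after the reserved cores
    }
```

Because the mapping depends only on the shared CPU list and the total NPU count, every process that sees the same inputs computes the same table and can pick its own slice without coordinating with the others.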
What this PR does / why we need it?
This PR introduces global CPU slicing for Ascend NPUs to ensure non-overlapping CPU partitions, addresses IRQ binding logical errors on A3, and enhances the logic for determining total NPUs in CPU allocation. These changes are necessary to optimize CPU resource management and improve system stability.
Related RFC
#6966
Does this PR introduce any user-facing change?
No user-facing changes are introduced.
How was this patch tested?
CI passed with newly added and existing tests.