[0.13.0][cherry-pick][Bugfix][Triton] Centralize Ascend extension op dispatch in triton_utils#7112
Summary of Changes
This pull request refactors the handling of Triton Ascend extension operations by centralizing their dispatch. The primary goal is to fix a Triton import error by providing a more robust and flexible way to load these operations. This involves creating a utility to dynamically resolve …
Code Review
This pull request introduces a centralized dispatch mechanism for Triton extension operators specific to Ascend hardware. The core change is in vllm_ascend/ops/triton/triton_utils.py, which now includes a resolver function to dynamically load operators from either the triton.language.extra.cann.extension module or the standard triton.language module. This approach elegantly handles different Triton environments and fixes a potential import error. The other modified files have been updated to use these centrally dispatched operators, replacing direct calls to tl.* functions. The changes are a clean and effective refactoring that improves code maintainability. The implementation appears solid, and I found no issues.
…m-project#6937)

### What this PR does / why we need it?

This pull request refactors the dispatch mechanism for the **triton-ascend-specific operators** `insert_slice`, `extract_slice`, and `get_element` to ensure compatibility with both CANN 8.5 and 9.0.

A unified helper function, `_resolve_triton_ascend_op`, has been introduced in `vllm_ascend/ops/triton/triton_utils.py`. This function dynamically resolves these operators by first attempting to import them from the `triton.language.extra.cann.extension` module, which is present in newer CANN versions. If that fails, it falls back to the standard `triton.language` module.

This approach centralizes operator dispatch logic, allowing individual Triton kernels to use these functions without being aware of the underlying Triton/CANN version. All call sites have been updated to use these new unified functions.

### Does this PR introduce _any_ user-facing change?

No. This is an internal refactoring of operator implementations and does not introduce any user-facing changes.

### How was this patch tested?

CI is expected to pass with existing tests.

**Testing Context:**
- vLLM version: v0.16.0
- vLLM main: `15d76f74e2fdb12a95ea00f0ca283acf6219a2b7`

Signed-off-by: linfeng-yuan <1102311262@qq.com>

# Conflicts:
# vllm_ascend/ops/triton/activation/swiglu_quant.py
# vllm_ascend/ops/triton/fla/solve_tril.py
# vllm_ascend/ops/triton/linearnorm/split_qkv_rmsnorm_rope.py
# vllm_ascend/ops/triton/reject_sample.py
0fdcf96 to 99d7c8e
9feab83 into vllm-project:releases/v0.13.0
Cherry-picked from #6937 to fix a Triton import error.
What this PR does / why we need it?
This pull request refactors the dispatch mechanism for the triton-ascend-specific operators insert_slice, extract_slice, and get_element to ensure compatibility with both CANN 8.5 and 9.0.
A unified helper function, _resolve_triton_ascend_op, has been introduced in vllm_ascend/ops/triton/triton_utils.py. This function dynamically resolves these operators by first attempting to import them from the triton.language.extra.cann.extension module, which is present in newer CANN versions. If that fails, it falls back to the standard triton.language module.
This approach centralizes operator dispatch logic, allowing individual Triton kernels to use these functions without being aware of the underlying Triton/CANN version. All call sites have been updated to use these new unified functions.
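To illustrate why centralizing the lookup keeps call sites version-agnostic, the sketch below simulates two Triton layouts with fake in-memory modules, binds the three operators once, and shows that the same call site works on both. `fake_cann_extension` and `fake_triton_language` are hypothetical stand-ins for `triton.language.extra.cann.extension` and `triton.language`. For illustration only, it assumes newer stacks expose the operators solely under the extension module. The real operators are Triton JIT builtins, not plain Python callables.

```python
import importlib
import sys
import types

# Hypothetical flat stand-ins for the real module paths
# "triton.language.extra.cann.extension" and "triton.language".
EXTENSION_MOD, LANGUAGE_MOD = "fake_cann_extension", "fake_triton_language"
OP_NAMES = ("insert_slice", "extract_slice", "get_element")


def simulate_stack(newer_cann: bool) -> None:
    """Register fake modules that mimic a newer or older CANN layout (assumed)."""
    for name in (EXTENSION_MOD, LANGUAGE_MOD):
        sys.modules.pop(name, None)
    home = EXTENSION_MOD if newer_cann else LANGUAGE_MOD
    for name in (EXTENSION_MOD, LANGUAGE_MOD):
        if name == EXTENSION_MOD and not newer_cann:
            continue  # the older stack ships no extension module at all
        module = types.ModuleType(name)
        if name == home:
            for op in OP_NAMES:
                # Each fake op just reports which module it came from.
                setattr(module, op, lambda *args, _where=f"{name}.{op}": _where)
        sys.modules[name] = module


def build_triton_utils() -> types.SimpleNamespace:
    """Resolve every op once, extension module first, as the PR describes."""
    resolved = {}
    for op in OP_NAMES:
        for mod_name in (EXTENSION_MOD, LANGUAGE_MOD):
            try:
                module = importlib.import_module(mod_name)
            except ImportError:
                continue  # module not available in this simulated stack
            if hasattr(module, op):
                resolved[op] = getattr(module, op)
                break
        else:
            raise ImportError(f"{op!r} is not provided by any known module")
    return types.SimpleNamespace(**resolved)


def kernel_call_site(utils: types.SimpleNamespace) -> str:
    """A stand-in 'kernel' that calls the op without checking the CANN version."""
    return utils.extract_slice()


simulate_stack(newer_cann=True)
print(kernel_call_site(build_triton_utils()))  # fake_cann_extension.extract_slice
simulate_stack(newer_cann=False)
print(kernel_call_site(build_triton_utils()))  # fake_triton_language.extract_slice
```

The call site is identical in both runs. Only the single resolution point knows which module supplied the operator, which is the property the refactoring relies on.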
Does this PR introduce any user-facing change?
How was this patch tested?