
[0.13.0][cherry-pick][Bugfix][Triton] Centralize Ascend extension op dispatch in triton_utils#7112

Merged
weijinqian0 merged 2 commits into vllm-project:releases/v0.13.0 from
zjchenn:cherrypick/6937
Mar 16, 2026

Conversation

@zjchenn
Contributor

zjchenn commented Mar 10, 2026

Cherry-pick of #6937 to fix a Triton import error.

What this PR does / why we need it?

This pull request refactors the dispatch mechanism for the triton-ascend-specific operators insert_slice, extract_slice, and get_element to ensure compatibility with both CANN 8.5 and 9.0.

A unified helper function, _resolve_triton_ascend_op, has been introduced in vllm_ascend/ops/triton/triton_utils.py. This function dynamically resolves these operators by first attempting to import them from the triton.language.extra.cann.extension module, which is present in newer CANN versions. If that fails, it falls back to the standard triton.language module.

This approach centralizes operator dispatch logic, allowing individual Triton kernels to use these functions without being aware of the underlying Triton/CANN version. All call sites have been updated to use these new unified functions.

Does this PR introduce any user-facing change?

How was this patch tested?

zjchenn changed the title [0.13.0][cherry-pick][Bugfix][Triton] Centralize Ascend extension op dispatch in triton_utils (#6937) [0.13.0][cherry-pick][Bugfix][Triton] Centralize Ascend extension op dispatch in triton_utils Mar 10, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the handling of Triton Ascend extension operations by centralizing their dispatch. The primary goal is to fix a Triton import error by providing a more robust and flexible way to load these operations. This involves creating a utility to dynamically resolve insert_slice, extract_slice, and get_element from available Triton modules, and then updating existing kernels to utilize these new centralized functions. The changes are internal and do not introduce any user-facing modifications.

Highlights

  • Centralized Triton Op Dispatch: Implemented a centralized dispatch mechanism in triton_utils.py to resolve Triton Ascend extension operations (insert_slice, extract_slice, get_element), ensuring they are loaded correctly from either triton.language.extra.cann.extension or triton.language.
  • Refactored Triton Op Calls: Updated various Triton kernels across multiple files to use the newly centralized extract_slice, insert_slice, and get_element utilities instead of direct tl. calls.
  • Bugfix for Triton Import Error: This change cherry-picks a fix aimed at resolving a Triton import error, improving the robustness of custom operator loading.


Changelog
  • vllm_ascend/ops/triton/activation/swiglu_quant.py
    • Updated tl.extract_slice calls to use the new extract_slice utility.
  • vllm_ascend/ops/triton/fla/solve_tril.py
    • Updated tl.extract_slice and tl.insert_slice calls to use the new utilities.
  • vllm_ascend/ops/triton/linearnorm/split_qkv_rmsnorm_rope.py
    • Updated tl.extract_slice and tl.insert_slice calls to use the new utilities.
  • vllm_ascend/ops/triton/reject_sample.py
    • Updated tl.get_element calls to use the new get_element utility.
  • vllm_ascend/ops/triton/triton_utils.py
    • Implemented a centralized dispatch mechanism for Triton Ascend extension operations.
    • Defined insert_slice, extract_slice, and get_element based on this mechanism.
Activity
  • This pull request is a cherry-pick from vllm-project/vllm-ascend/pull/6937.
  • The author, zjchenn, states the purpose is to fix a Triton import error.
  • The changes are described as an internal refactoring without user-facing impact.
  • Existing CI tests are expected to pass with this patch.


gemini-code-assist bot left a comment


Code Review

This pull request introduces a centralized dispatch mechanism for Triton extension operators specific to Ascend hardware. The core change is in vllm_ascend/ops/triton/triton_utils.py, which now includes a resolver function to dynamically load operators from either the triton.language.extra.cann.extension module or the standard triton.language module. This approach elegantly handles different Triton environments and fixes a potential import error. The other modified files have been updated to use these centrally dispatched operators, replacing direct calls to tl.* functions. The changes are a clean and effective refactoring that improves code maintainability. The implementation appears solid, and I found no issues.

…m-project#6937)

### What this PR does / why we need it?

This pull request refactors the dispatch mechanism for the
**triton-ascend-specific operators** `insert_slice`, `extract_slice`,
and `get_element` to ensure compatibility with both CANN 8.5 and 9.0.

A unified helper function, `_resolve_triton_ascend_op`, has been
introduced in `vllm_ascend/ops/triton/triton_utils.py`. This function
dynamically resolves these operators by first attempting to import them
from the `triton.language.extra.cann.extension` module, which is present
in newer CANN versions. If that fails, it falls back to the standard
`triton.language` module.

This approach centralizes operator dispatch logic, allowing individual
Triton kernels to use these functions without being aware of the
underlying Triton/CANN version. All call sites have been updated to use
these new unified functions.

### Does this PR introduce _any_ user-facing change?

No. This is an internal refactoring of operator implementations and does
not introduce any user-facing changes.

### How was this patch tested?

CI is expected to pass with existing tests.

**Testing Context:**
- vLLM version: v0.16.0
- vLLM main: `15d76f74e2fdb12a95ea00f0ca283acf6219a2b7`

Signed-off-by: linfeng-yuan <1102311262@qq.com>
# Conflicts:
#	vllm_ascend/ops/triton/activation/swiglu_quant.py
#	vllm_ascend/ops/triton/fla/solve_tril.py
#	vllm_ascend/ops/triton/linearnorm/split_qkv_rmsnorm_rope.py
#	vllm_ascend/ops/triton/reject_sample.py
weijinqian0 added the labels ready (read for review) and ready-for-test (start test by label for PR) Mar 10, 2026
weijinqian0 merged commit 9feab83 into vllm-project:releases/v0.13.0 Mar 16, 2026
16 of 17 checks passed

Labels

ready (read for review), ready-for-test (start test by label for PR)


3 participants