Use CUDART_VERSION reduction compatibility in GQA attention by Copilot · Pull Request #28296 · microsoft/onnxruntime

Copilot · 2026-04-30T16:19:51Z

Description

Update onnxruntime/contrib_ops/cuda/bert/gqa_unfused_attention.cu to match the existing CUDA attention compatibility pattern used elsewhere in the repo.

Replace the local reduction functors with the established CUDART_VERSION >= 12090 guards.
Use ::cuda::maximum() and ::cuda::std::plus() for CUDA 12.9+.
Keep cub::Max() and cub::Sum() as the fallback for older toolkits.
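
The guard described above can be sketched roughly as follows. This is an illustrative example, not the code from gqa_unfused_attention.cu: the kernel `BlockMaxAndSum` and the host helper `RunBlockMaxAndSum` are hypothetical names, and the sketch assumes a single-block reduction over floats with `cub::BlockReduce`. Only the `CUDART_VERSION >= 12090` condition and the four functor names come from the description.

```cuda
#include <cfloat>
#include <cuda_runtime.h>  // defines CUDART_VERSION
#include <cub/cub.cuh>
#if CUDART_VERSION >= 12090
#include <cuda/functional>      // ::cuda::maximum
#include <cuda/std/functional>  // ::cuda::std::plus
#endif

// Hypothetical kernel: one block reduces n floats to their max and sum.
template <int kThreads>
__global__ void BlockMaxAndSum(const float* in, int n, float* out_max, float* out_sum) {
  using BlockReduce = cub::BlockReduce<float, kThreads>;
  __shared__ typename BlockReduce::TempStorage tmp;

  // Each thread folds a strided slice of the input into private partials.
  float local_max = -FLT_MAX;
  float local_sum = 0.0f;
  for (int i = threadIdx.x; i < n; i += kThreads) {
    local_max = fmaxf(local_max, in[i]);
    local_sum += in[i];
  }

#if CUDART_VERSION >= 12090
  const float block_max = BlockReduce(tmp).Reduce(local_max, ::cuda::maximum());
#else
  const float block_max = BlockReduce(tmp).Reduce(local_max, cub::Max());
#endif
  __syncthreads();  // tmp is reused by the second reduction

#if CUDART_VERSION >= 12090
  const float block_sum = BlockReduce(tmp).Reduce(local_sum, ::cuda::std::plus());
#else
  const float block_sum = BlockReduce(tmp).Reduce(local_sum, cub::Sum());
#endif

  // BlockReduce returns a valid result only in thread 0.
  if (threadIdx.x == 0) {
    *out_max = block_max;
    *out_sum = block_sum;
  }
}

// Hypothetical host helper: copies input to the device, launches one block,
// and copies the two results back.
cudaError_t RunBlockMaxAndSum(const float* host_in, int n, float* host_max, float* host_sum) {
  constexpr int kThreads = 128;
  float* d_in = nullptr;
  float* d_out = nullptr;  // d_out[0] = max, d_out[1] = sum
  cudaError_t err = cudaMalloc(&d_in, n * sizeof(float));
  if (err != cudaSuccess) return err;
  err = cudaMalloc(&d_out, 2 * sizeof(float));
  if (err != cudaSuccess) {
    cudaFree(d_in);
    return err;
  }
  err = cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
  if (err == cudaSuccess) {
    BlockMaxAndSum<kThreads><<<1, kThreads>>>(d_in, n, d_out, d_out + 1);
    float out[2] = {0.0f, 0.0f};
    // This blocking copy also waits for the kernel to finish.
    err = cudaMemcpy(out, d_out, sizeof(out), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
      *host_max = out[0];
      *host_sum = out[1];
    }
  }
  cudaFree(d_in);
  cudaFree(d_out);
  return err;
}
```

Because the guard is evaluated by the preprocessor, each toolkit compiles only one branch, so older toolkits never see the `::cuda::` functors and CUDA 12.9+ never references the deprecated CUB ones.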

Motivation and Context

This keeps the GQA unfused attention kernel consistent with nearby CUDA attention code and avoids the CUDA 12.9+ deprecation issue around the old CUB reduction functors while preserving compatibility with older CUDA toolkits.

Validation:

git diff --check
Code review validation: no comments
CodeQL validation: no analyzable language changes detected

…s in gqa_unfused_attention.cu Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/f48f1b22-0b97-4e82-a3e0-6c98fcd0317c Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/748706c1-e89d-4bbb-bb9f-dc129f909727 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>

Copilot AI and others added 2 commits April 30, 2026 15:57

Use CUDART_VERSION reduction compatibility in GQA attention

2aa0c26

Copilot AI assigned Copilot and tianleiwu Apr 30, 2026

Copilot created this pull request from a session on behalf of tianleiwu April 30, 2026 16:20

Copilot finished work on behalf of tianleiwu April 30, 2026 16:20

Copilot AI requested a review from tianleiwu April 30, 2026 16:20

tianleiwu approved these changes Apr 30, 2026

tianleiwu marked this pull request as ready for review April 30, 2026 16:21

sanaa-hamel-microsoft approved these changes Apr 30, 2026

tianleiwu enabled auto-merge (squash) April 30, 2026 17:07

tianleiwu merged commit 9c2b0c3 into main Apr 30, 2026
89 checks passed

tianleiwu deleted the copilot/fix-cuda-13-build-error branch April 30, 2026 18:24

BrewTestBot mentioned this pull request May 8, 2026

onnxruntime 1.26.0 Homebrew/homebrew-core#281672

Merged

This was referenced May 8, 2026

chore(deps): Bump Microsoft.ML.OnnxRuntime from 1.22.0 to 1.26.0 verbara/Verbara.Sdk#3

Open

Bump Microsoft.ML.OnnxRuntime from 1.25.1 to 1.26.0 IoTSharp/SonnetDB#62

Open

Bump Microsoft.ML.OnnxRuntime from 1.25.1 to 1.26.0 lopatnov/translate#23

Merged

tianleiwu merged 2 commits into main from copilot/fix-cuda-13-build-error

3 participants
