Use CUDART_VERSION reduction compatibility in GQA attention#28296

Merged
tianleiwu merged 2 commits into main from copilot/fix-cuda-13-build-error on Apr 30, 2026
Conversation

Contributor

Copilot AI commented Apr 30, 2026

Description

Update onnxruntime/contrib_ops/cuda/bert/gqa_unfused_attention.cu to match the existing CUDA attention compatibility pattern used elsewhere in the repo.

  • Replace the local reduction functors with the established CUDART_VERSION >= 12090 guards.
  • Use ::cuda::maximum() and ::cuda::std::plus() for CUDA 12.9+.
  • Keep cub::Max() and cub::Sum() as the fallback for older toolkits.
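As a rough sketch, the guard described above takes this shape (the helper names and exact reduction calls here are illustrative, not the literal contents of gqa_unfused_attention.cu):

```cuda
#include <cub/cub.cuh>
#if defined(CUDART_VERSION) && CUDART_VERSION >= 12090
#include <cuda/functional>       // ::cuda::maximum
#include <cuda/std/functional>   // ::cuda::std::plus
#endif

// Block-wide max reduction: CUDA 12.9+ uses the libcu++ functor,
// older toolkits fall back to the (now deprecated) CUB functor.
template <typename T, int kBlockSize>
__device__ T BlockMax(typename cub::BlockReduce<T, kBlockSize>::TempStorage& storage,
                      T value) {
#if defined(CUDART_VERSION) && CUDART_VERSION >= 12090
  return cub::BlockReduce<T, kBlockSize>(storage).Reduce(value, ::cuda::maximum());
#else
  return cub::BlockReduce<T, kBlockSize>(storage).Reduce(value, cub::Max());
#endif
}

// Block-wide sum reduction with the same version guard.
template <typename T, int kBlockSize>
__device__ T BlockSum(typename cub::BlockReduce<T, kBlockSize>::TempStorage& storage,
                      T value) {
#if defined(CUDART_VERSION) && CUDART_VERSION >= 12090
  return cub::BlockReduce<T, kBlockSize>(storage).Reduce(value, ::cuda::std::plus());
#else
  return cub::BlockReduce<T, kBlockSize>(storage).Reduce(value, cub::Sum());
#endif
}
```

Because both branches go through the same cub::BlockReduce call, only the functor type changes with the toolkit version, so kernel behavior is identical on old and new CUDA.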

Motivation and Context

This keeps the GQA unfused attention kernel consistent with nearby CUDA attention code, avoids the CUDA 12.9+ deprecation of the old CUB reduction functors, and preserves compatibility with older CUDA toolkits.

Validation:

  • git diff --check
  • Code review validation: no comments
  • CodeQL validation: no analyzable language changes detected

Copilot AI and others added 2 commits April 30, 2026 15:57
…s in gqa_unfused_attention.cu

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/f48f1b22-0b97-4e82-a3e0-6c98fcd0317c

Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
@tianleiwu tianleiwu marked this pull request as ready for review April 30, 2026 16:21


3 participants