[CUDA] GroupQueryAttention with XQA and Quantized KV Cache Support by tianleiwu · Pull Request #27246 · microsoft/onnxruntime

tianleiwu merged 8 commits into main from tlwu/gqa_xqa_quantized_kv_cache

review feedback
