Conversation

@kunal-vaishnavi
Contributor

Description

This PR moves the CUDA memcpy for the QK output when type `T` is equal to type `QK` from `attention_impl.cu` into `attention_qk.cu`.

Motivation and Context

This PR fixes a linkage error in `attention_qk.cu` that occurs when type `T` and type `QK` are the same.

@yuslepukhin
Member

Looks reasonable

@yuslepukhin
Member

The linkage error is due to the fact that the multihead attention code attempts to indirectly instantiate `CopyQK` with `<float, float>`, but that instantiation does not exist.

@kunal-vaishnavi kunal-vaishnavi merged commit 2b3d7fb into main Mar 24, 2025
87 of 91 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the kvaishnavi/attention-qk branch March 24, 2025 02:43
zhaoxul-qti pushed a commit to CodeLinaro/onnxruntime that referenced this pull request Apr 17, 2025

4 participants