KV Cache
#1617
@krzysz00 I've written this. I think there would be no early workgroup exits.
-
Example
Short summary of notes
max_seq_len (say 4096)
n_block_idx * gemm0NperBlock to current_seq_len
For a short-term solution:
current_seq_len.
n_block_idx > current_seq_len
k_block_idx > current_seq_len
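One possible reading of the comparisons above, as a minimal plain-Python sketch rather than the actual kernel: the K/V buffers are sized to max_seq_len, and only the first current_seq_len positions hold real data. The function name `attention_row`, the `block_size` parameter (standing in for gemm0NperBlock), and the assumption that `n_block_idx * block_size` is the first sequence position of a block are all hypothetical and for illustration only.

```python
import math

def attention_row(q, keys, values, current_seq_len, block_size):
    """Attention for one query row over a KV cache padded to max_seq_len.

    Hypothetical sketch: blocks that start at or beyond current_seq_len
    are skipped, and padded positions inside a partially filled block
    are masked out of the softmax.
    """
    max_seq_len = len(keys)
    if not 1 <= current_seq_len <= max_seq_len:
        raise ValueError("expected 1 <= current_seq_len <= max_seq_len")

    scores = []  # (position, q . k) for valid positions only
    for n_block_idx in range(math.ceil(max_seq_len / block_size)):
        block_start = n_block_idx * block_size
        if block_start >= current_seq_len:
            break  # the whole block is padding, so there is nothing to do
        block_end = min(block_start + block_size, max_seq_len)
        for pos in range(block_start, block_end):
            if pos >= current_seq_len:
                continue  # mask a padded position inside a partial block
            scores.append((pos, sum(a * b for a, b in zip(q, keys[pos]))))

    # Numerically stable softmax over the valid positions.
    top = max(s for _, s in scores)
    weights = [(pos, math.exp(s - top)) for pos, s in scores]
    total = sum(w for _, w in weights)

    out = [0.0] * len(values[0])
    for pos, w in weights:
        for d in range(len(out)):
            out[d] += (w / total) * values[pos][d]
    return out
```

With this masking, whatever sits in the padded tail of the cache does not influence the result, so the output matches attention computed over only the first current_seq_len entries.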
NOTE 1: I think the above needs to be understood in a transposed manner, since we compute (Vt x (Kt x Qt))t.
NOTE 2: This is how to do this without touching coordinate transforms.