@MasterJH5574
Contributor

This PR adds support for running PagedKVCache with TIR kernels.

We do not yet have sufficient TIR kernels for multi-level sequences in PagedKVCache, so `Fork` in PagedKVCache is disabled when the function it requires does not exist.

This PR also adds a "reduced" creator for PagedKVCache, in which some auxiliary functions, such as the begin/end forward functions of prefill/decode, default to None.
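
As a rough illustration of the pattern described above (optional auxiliary functions that default to None, and `Fork` being rejected when its function is missing), here is a minimal Python sketch. It is not the TVM implementation: `ToyPagedCache`, `create_reduced`, and the `begin_forward`, `end_forward`, and `fork_kernel` parameters are hypothetical names chosen for this example only.

```python
from typing import Callable, Dict, List, Optional


class ToyPagedCache:
    """Hypothetical stand-in for a paged KV cache; not the TVM class."""

    def __init__(
        self,
        begin_forward: Optional[Callable[[int], None]] = None,
        end_forward: Optional[Callable[[int], None]] = None,
        fork_kernel: Optional[Callable[[List[int]], List[int]]] = None,
    ) -> None:
        # Auxiliary functions are optional; None means "not supplied".
        self.begin_forward = begin_forward
        self.end_forward = end_forward
        self.fork_kernel = fork_kernel
        self.sequences: Dict[int, List[int]] = {}

    def add_sequence(self, seq_id: int) -> None:
        self.sequences[seq_id] = []

    def forward(self, seq_id: int, tokens: List[int]) -> None:
        # Call the begin/end hooks only when they were provided.
        if self.begin_forward is not None:
            self.begin_forward(seq_id)
        self.sequences[seq_id].extend(tokens)
        if self.end_forward is not None:
            self.end_forward(seq_id)

    def fork(self, parent_id: int, child_id: int) -> None:
        # Fork is disabled when the function it depends on is missing.
        if self.fork_kernel is None:
            raise RuntimeError("Fork is disabled: no fork function was supplied")
        self.sequences[child_id] = self.fork_kernel(self.sequences[parent_id])


def create_reduced() -> ToyPagedCache:
    """Hypothetical 'reduced' creator: every auxiliary function is left as None."""
    return ToyPagedCache()
```

With this sketch, a cache built by `create_reduced()` still accepts `forward` calls, while `fork` raises a `RuntimeError` until a fork function is passed to the constructor.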

CUDA tests are added to ensure correctness.

Co-authored-by: Hongyi Jin [email protected]
Co-authored-by: Bohan Hou [email protected]

@MasterJH5574 force-pushed the unity-dev/2024-01-09-paged-kv-cache-tir branch 3 times, most recently from 1603a90 to efbb7dd, on January 11, 2024, 22:10
@MasterJH5574 force-pushed the unity-dev/2024-01-09-paged-kv-cache-tir branch from efbb7dd to b96d082 on January 12, 2024, 00:05
@tqchen merged commit 7798e93 into apache:unity on Jan 12, 2024
2 participants