

@MasterJH5574 (Contributor)

This PR enhances PagedKVCache with inline RoPE computation, which unblocks the move toward supporting sliding-window attention and attention sinks.

Both the FlashInfer and the TIR attention kernels are updated in this PR to include the RoPE calculation. Note that FlashInfer is bumped to a version that includes the RoPE update.
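
To illustrate what "inline" RoPE means here, the following is a minimal Python sketch, not the FlashInfer or TIR kernel code from this PR. It assumes the common rotate-half RoPE layout with base `theta = 10000` and a cache that stores keys *before* rotation; `apply_rope` and `attn_score_inline_rope` are hypothetical names. The rotation is applied to the query and key at the moment the attention score is computed, using each token's position, instead of in a separate kernel ahead of time.

```python
import math
from typing import List


def apply_rope(x: List[float], pos: int, theta: float = 10000.0) -> List[float]:
    """Rotate a head vector `x` by position `pos` (assumed rotate-half layout).

    Dimension i is paired with i + d/2, and the pair is rotated by the angle
    pos * theta ** (-2i / d).
    """
    d = len(x)
    assert d % 2 == 0, "head dimension must be even"
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


def attn_score_inline_rope(q: List[float], k_unrotated: List[float],
                           q_pos: int, k_pos: int) -> float:
    """Scaled attention logit with RoPE applied at attention time.

    Because the key is assumed to be cached unrotated, its position is only
    fixed here, when the score is computed, not when the key was stored.
    """
    qr = apply_rope(q, q_pos)
    kr = apply_rope(k_unrotated, k_pos)
    scale = 1.0 / math.sqrt(len(q))
    return scale * sum(a * b for a, b in zip(qr, kr))


if __name__ == "__main__":
    q = [0.1, -0.4, 0.7, 0.2]
    k = [0.5, 0.3, -0.2, 0.9]
    # Only q_pos - k_pos matters, so these agree up to floating-point rounding.
    print(attn_score_inline_rope(q, k, q_pos=5, k_pos=3))
    print(attn_score_inline_rope(q, k, q_pos=12, k_pos=10))
```

Since the score depends only on the relative offset `q_pos - k_pos`, keeping keys unrotated lets the cache assign positions at attention time. That is one plausible way inline RoPE helps with sliding windows and attention sinks, where cached keys may need positions different from their original ones.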

The standalone kernel previously used for RoPE application is therefore removed.


Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>

@MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch 3 times, most recently from b67d71e to 281ddc3 (January 13, 2024 22:49)
@MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch 3 times, most recently from f162975 to bd3b958 (January 14, 2024 19:01)
@MasterJH5574 force-pushed the unity-dev/2024-01-13-kv-cache-rope branch from bd3b958 to 6f180d4 (January 14, 2024 21:10)
@tqchen merged commit 98d5153 into apache:unity on Jan 15, 2024