Commit 6f180d4

MasterJH5574, spectrometerHBH, and jinhongyii committed
[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR enhances PagedKVCache with inline RoPE computation, which unblocks the move towards sliding window and attention sink. Both the FlashInfer and TIR kernels are updated in this PR to include the RoPE calculation. Note that FlashInfer is bumped in order to pick up the RoPE update. The previous standalone kernels used for RoPE application are therefore removed.

---

Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
1 parent 07d8e02 commit 6f180d4
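
To illustrate what "on-the-fly RoPE" means in the commit message above, here is a minimal pure-Python sketch. It is not the FlashInfer or TIR kernel from this commit. The function names `rope_rotate` and `attention_scores_inline_rope`, the base `theta=10000.0`, and the half-split pairing of dimensions (`i` with `i + d/2`) are all assumptions made for illustration. The idea sketched is that keys can sit in the cache un-rotated, with the rotation applied inside the attention computation using whatever position each entry is assigned at that moment.

```python
import math


def rope_rotate(vec, pos, theta=10000.0):
    """Apply a rotary position embedding to `vec` (even length) at position `pos`.

    Assumption: dimension i is paired with dimension i + d/2 (half-split layout).
    """
    d = len(vec)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Each pair rotates at its own frequency, lower for higher i.
        angle = pos * theta ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + half]
        out[i] = x * c - y * s
        out[i + half] = x * s + y * c
    return out


def attention_scores_inline_rope(q, q_pos, cached_keys, key_positions):
    """Score one query against un-rotated cached keys, rotating both on the fly."""
    q_rot = rope_rotate(q, q_pos)
    scale = 1.0 / math.sqrt(len(q))
    scores = []
    for k, k_pos in zip(cached_keys, key_positions):
        k_rot = rope_rotate(k, k_pos)  # rotation happens here, not at cache-append time
        scores.append(scale * sum(a * b for a, b in zip(q_rot, k_rot)))
    return scores


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0]
    keys = [[0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
    print(attention_scores_inline_rope(q, 5, keys, [3, 4]))
```

Because each pair of dimensions is rotated by an angle proportional to its position, the score between a query and a key depends only on their relative offset. Keeping the rotation inline means the positions passed in can be reassigned without rewriting the cache, which is one plausible reason this change helps with sliding window and attention sink.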