Commit 6f180d4
[Unity] PagedKVCache supporting on-the-fly RoPE calculation
This PR enhances PagedKVCache with inline RoPE computation,
which unblocks the move toward supporting sliding window and
attention sink.
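To make the idea concrete, here is a minimal pure-Python sketch of what applying RoPE on the fly to a paged cache can look like. It rests on assumptions of ours, not details taken from this PR: keys are stored in the cache without rotation, and each key is rotated at attention time using its current slot in the cache as its position. `ToyPagedKVCache`, `apply_rope`, and the rotate-half pairing of dimensions `i` and `i + d/2` are hypothetical illustrations, not the TVM or FlashInfer API. One plausible reading of why this helps with sliding window and attention sink: because no rotation is baked into the stored keys, evicting middle tokens (keeping a few "sink" tokens plus a recent window) only changes the positions used at read time.

```python
import math
from typing import List


def apply_rope(x: List[float], pos: int, theta: float = 10000.0) -> List[float]:
    """Rotate x to position `pos`, pairing dims (i, i + d/2) (rotate-half layout; an assumption)."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        freq = theta ** (-2.0 * i / d)
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


class ToyPagedKVCache:
    """Hypothetical toy cache: keys are stored un-rotated in fixed-size pages."""

    def __init__(self, page_size: int) -> None:
        self.page_size = page_size
        self.pages: List[List[List[float]]] = []

    def append(self, key: List[float]) -> None:
        # Open a new page when the last one is full; no RoPE is applied here.
        if not self.pages or len(self.pages[-1]) == self.page_size:
            self.pages.append([])
        self.pages[-1].append(list(key))

    def keys(self) -> List[List[float]]:
        return [k for page in self.pages for k in page]

    def evict(self, keep_sink: int, keep_recent: int) -> None:
        # Sliding window with attention sink: keep the first `keep_sink` keys
        # and the last `keep_recent` keys, then repack them into pages.
        ks = self.keys()
        if len(ks) > keep_sink + keep_recent:
            ks = ks[:keep_sink] + ks[len(ks) - keep_recent:]
        self.pages = []
        for k in ks:
            self.append(k)

    def attention_scores(self, q: List[float]) -> List[float]:
        # RoPE on the fly: each key is rotated by its *current* slot index,
        # and the query is placed at the next slot (an assumption of this sketch).
        ks = self.keys()
        q_rot = apply_rope(q, len(ks))
        scale = 1.0 / math.sqrt(len(q))
        return [
            scale * sum(a * b for a, b in zip(q_rot, apply_rope(k, pos)))
            for pos, k in enumerate(ks)
        ]
```

After `evict`, the surviving keys produce exactly the scores a fresh cache holding only those keys would produce, since positions are recomputed from slot indices rather than stored with the keys.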
Both the FlashInfer and TIR kernels are updated in this PR to
compute RoPE inline. Note that the FlashInfer submodule is bumped
to include its RoPE update.
The standalone kernel previously used to apply RoPE
is therefore removed.
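As a rough illustration of why a separate RoPE pass becomes unnecessary once the attention kernels rotate inline, the sketch below computes the RoPE-rotated query-key dot product in a single loop, without writing out rotated copies, and compares it against a standalone rotate-then-dot reference. It is plain Python with hypothetical function names (`apply_rope`, `fused_rope_dot`) and assumes the rotate-half pairing of dimensions `i` and `i + d/2`; it is not the FlashInfer or TIR kernel code. The test also exercises RoPE's relative-position property: the score depends only on the difference between query and key positions.

```python
import math
from typing import List


def apply_rope(x: List[float], pos: int, theta: float = 10000.0) -> List[float]:
    """Standalone pass: materialize a rotated copy of x (rotate-half pairing assumed)."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        freq = theta ** (-2.0 * i / d)
        c, s = math.cos(pos * freq), math.sin(pos * freq)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fused_rope_dot(q: List[float], k: List[float], q_pos: int, k_pos: int,
                   theta: float = 10000.0) -> float:
    """Compute dot(RoPE(q, q_pos), RoPE(k, k_pos)) in one loop, with no rotated copies."""
    d = len(q)
    half = d // 2
    acc = 0.0
    for i in range(half):
        freq = theta ** (-2.0 * i / d)
        cq, sq = math.cos(q_pos * freq), math.sin(q_pos * freq)
        ck, sk = math.cos(k_pos * freq), math.sin(k_pos * freq)
        # Rotate the (i, i + half) pair of q and k as local scalars
        # (standing in for registers in a real kernel), then accumulate.
        q0, q1 = q[i] * cq - q[i + half] * sq, q[i] * sq + q[i + half] * cq
        k0, k1 = k[i] * ck - k[i + half] * sk, k[i] * sk + k[i + half] * ck
        acc += q0 * k0 + q1 * k1
    return acc
```

The two paths agree up to floating-point rounding, which is the property that lets the rotation move from a separate pass into the attention loop.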
---
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
1 parent 07d8e02
4 files changed, +635 -220 lines changed
- 3rdparty
- src/runtime/relax_vm
- tests/python/relax
Submodule flashinfer updated 22 files
- CMakeLists.txt +28 -16
- cmake/config.cmake +8 -2
- include/flashinfer/cascade.cuh +177 -58
- include/flashinfer/decode.cuh +27 -32
- include/flashinfer/handler.cuh +30 -20
- include/flashinfer/page.cuh +29 -12
- include/flashinfer/prefill.cuh +275 -140
- include/flashinfer/utils.cuh +11 -6
- python/csrc/batch_decode.cu +2 -2
- python/csrc/batch_prefill.cu +4 -4
- python/csrc/cascade.cu +7 -7
- python/flashinfer/ops/__init__.py +8 -8
- python/setup.py +1
- src/bench_batch_decode.cu +10 -8
- src/bench_cascade.cu +349
- src/test_batch_decode.cu +11 -9
- src/test_batch_prefill.cu +15 -20
- src/test_cascade.cu +413
- src/test_single_decode.cu +1 -1
- src/test_single_prefill.cu +1 -1
- src/tvm_wrapper.cu +77 -26
- src/utils.h +103