@bhargaveede

This kernel does not do anything specific to chunked prefill.
Compared with the CUDA kernels that were implemented to support chunked prefill in serving frameworks, it is a SplitKV kernel.

To avoid the naming confusion, I renamed the kernels to use SplitKV instead of ChunkPrefill.

…is Split KV attention

Changing the name to avoid confusion and to align with Flash Attention
@airMeng
Collaborator

airMeng commented Nov 4, 2025

Hi @bhargaveede, split_kv means splitting the KV cache along the KV sequence dimension and dispatching the pieces to different work groups. We do not support this feature yet, so split_kv is not an accurate name. @sunjiweiswift will implement split_kv for decoding only in #23, so please stay tuned.
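
For readers unfamiliar with the term, the cross-work-group scheme can be sketched in plain Python. This is an illustration only, not code from this repository: the names `attend_partial` and `split_kv_attention` are hypothetical, a single query vector is assumed, and each call to `attend_partial` stands in for one work group that would run in parallel on real hardware.

```python
import math


def attend_partial(q, keys, values, scale):
    """Attention over one KV split (hypothetical stand-in for one work group).

    Returns the split's score maximum, its sum of exponentials, and the
    unnormalized weighted sum of values.
    """
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return m, denom, out


def split_kv_attention(q, keys, values, num_splits):
    """Split the KV sequence into chunks, attend per chunk, then merge."""
    scale = 1.0 / math.sqrt(len(q))
    n = len(keys)
    step = (n + num_splits - 1) // num_splits  # ceil(n / num_splits)

    # Each partial result is independent of the others.
    partials = [
        attend_partial(q, keys[i:i + step], values[i:i + step], scale)
        for i in range(0, n, step)
    ]

    # Reduction step: rescale every partial to a common maximum.
    m_all = max(m for m, _, _ in partials)
    denom = sum(d * math.exp(m - m_all) for m, d, _ in partials)
    dim = len(values[0])
    return [
        sum(o[j] * math.exp(m - m_all) for m, _, o in partials) / denom
        for j in range(dim)
    ]
```

The separate reduction over per-split maxima and exponential sums is what distinguishes this scheme from a kernel in which one work group walks the whole KV sequence by itself.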

@airMeng closed this Nov 4, 2025
@bhargaveede
Author

Hi @airMeng, thanks for the clarification.
However, I see that we compute kv_splits and tile based on them inside a single work group.
If I understood correctly, the split_kv you referred to splits the work across work groups.
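
In contrast to a split across work groups, tiling over KV inside a single work group can be sketched as one sequential loop with a running softmax. This is a minimal illustration under assumptions, not the kernel in this PR: `tiled_attention` and `tile_size` are hypothetical names, and a single query vector is assumed.

```python
import math


def tiled_attention(q, keys, values, tile_size):
    """Walk the KV sequence tile by tile, keeping a running softmax state."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    m = float("-inf")   # running maximum of the scores seen so far
    denom = 0.0         # running sum of exponentials
    acc = [0.0] * dim   # running unnormalized output

    for start in range(0, len(keys), tile_size):
        k_tile = keys[start:start + tile_size]
        v_tile = values[start:start + tile_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_tile]

        m_new = max(m, max(scores))
        # Rescale the old state to the new maximum (0.0 on the first tile).
        correction = math.exp(m - m_new)
        weights = [math.exp(s - m_new) for s in scores]

        denom = denom * correction + sum(weights)
        acc = [
            a * correction + sum(w * v[j] for w, v in zip(weights, v_tile))
            for j, a in enumerate(acc)
        ]
        m = m_new

    return [a / denom for a in acc]
```

Here the tiles are processed one after another by the same worker, so no cross-work-group reduction is needed; the state is simply carried from tile to tile.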

Still, we have a naming confusion with chunked_prefill: chunked prefill is a serving-framework feature, so naming the current kernel Chunked Prefill adds confusion.

Do you have any suggestions here?
