Port over BatchPrefillWithRaggedKVCache to HIP #50
Merged
demandal25 merged 5 commits into ROCm:amd-integration on Nov 14, 2025
demandal25
approved these changes
Nov 14, 2025
Pull Request Overview
This PR ports the BatchPrefillWithRaggedKVCache kernel from CUDA to HIP, improving code maintainability by replacing magic numbers with defined constants and adapting shared memory indexing for AMD GPUs.
Key changes:
- Introduces `BatchPrefillHandler` class and wrapper functions for HIP
- Adds comprehensive test coverage for batch prefill operations on HIP
- Replaces hardcoded values with named constants like `HALF_ELEMS_PER_THREAD` and `NUM_ACCUM_ROWS_PER_THREAD`
- Implements platform-specific shared memory indexing for HIP vs CUDA
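As a rough illustration of the named-constant change, the sketch below shows how per-thread register array sizes could be derived from `HALF_ELEMS_PER_THREAD` and `NUM_ACCUM_ROWS_PER_THREAD` instead of literal numbers. The numeric values, the `__HIP_PLATFORM_AMD__` switch, and the helper names `frag_elems` and `row_state_elems` are assumptions made for this example; the actual definitions are in `prefill.cuh` and are not shown on this page.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; the real constants are defined in prefill.cuh.
#if defined(__HIP_PLATFORM_AMD__)
constexpr uint32_t HALF_ELEMS_PER_THREAD = 4;      // assumption for the HIP path
constexpr uint32_t NUM_ACCUM_ROWS_PER_THREAD = 4;  // assumption for the HIP path
#else
constexpr uint32_t HALF_ELEMS_PER_THREAD = 8;      // assumption for the CUDA path
constexpr uint32_t NUM_ACCUM_ROWS_PER_THREAD = 2;  // assumption for the CUDA path
#endif

// Hypothetical helper: element count of a per-thread fragment array such as
// s_frag or o_frag, given the number of MMA tiles along each dimension.
constexpr uint32_t frag_elems(uint32_t num_mma_rows, uint32_t num_mma_cols) {
  return num_mma_rows * num_mma_cols * HALF_ELEMS_PER_THREAD;
}

// Hypothetical helper: element count of per-row softmax state such as m or d,
// which holds one value per accumulator row owned by the thread.
constexpr uint32_t row_state_elems(uint32_t num_mma_rows) {
  return num_mma_rows * NUM_ACCUM_ROWS_PER_THREAD;
}

// The sizes follow the constants, so switching platforms needs no edits here.
static_assert(frag_elems(1, 1) == HALF_ELEMS_PER_THREAD, "one tile holds one fragment");
static_assert(row_state_elems(1) == NUM_ACCUM_ROWS_PER_THREAD, "one state value per row");
```

Sizing every array through the constants means a platform with a different fragment layout only has to change the two definitions at the top.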
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| libflashinfer/utils/flashinfer_prefill_ops_hip.h | Adds BatchPrefillHandler class and wrapper functions for ragged and paged KV cache operations on HIP |
| libflashinfer/tests/hip/test_batch_prefill.cpp | New comprehensive test suite for batch prefill operations with various configurations |
| libflashinfer/include/flashinfer/attention/generic/prefill.cuh | Updates kernel implementation with named constants, platform-specific indexing, and pragma once directive |
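For readers unfamiliar with the term, a "ragged" KV cache packs the keys and values of variable-length requests back to back and locates each request through an offset array. The sketch below illustrates that indexing scheme on the host. The names `build_indptr`, `request_range`, and `RaggedRange` are hypothetical and are not taken from the wrapper functions in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one request inside a packed (ragged) buffer.
struct RaggedRange {
  int32_t begin;  // index of the request's first token in the packed buffer
  int32_t len;    // number of tokens that belong to the request
};

// Builds CSR-style offsets: indptr[i] is where request i starts and
// indptr.back() is the total number of packed tokens.
std::vector<int32_t> build_indptr(const std::vector<int32_t>& seq_lens) {
  std::vector<int32_t> indptr(seq_lens.size() + 1, 0);
  for (std::size_t i = 0; i < seq_lens.size(); ++i) {
    assert(seq_lens[i] >= 0);
    indptr[i + 1] = indptr[i] + seq_lens[i];
  }
  return indptr;
}

// Looks up the packed range of a single request from the offset array.
RaggedRange request_range(const std::vector<int32_t>& indptr, std::size_t request_idx) {
  assert(request_idx + 1 < indptr.size());
  return {indptr[request_idx], indptr[request_idx + 1] - indptr[request_idx]};
}
```

With sequence lengths `{3, 0, 5}`, the offsets are `{0, 3, 3, 8}`, so the third request occupies packed positions 3 through 7 and no padding is stored for the empty second request.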
Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>
demandal25
added a commit
that referenced
this pull request
Nov 21, 2025
This PR fixes the batch prefill example script for ragged KV cache. The ragged KV cache examples in the script now pass, which supports the basic correctness of PR #50. More exhaustive tests live in the pytest script for batch prefill, and those should also pass before the port is considered complete.
diptorupd
pushed a commit
that referenced
this pull request
Dec 5, 2025
This PR ports over the `BatchPrefillWithRaggedKVCacheKernel` to HIP. The PR makes changes to:
- Remove magic numbers and use defined variables like `HALF_ELEMS_PER_THREAD` or `NUM_ACCUM_ROWS_PER_THREAD`
- Adjust sizes for `s_frag`, `o_frag`, `m`, and `d`
- Modify the shared memory indexing logic

Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>
Co-authored-by: Debasis Mandal <Debasis.Mandal@amd.com>
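The shared memory indexing change can be pictured with a small host-side sketch. NVIDIA warps have 32 lanes while AMD CDNA wavefronts have 64, so a lane-to-tile mapping that is compiled per platform covers a different number of column groups. The mapping below, along with `TileCoord`, `lane_to_tile_coord`, `smem_offset`, `TILE_ROWS`, and `LANES_PER_GROUP`, is a hypothetical example of the technique and not the mapping used in the ported kernel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical (row, column-group) coordinate of a lane within a tile.
struct TileCoord {
  uint32_t row;
  uint32_t col;
};

#if defined(__HIP_PLATFORM_AMD__)
constexpr uint32_t LANES_PER_GROUP = 64;  // AMD CDNA wavefront size
#else
constexpr uint32_t LANES_PER_GROUP = 32;  // NVIDIA warp size
#endif

constexpr uint32_t TILE_ROWS = 16;  // assumed tile height for this sketch

// Illustrative mapping: consecutive lanes walk down the rows of a tile, and
// each further block of TILE_ROWS lanes moves to the next column group.
constexpr TileCoord lane_to_tile_coord(uint32_t lane_idx) {
  return {lane_idx % TILE_ROWS, lane_idx / TILE_ROWS};
}

// Flattens a tile coordinate into an element offset for a row-major buffer
// whose rows are `stride` elements apart.
constexpr uint32_t smem_offset(TileCoord c, uint32_t stride) {
  return c.row * stride + c.col;
}

// A full warp or wavefront spans LANES_PER_GROUP / TILE_ROWS column groups.
static_assert(lane_to_tile_coord(LANES_PER_GROUP - 1).col == LANES_PER_GROUP / TILE_ROWS - 1,
              "last lane lands in the last column group");
```

Under these assumptions the same formula yields two column groups per warp on CUDA and four per wavefront on HIP, which is the kind of difference that forces the indexing logic to be platform-specific.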
zhenhantech
pushed a commit
to zhenhantech/flashinfer
that referenced
this pull request
Jan 9, 2026
Simplifies Dockerfile