Port over BatchPrefillWithRaggedKVCache to HIP #50

Merged: demandal25 merged 5 commits into ROCm:amd-integration from rtmadduri:feature/impl-batch-ragged-prefill on Nov 14, 2025

@rtmadduri (Collaborator)

This PR ports the BatchPrefillWithRaggedKVCacheKernel to HIP. It makes the following changes:

  • Replaces magic numbers with named constants such as HALF_ELEMS_PER_THREAD and NUM_ACCUM_ROWS_PER_THREAD
  • Adjusts the sizes of s_frag, o_frag, m, and d
  • Modifies the shared memory indexing logic
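
To illustrate the first two bullets, here is a minimal host-side sketch of sizing per-thread fragments from named constants instead of literals. The constant names come from the PR description; their values here, and the helper functions `frag_elems_per_thread` and `softmax_stats_per_thread`, are assumptions made for illustration and are not taken from `prefill.cuh`. It also assumes `m` and `d` are the running softmax maximum and denominator, as is usual in FlashAttention-style kernels.

```cpp
#include <cassert>
#include <cstddef>

// ASSUMPTION: illustrative values only. The real definitions live in the
// kernel header and may differ between the CUDA and HIP builds.
constexpr std::size_t HALF_ELEMS_PER_THREAD = 4;
constexpr std::size_t NUM_ACCUM_ROWS_PER_THREAD = 2;

// Hypothetical helper: number of accumulator values a single thread holds
// for a grid of num_mma_q x num_mma_kv matrix tiles (e.g. an s_frag-like array).
constexpr std::size_t frag_elems_per_thread(std::size_t num_mma_q,
                                            std::size_t num_mma_kv) {
  return num_mma_q * num_mma_kv * HALF_ELEMS_PER_THREAD;
}

// Hypothetical helper: number of running softmax statistics (one entry of
// m and one of d per accumulator row) that a single thread tracks.
constexpr std::size_t softmax_stats_per_thread(std::size_t num_mma_q) {
  return num_mma_q * NUM_ACCUM_ROWS_PER_THREAD;
}

// Because the sizes are derived from the constants, changing a constant
// for a new platform updates every dependent array size in one place.
static_assert(frag_elems_per_thread(1, 1) == HALF_ELEMS_PER_THREAD,
              "a single tile holds exactly HALF_ELEMS_PER_THREAD values");
```

With literals, each array declaration would have to be found and edited by hand when the per-thread layout changes; with named constants, the layout assumption is stated once.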

Copilot AI left a comment

Pull Request Overview

This PR ports the BatchPrefillWithRaggedKVCache kernel from CUDA to HIP, improving code maintainability by replacing magic numbers with defined constants and adapting shared memory indexing for AMD GPUs.

Key changes:

  • Introduces BatchPrefillHandler class and wrapper functions for HIP
  • Adds comprehensive test coverage for batch prefill operations on HIP
  • Replaces hardcoded values with named constants like HALF_ELEMS_PER_THREAD and NUM_ACCUM_ROWS_PER_THREAD
  • Implements platform-specific shared memory indexing for HIP vs CUDA
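
For readers unfamiliar with the term, "ragged" refers to sequences of different lengths packed back to back in one buffer and addressed through an offset array. The sketch below shows that indexing scheme on the host. The function names (`build_indptr`, `request_len`, `packed_row`) are hypothetical and are not part of this PR or of the FlashInfer API; the sketch only illustrates the layout idea, not the kernel itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build an offset array from per-request sequence lengths.
// indptr[i] is the first packed row of request i; indptr.back() is the total row count.
std::vector<int32_t> build_indptr(const std::vector<int32_t>& lens) {
  std::vector<int32_t> indptr(lens.size() + 1, 0);
  for (std::size_t i = 0; i < lens.size(); ++i) {
    indptr[i + 1] = indptr[i] + lens[i];
  }
  return indptr;
}

// Number of tokens that belong to request i.
int32_t request_len(const std::vector<int32_t>& indptr, std::size_t i) {
  return indptr[i + 1] - indptr[i];
}

// Row in the packed buffer that holds token t of request i.
int32_t packed_row(const std::vector<int32_t>& indptr, std::size_t i, int32_t t) {
  assert(t >= 0 && t < request_len(indptr, i));
  return indptr[i] + t;
}
```

For example, lengths {3, 1, 4} give the offsets {0, 3, 4, 8}, so the second token (t = 1) of the third request sits in packed row 5. No padding rows are stored, which is the point of the ragged layout.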

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| `libflashinfer/utils/flashinfer_prefill_ops_hip.h` | Adds BatchPrefillHandler class and wrapper functions for ragged and paged KV cache operations on HIP |
| `libflashinfer/tests/hip/test_batch_prefill.cpp` | New comprehensive test suite for batch prefill operations with various configurations |
| `libflashinfer/include/flashinfer/attention/generic/prefill.cuh` | Updates kernel implementation with named constants, platform-specific indexing, and pragma once directive |

Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>
@demandal25 demandal25 merged commit 49c8c6e into ROCm:amd-integration Nov 14, 2025
1 check passed
demandal25 added a commit that referenced this pull request Nov 21, 2025
This PR fixes the batch prefill example script for the ragged KV cache. The
ragged KV cache batch prefill examples in the script now pass, which
supports the basic correctness of PR #50.

More exhaustive tests are in the pytest script for batch prefill; those
should also pass before the port can be called fully validated.
diptorupd pushed a commit that referenced this pull request Dec 5, 2025
This PR ports the `BatchPrefillWithRaggedKVCacheKernel` to HIP. It makes
the following changes:

- Replaces magic numbers with named constants such as
`HALF_ELEMS_PER_THREAD` and `NUM_ACCUM_ROWS_PER_THREAD`
- Adjusts the sizes of `s_frag`, `o_frag`, `m`, and `d`
- Modifies the shared memory indexing logic

---------

Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>
Co-authored-by: Debasis Mandal <Debasis.Mandal@amd.com>
diptorupd pushed a commit that referenced this pull request Dec 5, 2025
zhenhantech pushed a commit to zhenhantech/flashinfer that referenced this pull request Jan 9, 2026
zhenhantech pushed a commit to zhenhantech/flashinfer that referenced this pull request Jan 9, 2026
diptorupd pushed a commit to diptorupd/flashinfer that referenced this pull request Jan 28, 2026
diptorupd pushed a commit to diptorupd/flashinfer that referenced this pull request Jan 28, 2026