Port over BatchPrefillWithRaggedKVCache to HIP by rtmadduri · Pull Request #50 · ROCm/flashinfer

rtmadduri · 2025-11-14T08:50:23Z

This PR ports the BatchPrefillWithRaggedKVCacheKernel to HIP. It makes the following changes:

Removes magic numbers in favor of named constants such as HALF_ELEMS_PER_THREAD and NUM_ACCUM_ROWS_PER_THREAD
Adjusts the sizes of s_frag, o_frag, m, and d
Modifies the shared memory indexing logic
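
As a rough illustration of the first two items, the sketch below derives per-thread fragment sizes from named constants rather than literals. Only the names HALF_ELEMS_PER_THREAD and NUM_ACCUM_ROWS_PER_THREAD come from this PR; their values here, the helper functions, and the num_mma_q / num_mma_kv parameters are assumptions made for the example and may not match the actual definitions in prefill.cuh.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative values only (assumption): the real constants are defined in
// the PR's prefill.cuh and may differ per platform.
constexpr std::uint32_t HALF_ELEMS_PER_THREAD = 4;
constexpr std::uint32_t NUM_ACCUM_ROWS_PER_THREAD = 4;

// Hypothetical helper: element count of a per-thread accumulator fragment
// (in the spirit of s_frag / o_frag) for a tile of num_mma_q x num_mma_kv
// matrix-multiply blocks.
constexpr std::size_t accum_frag_elems(std::uint32_t num_mma_q,
                                       std::uint32_t num_mma_kv) {
  return static_cast<std::size_t>(num_mma_q) * num_mma_kv *
         HALF_ELEMS_PER_THREAD;
}

// Hypothetical helper: element count of per-row running state
// (in the spirit of m and d), one entry per accumulator row a thread owns.
constexpr std::size_t row_state_elems(std::uint32_t num_mma_q) {
  return static_cast<std::size_t>(num_mma_q) * NUM_ACCUM_ROWS_PER_THREAD;
}

// Usage: sanity-check the derived sizes for one tile configuration.
inline void check_fragment_sizes() {
  assert(accum_frag_elems(2, 4) == 32);  // 2 * 4 * HALF_ELEMS_PER_THREAD
  assert(row_state_elems(2) == 8);       // 2 * NUM_ACCUM_ROWS_PER_THREAD
}
```

With the sizes expressed this way, changing a constant for a different platform updates every dependent array size in one place instead of requiring a search for scattered literals.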

Copilot

Pull Request Overview

This PR ports the BatchPrefillWithRaggedKVCache kernel from CUDA to HIP, improving code maintainability by replacing magic numbers with defined constants and adapting shared memory indexing for AMD GPUs.

Key changes:

Introduces BatchPrefillHandler class and wrapper functions for HIP
Adds comprehensive test coverage for batch prefill operations on HIP
Replaces hardcoded values with named constants like HALF_ELEMS_PER_THREAD and NUM_ACCUM_ROWS_PER_THREAD
Implements platform-specific shared memory indexing for HIP vs CUDA
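
One common way to keep such platform differences contained is to select constants with the preprocessor and route index math through small helpers. The sketch below relies on the `__HIP_PLATFORM_AMD__` macro that HIP defines when targeting AMD GPUs, and assumes a 64-lane wavefront (as on CDNA-class hardware) versus a 32-lane CUDA warp. The names kThreadsPerWarp, warp_index, lane_index, and tile_offset are hypothetical, not identifiers from this PR, and the actual shared memory indexing in prefill.cuh is more involved.

```cpp
#include <cassert>
#include <cstdint>

// Pick the warp/wavefront width at compile time (assumed widths, see above).
#if defined(__HIP_PLATFORM_AMD__)
constexpr std::uint32_t kThreadsPerWarp = 64;
#else
constexpr std::uint32_t kThreadsPerWarp = 32;
#endif

// Which warp a thread belongs to within its block.
constexpr std::uint32_t warp_index(std::uint32_t thread_idx) {
  return thread_idx / kThreadsPerWarp;
}

// The thread's lane within its warp.
constexpr std::uint32_t lane_index(std::uint32_t thread_idx) {
  return thread_idx % kThreadsPerWarp;
}

// Hypothetical row-major offset into a shared memory tile in which each warp
// owns rows_per_warp consecutive rows and the lane selects the column.
constexpr std::uint32_t tile_offset(std::uint32_t thread_idx,
                                    std::uint32_t rows_per_warp,
                                    std::uint32_t row_stride) {
  return warp_index(thread_idx) * rows_per_warp * row_stride +
         lane_index(thread_idx);
}

// Holds on either platform: the first thread of warp 1 sits at lane 0.
static_assert(warp_index(kThreadsPerWarp) == 1, "first thread of warp 1");
static_assert(lane_index(kThreadsPerWarp) == 0, "lane wraps at warp width");
```

Because callers only see the helpers, the kernel body can stay identical across platforms while the width-dependent arithmetic lives behind a single conditional.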

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| libflashinfer/utils/flashinfer_prefill_ops_hip.h | Adds BatchPrefillHandler class and wrapper functions for ragged and paged KV cache operations on HIP |
| libflashinfer/tests/hip/test_batch_prefill.cpp | New comprehensive test suite for batch prefill operations with various configurations |
| libflashinfer/include/flashinfer/attention/generic/prefill.cuh | Updates kernel implementation with named constants, platform-specific indexing, and pragma once directive |

This PR fixes the batch prefill example script for the ragged KV cache. The script's examples for batch prefill with a ragged KV cache now pass, which supports the basic correctness of PR #50. More exhaustive tests live in the pytest script for batch prefill; those should also pass before the port can be considered fully validated.

This PR ports the `BatchPrefillWithRaggedKVCacheKernel` to HIP. It makes the following changes:

- Removes magic numbers in favor of named constants such as `HALF_ELEMS_PER_THREAD` and `NUM_ACCUM_ROWS_PER_THREAD`
- Adjusts the sizes of `s_frag`, `o_frag`, `m`, and `d`
- Modifies the shared memory indexing logic

---------

Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>
Co-authored-by: Debasis Mandal <Debasis.Mandal@amd.com>

rtmadduri added 4 commits November 13, 2025 08:45

Implement test_batch_prefill unit test

1634298

pre-commit

4843259

first draft

7b9bccb

rebased and batch ragged prefill working

81acedf

rtmadduri requested review from demandal25 and diptorupd November 14, 2025 08:50

demandal25 requested a review from Copilot November 14, 2025 19:14

Copilot started reviewing on behalf of demandal25 November 14, 2025 19:15

Copilot finished reviewing on behalf of demandal25 November 14, 2025 19:17

demandal25 approved these changes Nov 14, 2025

Copilot AI reviewed Nov 14, 2025

Merge branch 'amd-integration' into feature/impl-batch-ragged-prefill

9f801e8

Signed-off-by: Debasis Mandal <Debasis.Mandal@amd.com>

demandal25 merged commit 49c8c6e into ROCm:amd-integration Nov 14, 2025
1 check passed

demandal25 mentioned this pull request Nov 21, 2025

Fix batch prefill example script for ragged kv cache #73

Merged

diptorupd pushed a commit that referenced this pull request Dec 5, 2025

Remove USER requirement from Docker (#50)

96166ec

Simplifies Dockerfile

zhenhantech pushed a commit to zhenhantech/flashinfer that referenced this pull request Jan 9, 2026

Remove USER requirement from Docker (ROCm#50)

53b3d35

Simplifies Dockerfile

diptorupd pushed a commit to diptorupd/flashinfer that referenced this pull request Jan 28, 2026

Remove USER requirement from Docker (ROCm#50)

8470373

Simplifies Dockerfile

demandal25 merged 5 commits into ROCm:amd-integration from rtmadduri:feature/impl-batch-ragged-prefill
