Fix cpp build and tests in master branch by demandal25 · Pull Request #52 · ROCm/flashinfer

demandal25 · 2025-11-14T23:23:38Z

The C++ tests on the amd-integration branch were failing not only at runtime but also at build time. This PR does the following:

- Renames some variables and adds missing header includes.
- Comments out the HIP batch prefill test, which not only failed to compile but also failed at runtime.

Copilot

Pull Request Overview

This PR fixes compilation and test issues in the HIP backend by correcting API namespace usage, adding a missing include, and temporarily disabling failing tests.

- Corrects namespace usage for the `load_quad_transposed_fragment` function call
- Comments out paged prefill tests (temporarily disabling them)
- Adds the missing `fastdiv.cuh` include required by debug utilities

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File	Description
test_mfma_fp32_16x16x16fp16.cpp	Updates function call to use correct implementation namespace
test_batch_prefill.cpp	Comments out paged prefill test functions and test cases
mma_debug_utils_hip.h	Adds missing include for fastdiv.cuh
.gitignore	Adds NFS temporary file pattern


Copilot · 2025-11-14T23:26:21Z

libflashinfer/tests/hip/test_mfma_fp32_16x16x16fp16.cpp


  flashinfer::gpu_iface::mma::load_fragment<__half>(a_reg, &A[a_idx]);
-  flashinfer::gpu_iface::mma::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]);
+  flashinfer::gpu_iface::mma_impl::hip::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]);


The function load_quad_transposed_fragment is being accessed directly from the implementation namespace mma_impl::hip instead of the public API namespace mma. This bypasses the intended abstraction layer. Consider either: (1) exposing load_quad_transposed_fragment in the public mma namespace API similar to how load_fragment is exposed, or (2) if this is HIP-specific functionality that shouldn't be in the public API, document why direct access to the implementation namespace is necessary.

this is a valid one! @diptorupd ?

I think this happened during the various merge/rebase cycles we took with the feature/hipified_prefill_v4 branch. We should just add the function to mma public API.
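A minimal sketch of what option (1) from the review comment could look like: a thin forwarding wrapper in the public namespace, so that tests never name the implementation namespace. Everything here is an assumption made for illustration. The namespaces only mirror the names in the diff, the `std::array` register type and the reversed-copy body are placeholders, and none of it is the actual flashinfer code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the namespaces named in the diff.
namespace gpu_iface {
namespace mma_impl {
namespace hip {
// Placeholder backend routine (NOT the real HIP implementation): copies four
// elements in reverse order so that the forwarding is observable.
template <typename T>
void load_quad_transposed_fragment(std::array<T, 4>& reg, const T* src) {
  for (std::size_t i = 0; i < 4; ++i) {
    reg[i] = src[3 - i];
  }
}
}  // namespace hip
}  // namespace mma_impl

namespace mma {
// Public wrapper: callers depend only on gpu_iface::mma, and the backend
// behind it can change without touching the call sites.
template <typename T>
void load_quad_transposed_fragment(std::array<T, 4>& reg, const T* src) {
  mma_impl::hip::load_quad_transposed_fragment<T>(reg, src);
}
}  // namespace mma
}  // namespace gpu_iface
```

With a wrapper of this shape, the test would call `gpu_iface::mma::load_quad_transposed_fragment<T>(reg, src)` in the same style as the existing `load_fragment` call, and the `mma_impl::hip` qualifier would disappear from the test file.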

@demandal25 On reviewing the test case again, its purpose is basically to load the B matrix into a CDNA3 B-matrix transposed layout. The public API already has such a function:

https://github.com/ROCm/flashinfer/blob/6ca1e7f18d61c87932c44dc1c1490b2c778caec2/libflashinfer/include/flashinfer/attention/generic/permuted_smem.cuh#L169C35-L169C59

The compute_sfm_v does the same thing by loading the V matrix into a CDNA3 B-matrix layout:
https://github.com/demandal25/flashinfer/blob/f8ea2070e4b5686dbf31680e82a935106e2672d0/libflashinfer/include/flashinfer/attention/generic/prefill.cuh#L1227

Let me review the test case and fix it properly. The test case is something of a legacy: it predates my working out the layout transformations completely.

The PR adds a list of tests to skip from the CMake `build_tests` target. The tests in the skip list can still be built individually. E.g.

```bash
# The `test_batch_prefill.cpp` is currently broken and added to the skip list.
cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON -GNinja ..
ninja build_tests  # does not build the test_batch_prefill.cpp tests

# The test file can still be built individually using
ninja test_batch_prefill_hip
```

Also added the fix to `mma_debug_utils_hip.hpp` from #52

Supersedes #52, #57

demandal25 · 2025-11-18T18:27:51Z

Superseded by #59

The C++ test suite was using `hipified` headers. In this PR, we port the unit tests over to use `gpu_iface`. This is necessary because the next step is to move the build infrastructure to use `gpu_iface`.

This PR has been tested locally:

```
Test project /root/flashinfer/libflashinfer/tests/hip/build
    Start 1: MathTest
1/6 Test #1: MathTest .........................   Passed    3.40 sec
    Start 2: PosEncTest
2/6 Test #2: PosEncTest .......................   Passed    3.40 sec
    Start 3: CascadeTest
3/6 Test #3: CascadeTest ......................   Passed  985.27 sec
    Start 4: PageTest
4/6 Test #4: PageTest .........................   Passed  112.40 sec
    Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest .................   Passed   35.46 sec
    Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest ..................   Passed  556.81 sec

100% tests passed, 0 tests failed out of 6
```

To replicate the tests:

```
cd flashinfer/libflashinfer/tests/hip
mkdir build && cd build/
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
make
ctest
```


Copilot AI review requested due to automatic review settings November 14, 2025 23:23

Copilot started reviewing on behalf of demandal25 November 14, 2025 23:24 View session

demandal25 changed the title ~~Fix the master~~ Fix cpp build and tests in master branch Nov 14, 2025

Copilot finished reviewing on behalf of demandal25 November 14, 2025 23:25

Copilot AI reviewed Nov 14, 2025


demandal25 added 2 commits November 14, 2025 23:32

Fix cpp build and tests in master branch

d5df106

disable batch prefill test for HIP

f8ea207

demandal25 force-pushed the fix-tests-in-master-branch branch from 3b7485e to f8ea207 Compare November 14, 2025 23:33

demandal25 requested review from diptorupd and rtmadduri November 14, 2025 23:34

diptorupd mentioned this pull request Nov 18, 2025

Skip failing C++ tests and fix mma_debug_utils #59

Merged

demandal25 closed this Nov 18, 2025

demandal25 deleted the fix-tests-in-master-branch branch January 5, 2026 23:51

demandal25 wants to merge 2 commits into ROCm:amd-integration from demandal25:fix-tests-in-master-branch

3 participants
