Fix cpp build and tests in master branch #52

Closed
demandal25 wants to merge 2 commits into ROCm:amd-integration from demandal25:fix-tests-in-master-branch

Collaborator

@demandal25 demandal25 commented Nov 14, 2025

The cpp tests on the amd-integration branch were failing at build time, not just at runtime. This PR does the following:

  • renames some variables and adds missing header includes.
  • comments out the HIP batch prefill test, which failed to compile and also failed at runtime.

Copilot AI review requested due to automatic review settings November 14, 2025 23:23
@demandal25 demandal25 changed the title from "Fix the master" to "Fix cpp build and tests in master branch" Nov 14, 2025

Copilot AI left a comment

Pull Request Overview

This PR fixes compilation and test issues in the HIP backend by correcting API namespace usage, adding a missing include, and temporarily disabling failing tests.

  • Corrects namespace usage for load_quad_transposed_fragment function call
  • Comments out paged prefill tests (temporarily disabling them)
  • Adds missing fastdiv.cuh include required by debug utilities

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| test_mfma_fp32_16x16x16fp16.cpp | Updates function call to use correct implementation namespace |
| test_batch_prefill.cpp | Comments out paged prefill test functions and test cases |
| mma_debug_utils_hip.h | Adds missing include for fastdiv.cuh |
| .gitignore | Adds NFS temporary file pattern |

  flashinfer::gpu_iface::mma::load_fragment<__half>(a_reg, &A[a_idx]);
- flashinfer::gpu_iface::mma::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]);
+ flashinfer::gpu_iface::mma_impl::hip::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]);

Copilot AI Nov 14, 2025

The function load_quad_transposed_fragment is being accessed directly from the implementation namespace mma_impl::hip instead of the public API namespace mma. This bypasses the intended abstraction layer. Consider either: (1) exposing load_quad_transposed_fragment in the public mma namespace API similar to how load_fragment is exposed, or (2) if this is HIP-specific functionality that shouldn't be in the public API, document why direct access to the implementation namespace is necessary.

Collaborator Author

this is a valid one! @diptorupd ?

Collaborator

@diptorupd diptorupd Nov 15, 2025

I think this happened during the various merge/rebase cycles we went through with the feature/hipified_prefill_v4 branch. We should just add the function to the mma public API.
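
For illustration, exposing the function through the public namespace could look roughly like the sketch below. This is only a host-side mock under stated assumptions: the `sketch` namespace, the `std::array` register type, and the plain-copy body are hypothetical stand-ins, and the real `gpu_iface` signatures and transposed-load logic will differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the flashinfer::gpu_iface namespaces.
namespace sketch {
namespace mma_impl {
namespace hip {
// Backend-specific loader. Placeholder body: a plain copy stands in for the
// real transposed fragment load, which is not reproduced here.
template <typename T>
inline void load_quad_transposed_fragment(std::array<T, 4>& reg, const T* src) {
  for (std::size_t i = 0; i < reg.size(); ++i) reg[i] = src[i];
}
}  // namespace hip
}  // namespace mma_impl

namespace mma {
// Public entry point: forwards to the backend so that callers such as tests
// never have to name the mma_impl namespace directly.
template <typename T>
inline void load_quad_transposed_fragment(std::array<T, 4>& reg, const T* src) {
  mma_impl::hip::load_quad_transposed_fragment<T>(reg, src);
}
}  // namespace mma

// Small check that the public wrapper reaches the backend implementation.
inline void smoke_check() {
  std::array<int, 4> reg{};
  const int src[4] = {1, 2, 3, 4};
  mma::load_quad_transposed_fragment<int>(reg, src);
  assert(reg[0] == 1 && reg[3] == 4);
}
}  // namespace sketch
```

With a wrapper of this shape, the test would call the `mma` function again and the backend namespace would stay an implementation detail.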

Collaborator

@demandal25 On reviewing the test case again, its purpose is basically to load the B matrix into a CDNA3 B-matrix transposed layout. The public API already has such a function:

https://github.com/ROCm/flashinfer/blob/6ca1e7f18d61c87932c44dc1c1490b2c778caec2/libflashinfer/include/flashinfer/attention/generic/permuted_smem.cuh#L169C35-L169C59

The compute_sfm_v does the same thing by loading the V matrix into a CDNA3 B-matrix layout:
https://github.com/demandal25/flashinfer/blob/f8ea2070e4b5686dbf31680e82a935106e2672d0/libflashinfer/include/flashinfer/attention/generic/prefill.cuh#L1227

Let me review the test case and fix it properly. The test case is legacy code from before I had fully worked out the layout transformations.

@demandal25 demandal25 force-pushed the fix-tests-in-master-branch branch from 3b7485e to f8ea207 Compare November 14, 2025 23:33
demandal25 pushed a commit that referenced this pull request Nov 18, 2025
The PR adds a list of tests to skip from the CMake `build_tests` target.
The tests in the skip list can still be individually built.

E.g.
```bash
# The `test_batch_prefill.cpp` is currently broken and added to the skip list.
cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON  -GNinja ..
ninja build_tests # does not build the test_batch_prefill.cpp tests
# The test file can still be built individually using
ninja test_batch_prefill_hip
```
Also added the fix to `mma_debug_utils_hip.hpp` from #52

Supersedes #52, #57
@demandal25
Collaborator Author

Superseded by #59

@demandal25 demandal25 closed this Nov 18, 2025
diptorupd pushed a commit that referenced this pull request Dec 5, 2025
The CPP test suite was using `hipified` headers. In this PR, we port the unit tests over to use `gpu_iface`. This is necessary because the next step is to move the build infrastructure to `gpu_iface`.

This PR has been tested locally:
```
Test project /root/flashinfer/libflashinfer/tests/hip/build
    Start 1: MathTest
1/6 Test #1: MathTest .........................   Passed    3.40 sec
    Start 2: PosEncTest
2/6 Test #2: PosEncTest .......................   Passed    3.40 sec
    Start 3: CascadeTest
3/6 Test #3: CascadeTest ......................   Passed  985.27 sec
    Start 4: PageTest
4/6 Test #4: PageTest .........................   Passed  112.40 sec
    Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest .................   Passed   35.46 sec
    Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest ..................   Passed  556.81 sec

100% tests passed, 0 tests failed out of 6
```

To replicate the tests:
```
cd flashinfer/libflashinfer/tests/hip
mkdir build && cd build/
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
make
ctest
```
diptorupd added a commit that referenced this pull request Dec 5, 2025
@demandal25 demandal25 deleted the fix-tests-in-master-branch branch January 5, 2026 23:51
zhenhantech pushed a commit to zhenhantech/flashinfer that referenced this pull request Jan 9, 2026
zhenhantech pushed a commit to zhenhantech/flashinfer that referenced this pull request Jan 9, 2026
diptorupd pushed a commit to diptorupd/flashinfer that referenced this pull request Jan 28, 2026
diptorupd added a commit to diptorupd/flashinfer that referenced this pull request Jan 28, 2026
3 participants