Fix cpp build and tests in master branch#52
Fix cpp build and tests in master branch#52demandal25 wants to merge 2 commits intoROCm:amd-integrationfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes compilation and test issues in the HIP backend by correcting API namespace usage, adding a missing include, and temporarily disabling failing tests.
- Corrects namespace usage for
load_quad_transposed_fragmentfunction call - Comments out paged prefill tests (temporarily disabling them)
- Adds missing fastdiv.cuh include required by debug utilities
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| test_mfma_fp32_16x16x16fp16.cpp | Updates function call to use correct implementation namespace |
| test_batch_prefill.cpp | Comments out paged prefill test functions and test cases |
| mma_debug_utils_hip.h | Adds missing include for fastdiv.cuh |
| .gitignore | Adds NFS temporary file pattern |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| flashinfer::gpu_iface::mma::load_fragment<__half>(a_reg, &A[a_idx]); | ||
| flashinfer::gpu_iface::mma::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]); | ||
| flashinfer::gpu_iface::mma_impl::hip::load_quad_transposed_fragment<__half>(b_reg, &B[b_idx]); |
There was a problem hiding this comment.
The function load_quad_transposed_fragment is being accessed directly from the implementation namespace mma_impl::hip instead of the public API namespace mma. This bypasses the intended abstraction layer. Consider either: (1) exposing load_quad_transposed_fragment in the public mma namespace API similar to how load_fragment is exposed, or (2) if this is HIP-specific functionality that shouldn't be in the public API, document why direct access to the implementation namespace is necessary.
There was a problem hiding this comment.
I think this happened during the various merge/rebase cycles we took with the feature/hipified_prefill_v4 branch. We should just add the function to mma public API.
There was a problem hiding this comment.
@demandal25 On reviewing the test case again the use is basically to get the B matrix loaded into a CDNA3 B-matrix transposed layout. The public API already has such a function:
The compute_sfm_v does the same thing by loading the V matrix into a CDNA3 B-matrix layout:
https://github.com/demandal25/flashinfer/blob/f8ea2070e4b5686dbf31680e82a935106e2672d0/libflashinfer/include/flashinfer/attention/generic/prefill.cuh#L1227
Let me review the test case and fix it properly. The test case is kind of legacy before I worked out the layout transformations completely.
3b7485e to
f8ea207
Compare
The PR adds a list to tests to skip from the CMake `build_tests` target. The tests in the skip list can still be individually built. E.g. ```bash # The `test_batch_prefill.cpp` is currently broken and added to the skip list. cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON -GNinja .. ninja build_tests # does not build the test_batch_prefill.cpp tests # The test file can still be built individually using ninja test_batch_prefill_hip ``` Also added the fix to `mma_debug_utils_hip.hpp` from #52 Supersedes #52, #57
|
Superseded by #59 |
CPP test suite was using `hipified` headers. In this PR, we port over unit tests to use `gpu_iface`. This is necessary for us as the next step is to move the build infrastructure to use `gpu_iface`
This PR has been tested locally
```
Test project /root/flashinfer/libflashinfer/tests/hip/build
Start 1: MathTest
1/6 Test #1: MathTest ......................... Passed 3.40 sec
Start 2: PosEncTest
2/6 Test #2: PosEncTest ....................... Passed 3.40 sec
Start 3: CascadeTest
3/6 Test #3: CascadeTest ...................... Passed 985.27 sec
Start 4: PageTest
4/6 Test #4: PageTest ......................... Passed 112.40 sec
Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest ................. Passed 35.46 sec
Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest .................. Passed 556.81 sec
100% tests passed, 0 tests failed out of 6
```
To replicate the tests
```
cd flashinfer/libflashinfer/tests/hip
```
```
mkdir build && cd build/
```
```
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
```
```
make
```
```
ctest
```
The PR adds a list to tests to skip from the CMake `build_tests` target. The tests in the skip list can still be individually built. E.g. ```bash # The `test_batch_prefill.cpp` is currently broken and added to the skip list. cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON -GNinja .. ninja build_tests # does not build the test_batch_prefill.cpp tests # The test file can still be built individually using ninja test_batch_prefill_hip ``` Also added the fix to `mma_debug_utils_hip.hpp` from #52 Supersedes #52, #57
CPP test suite was using `hipified` headers. In this PR, we port over unit tests to use `gpu_iface`. This is necessary for us as the next step is to move the build infrastructure to use `gpu_iface`
This PR has been tested locally
```
Test project /root/flashinfer/libflashinfer/tests/hip/build
Start 1: MathTest
1/6 Test ROCm#1: MathTest ......................... Passed 3.40 sec
Start 2: PosEncTest
2/6 Test ROCm#2: PosEncTest ....................... Passed 3.40 sec
Start 3: CascadeTest
3/6 Test ROCm#3: CascadeTest ...................... Passed 985.27 sec
Start 4: PageTest
4/6 Test ROCm#4: PageTest ......................... Passed 112.40 sec
Start 5: SingleDecodeTest
5/6 Test ROCm#5: SingleDecodeTest ................. Passed 35.46 sec
Start 6: BatchDecodeTest
6/6 Test ROCm#6: BatchDecodeTest .................. Passed 556.81 sec
100% tests passed, 0 tests failed out of 6
```
To replicate the tests
```
cd flashinfer/libflashinfer/tests/hip
```
```
mkdir build && cd build/
```
```
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
```
```
make
```
```
ctest
```
The PR adds a list to tests to skip from the CMake `build_tests` target. The tests in the skip list can still be individually built. E.g. ```bash # The `test_batch_prefill.cpp` is currently broken and added to the skip list. cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON -GNinja .. ninja build_tests # does not build the test_batch_prefill.cpp tests # The test file can still be built individually using ninja test_batch_prefill_hip ``` Also added the fix to `mma_debug_utils_hip.hpp` from ROCm#52 Supersedes ROCm#52, ROCm#57
CPP test suite was using `hipified` headers. In this PR, we port over unit tests to use `gpu_iface`. This is necessary for us as the next step is to move the build infrastructure to use `gpu_iface`
This PR has been tested locally
```
Test project /root/flashinfer/libflashinfer/tests/hip/build
Start 1: MathTest
1/6 Test #1: MathTest ......................... Passed 3.40 sec
Start 2: PosEncTest
2/6 Test #2: PosEncTest ....................... Passed 3.40 sec
Start 3: CascadeTest
3/6 Test #3: CascadeTest ...................... Passed 985.27 sec
Start 4: PageTest
4/6 Test #4: PageTest ......................... Passed 112.40 sec
Start 5: SingleDecodeTest
5/6 Test #5: SingleDecodeTest ................. Passed 35.46 sec
Start 6: BatchDecodeTest
6/6 Test #6: BatchDecodeTest .................. Passed 556.81 sec
100% tests passed, 0 tests failed out of 6
```
To replicate the tests
```
cd flashinfer/libflashinfer/tests/hip
```
```
mkdir build && cd build/
```
```
cmake -DCMAKE_PREFIX_PATH=/root/libtorch -DCMAKE_CXX_COMPILER:PATH=/opt/rocm/bin/amdclang++ -DFLASHINFER_INCLUDE_DIRS=/root/flashinfer/libflashinfer/include/ ..
```
```
make
```
```
ctest
```
The PR adds a list to tests to skip from the CMake `build_tests` target. The tests in the skip list can still be individually built. E.g. ```bash # The `test_batch_prefill.cpp` is currently broken and added to the skip list. cmake -DFLASHINFER_ENABLE_HIP=ON -DFLASHINFER_UNITTESTS=ON -GNinja .. ninja build_tests # does not build the test_batch_prefill.cpp tests # The test file can still be built individually using ninja test_batch_prefill_hip ``` Also added the fix to `mma_debug_utils_hip.hpp` from ROCm#52 Supersedes ROCm#52, ROCm#57
The
amd-integrationbranch was failing for cpp tests, not just at runtime, but at build time. This PR does the following: