Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: Incorrect result on transposing forward DFT #507

Closed
hjabird opened this issue Aug 22, 2024 · 3 comments
Closed

[Issue]: Incorrect result on transposing forward DFT #507

hjabird opened this issue Aug 22, 2024 · 3 comments
Assignees

Comments

@hjabird
Copy link

hjabird commented Aug 22, 2024

Problem Description

In oneMKL Interfaces library, a test using the rocFFT backend fails. This test is a forward transposing DFT. I believe this a bug in rocFFT, introduced between the releases for ROCm 5.4.3 and ROCm 5.7.1. The issue is described in a comment at oneapi-src/oneMKL#559.

The DFT that fails can be described as:

lengths = {4, 4, 4}
inStrides = {16 1 4}
outStrides = {1 16 4}
batches = 2 // may not be important

This can be reproduced by building oneMKL interfaces library with the rocFFT backend for ROCm 5.7.1 or later, and running the the test. Intel's oneAPI base toolkit 2024.2 and the Codeplay's plugin for AMD GPUs are required (see the oneMKL interfaces documentation).

cmake $ONEMKL_SOURCE_DIR -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_COMPILER=icx -DENABLE_MKLGPU_BACKEND=OFF -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_ROCFFT_BACKEND=ON -DTARGET_DOMAINS=dft -DBUILD_FUNCTIONAL_TESTS=ON -DHIP_CXX_COMPILER=/path/to/rocm/version/llvm/bin/clang++
make
./bin/test_main_dft_ct --gtest_filter=*omputeTests_out_of_place_COMPLEX.COMPLEX_SINGLE_out_of_place_USM/sizes_4x4x4_fwd_strides_2_4_1_16_bwd_strides_1_4_16_1_*

I've identified a bug in the rocFFT implementation that fixes the above issue in my tests: a lambda captures a variable by value where by reference is intended. I don't entirely understand why this works if reordering the DFT dimensions is otherwise valid, so it might be dumb luck.

diff --git a/library/src/plan.cpp b/library/src/plan.cpp
index 969bdd04..1ef07ded 100644
--- a/library/src/plan.cpp
+++ b/library/src/plan.cpp
@@ -237,7 +237,7 @@ void rocfft_plan_t::sort()
         return;

     bool sort_on_istride = true;
-    auto sorter          = [sort_on_istride](const rocfft_iodim& a, const rocfft_iodim& b) {
+    auto sorter          = [&sort_on_istride](const rocfft_iodim& a, const rocfft_iodim& b) {
         // move any lengths of 1 to the end
         if(a.length == 1 && b.length != 1)
             return false;

This fix was tested on the rocFFT version corresponding to ROCm 5.7.1 only.

Operating System

Ubuntu 22.04

CPU

AMD EPYC 7402 w/MI210, i9-12900K w/w6800

GPU

AMD Instinct MI210, AMD Radeon Pro W6800

ROCm Version

ROCm 6.1.0, ROCm 6.0.0, ROCm 5.7.1

ROCm Component

rocFFT

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@evetsso
Copy link
Contributor

evetsso commented Aug 29, 2024

0953683 addresses this problem. @hjabird the lambda change you mentioned doesn't look to me like it would have any effect. If you're able to try building rocFFT from source, would you be able to check whether this commit fixes things?

@hjabird
Copy link
Author

hjabird commented Aug 29, 2024 via email

@s-Nick
Copy link

s-Nick commented Aug 29, 2024

Thank you for the quick fix @evetsso. I can confirm you that the current develop branch works fine for our tests.

s-Nick added a commit to s-Nick/oneMKL that referenced this issue Aug 30, 2024
Due to rocFFt internal bug ROCm/rocFFT#507
Add cmake version checks based upon rocFFT version.
Add exception to deal with faulty cases for affected rocFFT version.
RocFFT versions taken from https://github.com/ROCm/rocFFT/blob/develop/CHANGELOG.md
s-Nick added a commit to s-Nick/oneMKL that referenced this issue Aug 30, 2024
Due to rocFFt internal bug ROCm/rocFFT#507
Add cmake version checks based upon rocFFT version.
Add exception to deal with faulty cases for affected rocFFT version.
RocFFT versions taken from https://github.com/ROCm/rocFFT/blob/develop/CHANGELOG.md
s-Nick added a commit to s-Nick/oneMKL that referenced this issue Aug 30, 2024
Due to rocFFt internal bug ROCm/rocFFT#507
Add cmake version checks based upon rocFFT version.
Add exception to deal with faulty cases for affected rocFFT version.
RocFFT versions taken from https://github.com/ROCm/rocFFT/blob/develop/CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants