[libclc] replace float remquo with amd ocml implementation #177131

Merged
wenju-he merged 5 commits into llvm:main from wenju-he:remquo-use-amdgpu-ocml-remquo-float
Jan 26, 2026

@wenju-he
Contributor

@wenju-he wenju-he commented Jan 21, 2026

The current implementation has two issues:

  • It unconditionally soft-flushes denormals.
  • It cannot pass the OpenCL CTS test "test_bruteforce remquo" on Intel GPUs.

This PR upstreams the remquo implementation from
https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs/ocml/src/remainderF_base.h It supports denormals and passes the OpenCL CTS test.
The number of LLVM IR instructions in function _Z6remquoffPU3AS5i increased from 96 to 680.

Current implementation has two issues:
* unconditionally soft flushes denormal.
* can't pass OpenCL CTS test "test_bruteforce remquo" on intel gpu.

This PR upstreams remquo implementation from
https://github.com/ROCm/llvm-project/tree/amd-staging/amd/device-libs/ocml/src/remainderF_base.h
It supports denormal and can pass OpenCL CTS test.
Note: __oclc_finite_only_opt is set to false because the generic
implementation has no dynamic dispatching.
Number of LLVM IR instructions of function _Z6remquoffPU3AS5i increased
from 96 to 678.

Contributor

Copilot AI left a comment

Pull request overview

This PR replaces the current float remquo implementation with AMD's OCML version to fix denormal handling and OpenCL CTS test failures. The new implementation supports denormal numbers properly and passes the "test_bruteforce remquo" test on Intel GPUs.

Changes:

  • Replaced the existing remquo algorithm with AMD OCML implementation from ROCm
  • Removed soft flushing of denormals that was causing issues
  • Added support for proper denormal handling and edge cases

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File | Description
libclc/clc/lib/generic/math/clc_remquo.inc | Complete rewrite of remquo implementation using AMD OCML algorithm with denormal support
libclc/clc/lib/generic/math/clc_remquo.cl | Updated includes to support new implementation (added fabs, copysign, frexp, nan, native_recip, rint, isfinite, isnan)

Comment on lines +18 to +39
    ex = ({
           int _exp;
           __clc_frexp(ax, &_exp);
           _exp;
         }) -
         1;
    ax = __clc_ldexp(({
                       int _exp;
                       __clc_frexp(ax, &_exp);
                     }),
                     bits);
    ey = ({
           int _exp;
           __clc_frexp(ay, &_exp);
           _exp;
         }) -
         1;
    ay = __clc_ldexp(({
                       int _exp;
                       __clc_frexp(ay, &_exp);
                     }),
                     1);

Copilot AI Jan 21, 2026

The frexp function is called twice on ax with identical logic but only the exponent is used the first time and only the mantissa the second time. This is inefficient and duplicative. Consider storing both the mantissa and exponent from a single frexp call and reusing them.

Suggested change
-    ex = ({
-           int _exp;
-           __clc_frexp(ax, &_exp);
-           _exp;
-         }) -
-         1;
-    ax = __clc_ldexp(({
-                       int _exp;
-                       __clc_frexp(ax, &_exp);
-                     }),
-                     bits);
-    ey = ({
-           int _exp;
-           __clc_frexp(ay, &_exp);
-           _exp;
-         }) -
-         1;
-    ay = __clc_ldexp(({
-                       int _exp;
-                       __clc_frexp(ay, &_exp);
-                     }),
-                     1);
+    int _exp_ax;
+    float mant_ax = __clc_frexp(ax, &_exp_ax);
+    ex = _exp_ax - 1;
+    ax = __clc_ldexp(mant_ax, bits);
+    int _exp_ay;
+    float mant_ay = __clc_frexp(ay, &_exp_ay);
+    ey = _exp_ay - 1;
+    ay = __clc_ldexp(mant_ay, 1);
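
The point of the comment can be checked on the host with the standard frexpf and ldexpf, which I am assuming behave like __clc_frexp and __clc_ldexp for this purpose. The sketch below is not the libclc code; split_result and split_once are hypothetical names. A single frexpf call yields both the mantissa and the exponent, so both derived values can come from it.

```c
#include <math.h>

/* Hypothetical container for the two values derived from one frexpf call. */
typedef struct {
  int exponent;  /* frexpf exponent minus 1, like ex/ey in the snippet */
  float scaled;  /* mantissa in [0.5, 1) multiplied by 2^bits */
} split_result;

/* One frexpf call provides the mantissa (return value) and the exponent
   (out parameter); no second call is needed. */
static split_result split_once(float a, int bits) {
  int e;
  float mant = frexpf(a, &e);
  split_result r = { e - 1, ldexpf(mant, bits) };
  return r;
}
```

For example, split_once(6.0f, 12) splits 6 into mantissa 0.75 and exponent 3, so it returns exponent 2 and the scaled value 3072.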

Comment on lines 91 to 99
  bool __oclc_finite_only_opt = false;
  if (!__oclc_finite_only_opt) {
    ret = y == 0.0f ? __clc_nan(0) : ret;
    q7 = y == 0.0f ? 0 : q7;
    bool c = !__clc_isnan(y) && __clc_isfinite(x);
    ret = c ? ret : __clc_nan(0);
    q7 = c ? q7 : 0;
  }


Copilot AI Jan 21, 2026

The variable name __oclc_finite_only_opt uses a naming convention that suggests it's a configuration constant or macro, but it's declared as a local boolean variable that's always false. This is confusing and misleading. Consider renaming it to something like finite_only_disabled or removing it entirely if it's meant to be a temporary placeholder.

Suggested change
-  bool __oclc_finite_only_opt = false;
-  if (!__oclc_finite_only_opt) {
-    ret = y == 0.0f ? __clc_nan(0) : ret;
-    q7 = y == 0.0f ? 0 : q7;
-    bool c = !__clc_isnan(y) && __clc_isfinite(x);
-    ret = c ? ret : __clc_nan(0);
-    q7 = c ? q7 : 0;
-  }
+  ret = y == 0.0f ? __clc_nan(0) : ret;
+  q7 = y == 0.0f ? 0 : q7;
+  bool c = !__clc_isnan(y) && __clc_isfinite(x);
+  ret = c ? ret : __clc_nan(0);
+  q7 = c ? q7 : 0;
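
To make the effect of these fix-ups concrete, here is a small host-side sketch in plain C. It assumes nanf, isnan and isfinite from <math.h> stand in for the __clc_ helpers; fixup_remquo is a hypothetical wrapper, not a function from the PR. It takes an already computed remainder and quotient and applies the same special-case rules: a zero divisor, a NaN divisor, or a non-finite dividend all give a NaN remainder and a zero quotient.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical wrapper: applies the special-case rules to a remainder `ret`
   and quotient bits `q7` that were computed elsewhere. */
static float fixup_remquo(float x, float y, float ret, int q7, int *quo) {
  if (y == 0.0f) {          /* division by zero: NaN remainder, zero quotient */
    ret = nanf("");
    q7 = 0;
  }
  bool ok = !isnan(y) && isfinite(x);
  if (!ok) {                /* NaN divisor, or infinite/NaN dividend */
    ret = nanf("");
    q7 = 0;
  }
  *quo = q7;
  return ret;
}
```

Ordinary inputs pass through unchanged, which is why the guard can be dropped only when the caller promises finite, non-zero operands.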

int qsgn = 1 + (((__clc_as_int(x) ^ __clc_as_int(y)) >> 31) << 1);
float t = __clc_fma(y, -(float)qsgn, x);
ret = c ? t
: (__builtin_isfpclass(__builtin_canonicalizef(0x1p-149f), 0x0040)
Contributor

This should definitely not be inlining the DAZ_OPT hack. Either preserve it, or unconditionally canonicalize

Contributor Author

done, changed to unconditionally canonicalize
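
For context, the inlined expression discussed above appears to ask whether canonicalizing the smallest subnormal (0x1p-149f) produces +0, which would indicate that denormals are being flushed. As I read it, class mask 0x0040 selects positive zero. The sketch below is a portable stand-in for that probe in plain C, under the assumption that a multiplication by 1.0f through volatile variables is subject to the same flushing; it is not the builtin used in the PR, and both function names are hypothetical.

```c
#include <math.h>

/* Returns 1 only for +0.0f, which is what class mask 0x0040 is assumed
   to select in the original expression. */
static int is_positive_zero(float v) {
  return fpclassify(v) == FP_ZERO && !signbit(v);
}

/* Stand-in probe (assumption): push the smallest subnormal through a
   multiplication the compiler cannot fold, then check for +0. */
static int denormals_look_flushed(void) {
  volatile float tiny = 0x1p-149f;
  volatile float one = 1.0f;
  float r = tiny * one;
  return is_positive_zero(r);
}
```

Canonicalizing unconditionally, as the author chose, removes the need for any such runtime or compile-time probe.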

@llvmbot llvmbot added the libclc libclc OpenCL library label Jan 21, 2026
wenju-he and others added 3 commits January 21, 2026 11:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wenju-he wenju-he requested a review from arsenm January 21, 2026 10:37
@github-actions

github-actions bot commented Jan 21, 2026

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cl,inc -- libclc/clc/lib/generic/math/clc_remquo.cl libclc/clc/lib/generic/math/clc_remquo.inc --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/libclc/clc/lib/generic/math/clc_remquo.inc b/libclc/clc/lib/generic/math/clc_remquo.inc
index 79eef077b..836dc703c 100644
--- a/libclc/clc/lib/generic/math/clc_remquo.inc
+++ b/libclc/clc/lib/generic/math/clc_remquo.inc
@@ -66,7 +66,7 @@ _CLC_DEF _CLC_OVERLOAD float __clc_remquo(float x, float y,
   } else {
     ret = x;
     q7 = 0;
-    bool c = (ay < 0x1.0p+127f & 2.0f * ax > ay) | (ax > 0.5f * ay);
+    bool c = (ay<0x1.0p+127f & 2.0f * ax> ay) | (ax > 0.5f * ay);
 
     int qsgn = 1 + (((__clc_as_int(x) ^ __clc_as_int(y)) >> 31) << 1);
     float t = __clc_fma(y, -(float)qsgn, x);
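
As a side note on the condition in this hunk: it decides whether ax is more than half of ay using two comparisons rather than one. The sketch below is my own reading in plain C, with the hypothetical name more_than_half. It shows one reason the doubled form matters: halving a subnormal ay can round, so ax > 0.5f * ay alone can miss a case that 2.0f * ax > ay catches. The bound on ay restricts the doubled comparison to the range where it is meant to be used.

```c
#include <math.h>

/* Sketch of the two-sided test: is ax strictly more than half of ay?
   Operands are assumed non-negative, as in the snippet. */
static int more_than_half(float ax, float ay) {
  /* Doubling ax is exact for subnormals, so this form does not lose a bit. */
  int doubled = (ay < 0x1.0p+127f) & (2.0f * ax > ay);
  /* Halving ay is safe for large values but may round for subnormals. */
  int halved = ax > 0.5f * ay;
  return doubled | halved;
}
```

With ax equal to 2 subnormal units and ay equal to 3, half of ay rounds up to 2 units, so the halved comparison is false and only the doubled comparison reports that ax exceeds half of ay.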

@wenju-he wenju-he requested a review from arsenm January 22, 2026 02:01
@wenju-he wenju-he merged commit 20c15c7 into llvm:main Jan 26, 2026
10 of 11 checks passed
@wenju-he wenju-he deleted the remquo-use-amdgpu-ocml-remquo-float branch January 26, 2026 00:11
Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Jan 29, 2026
sshrestha-aa pushed a commit to sshrestha-aa/llvm-project that referenced this pull request Feb 4, 2026
@kwk
Contributor

kwk commented Feb 11, 2026

I ran into this issue.

cd /home/fedora/src/llvm-project/main/build/runtimes/runtimes-bins/libclc && /home/fedora/src/llvm-project/main/build/bin/clang-23 -c --target=spirv32-- -x ir -o /home/fedora/src/llvm-project/main/build/./lib/clang/23/lib/spirv32--/libclc.spv /home/fedora/src/llvm-project/main/build/runtimes/runtimes-bins/libclc/obj.libclc.dir/spirv-mesa3d-/builtins.link.spirv-mesa3d-.bc
fatal error: error in backend: unable to legalize instruction: %88:fid(s32) = G_FCANONICALIZE %87:fid (in function: _Z12__clc_remquoffPi)
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: /home/fedora/src/llvm-project/main/build/bin/clang-23 -c --target=spirv32-- -x ir -o /home/fedora/src/llvm-project/main/build/./lib/clang/23/lib/spirv32--/libclc.spv /home/fedora/src/llvm-project/main/build/runtimes/runtimes-bins/libclc/obj.libclc.dir/spirv-mesa3d-/builtins.link.spirv-mesa3d-.bc
1.      Code generation
2.      Running pass 'Function Pass Manager' on module '/home/fedora/src/llvm-project/main/build/runtimes/runtimes-bins/libclc/obj.libclc.dir/spirv-mesa3d-/builtins.link.spirv-mesa3d-.bc'.
3.      Running pass 'Legalizer' on function '@_Z12__clc_remquoffPi'

The statement "ret = c ? t : __builtin_elementwise_canonicalize(x);" was added in this PR.

@jhuber6
Contributor

jhuber6 commented Feb 12, 2026

Either tell the SPIR-V backend people to support the canonicalize node or put #ifdef __SPIRV__ around this usage.
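
The second option could look roughly like the sketch below. It is only an illustration of the guard: SKETCH_USE_CANONICALIZE, SKETCH_CANON and select_remainder are hypothetical names, and the builtin is made opt-in here so the snippet also compiles with compilers that lack it. Note that the fallback passes the value through unchanged, so on SPIR-V the result would not be canonicalized.

```c
/* Opt-in switch (hypothetical): define SKETCH_USE_CANONICALIZE to use the
   clang builtin. It stays disabled whenever __SPIRV__ is defined. */
#if defined(SKETCH_USE_CANONICALIZE) && !defined(__SPIRV__)
#define SKETCH_CANON(v) __builtin_elementwise_canonicalize(v)
#else
/* Fallback: return the value as-is, with no canonicalization. */
#define SKETCH_CANON(v) (v)
#endif

/* Mirrors the shape of the statement quoted above: pick t when c is set,
   otherwise the (possibly canonicalized) input x. */
static float select_remainder(int c, float t, float x) {
  return c ? t : SKETCH_CANON(x);
}
```

The trade-off is that a guard like this hides the missing backend support instead of fixing it, and the two targets would then handle denormal inputs differently.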

@arsenm
Contributor

arsenm commented Feb 12, 2026

SPIRV must implement canonicalize

@tstellar
Collaborator

> SPIRV must implement canonicalize

So should we revert this until that happens?

@wenju-he
Contributor Author

> > SPIRV must implement canonicalize
>
> So should we revert this until that happens?

Reverted in #181443.

I discussed the issue with Ben Ashbaugh. We can propose a SPIR-V extension that adds a canonicalize instruction to SPIR-V. What do you think?

wenju-he added a commit that referenced this pull request Feb 14, 2026
…181443)

Reverts #177131
It broke SPIRV target: error in backend: unable to legalize instruction:
%88:fid(s32) = G_FCANONICALIZE
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 14, 2026
…entation" (#181443)

Reverts llvm/llvm-project#177131
It broke SPIRV target: error in backend: unable to legalize instruction:
%88:fid(s32) = G_FCANONICALIZE
manasij7479 pushed a commit to manasij7479/llvm-project that referenced this pull request Feb 18, 2026
…lvm#181443)

Reverts llvm#177131
It broke SPIRV target: error in backend: unable to legalize instruction:
%88:fid(s32) = G_FCANONICALIZE
Labels

libclc libclc OpenCL library

7 participants