[libclc] Refine __clc_fp*_subnormals_supported by wenju-he · Pull Request #157633 · llvm/llvm-project

wenju-he · 2025-09-09T09:16:15Z

Remove the dependency on the libclc build-time configuration for __clc_fp*_subnormals_supported. The check is now implemented with LLVM intrinsics so it can be resolved during target lowering or at runtime.
The `-fdenormal-fp-math=dynamic` build flag is required to defer denormal handling.
Remove the CMake option ENABLE_RUNTIME_SUBNORMAL and related code.

Resolves #153148

…al_if_not_supported Remove the dependency on the libclc build-time configuration for __clc_fp*_subnormals_supported. The check is now implemented with LLVM intrinsics so it can be resolved during target lowering or at runtime. Improve __clc_flush_denormal_if_not_supported implementation as well. It doesn't use __clc_fp*_subnormals_supported which canonicalizes sNaN and thus the new implementation is more foldable. Remove cmake option ENABLE_RUNTIME_SUBNORMAL and related code. Resolves llvm#153148 Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>

Copilot

Pull Request Overview

This PR refactors subnormal support detection and handling in libclc, replacing build-time configuration with checks built on compiler builtins. The implementation now uses Clang's __builtin_canonicalizef and __builtin_isfpclass builtins, so subnormal support is resolved during target lowering or at run time rather than through CMake configuration options.

Key changes:

- Replaces build-time subnormal configuration with runtime intrinsic-based detection
- Removes cmake option ENABLE_RUNTIME_SUBNORMAL and related build logic
- Improves __clc_flush_denormal_if_not_supported implementation to be more optimizable

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `libclc/clc/lib/generic/math/clc_subnormal_config.cl` | New implementation using LLVM intrinsics for runtime subnormal detection |
| `libclc/clc/include/clc/math/clc_subnormal_config.h` | Removes unused function declaration |
| `libclc/clc/include/clc/math/math.h` | Refactors flush denormal function and removes header dependency |
| Multiple math files | Removes unnecessary include of subnormal config header |
| Multiple SOURCES files | Removes subnormal config files from build lists |
| `libclc/CMakeLists.txt` | Removes cmake option and related build configuration |

libclc/clc/lib/generic/math/clc_subnormal_config.cl

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

arsenm

Is the build using -Xclang -fdenormal-fp-math-f32=dynamic already? If not it should do that globally

wenju-he · 2025-09-09T10:55:26Z

> Is the build using -Xclang -fdenormal-fp-math-f32=dynamic already? If not it should do that globally

Thanks, I forgot that part. Done in 4608f77

arsenm · 2025-09-09T11:08:58Z

libclc/clc/include/clc/math/math.h

+  // Avoid calling __clc_fp32_subnormals_supported here: it uses
+  // llvm.canonicalize, which quiets sNaN.
+  return __builtin_fabsf(x) < 0x1p-149f
+             ? __builtin_elementwise_copysign(0.0f, x)
+             : x;


This function no longer matches the name or behavior. This is an unconditional flush to zero

If denormals are not supported:

- returns signed zero for a denormal input, since the __builtin_fabsf result is 0
- returns x if x is not denormal

If denormals are supported:

- returns x as is, regardless of whether x is denormal

So this function only flushes to zero if denormals are not supported, right?

There's no support check here. __builtin_fabsf(x) < 0x1p-149f will be true for 0 or a denormal value, regardless of whether the fcmp flushes the input

Also you can use __builtin_elementwise_abs to be consistently using elementwise builtins

> There's no support check here. __builtin_fabsf(x) < 0x1p-149f will be true for 0 or a denormal value, regardless of whether the fcmp flushes the input
>
> Also you can use __builtin_elementwise_abs to be consistently using elementwise builtins

Thanks @arsenm. Now I see what you mean. I renamed __clc_flush_denormal_if_not_supported to __clc_soft_flush_denormal in 3f665ce and switched to __builtin_elementwise_abs.

> Renamed __clc_flush_denormal_if_not_supported to __clc_soft_flush_denormal in 3f665ce and use __builtin_elementwise_abs

kindly ping @arsenm for review.

The name is now right, but the usage is not. You do not want to unconditionally force a denormal flush, pretty much anywhere.

I also believe most of the explicit flushes / canonicalizes in the math function implementations can be deleted with some shuffling around. I removed most of these in the rocm-device-libs implementations a few years ago. A few more remain but I believe they can also be avoided
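To make the distinction in this thread concrete, here is a minimal host-side C sketch (not libclc code) contrasting an unconditional flush with one gated on a support check. The function names mirror the PR's helpers, but the `subnormals_supported` parameter and the use of `fpclassify` are assumptions made purely for illustration.

```c
#include <math.h>
#include <stdbool.h>

/* Unconditional ("soft") flush: every subnormal input becomes a zero with
 * the same sign, whether or not the target could have handled it. */
float soft_flush_denormal(float x) {
  return fpclassify(x) == FP_SUBNORMAL ? copysignf(0.0f, x) : x;
}

/* Conditional flush: the input is only flushed when the caller reports that
 * subnormals are unsupported. The boolean parameter is hypothetical; libclc
 * queries __clc_fp32_subnormals_supported() instead. */
float flush_denormal_if_not_supported(float x, bool subnormals_supported) {
  return subnormals_supported ? x : soft_flush_denormal(x);
}
```

The first function changes results even on targets that handle subnormals correctly, which is the behavior the review objects to; the second leaves such targets untouched.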

arsenm · 2025-09-24T04:08:07Z

libclc/clc/lib/clspv/math/clc_sw_fma.cl

+  a = __clc_soft_flush_denormal(a);
+  b = __clc_soft_flush_denormal(b);
+  c = __clc_soft_flush_denormal(c);


Unconditionally forcing flush of denormals is not desirable. In this context I'm not sure why it's trying to flush in the first place.

The below code extracting the exponent can be replaced with frexp, and the return c on the above paths is missing a canonicalize.

But on a deeper level I don't think libclc should be trying to provide a software FMA implementation in the first place; that's a decision for the compiler when codegening llvm.fma, surely compiler-rt already has an implementation?

> Unconditionally forcing flush of denormals is not desirable. In this context I'm not sure why it's trying to flush in the first place.
>
> The below code extracting the exponent can be replaced with frexp, and the return c on the above paths is missing a canonicalize.
>
> But on a deeper level I don't think libclc should be trying to provide a software FMA implementation in the first place; that's a decision for the compiler when codegening llvm.fma, surely compiler-rt already has an implementation?

Deleted clc_sw_fma in 7b290a2

Now clc_fma is implemented with __builtin_elementwise_fma

> surely compiler-rt already has an implementation?

It doesn't. LLVM libc has one but it uses FP64, so I don't think it is of much help. I'd expect most targets that don't have hardware fma don't have fp64 either.

I think dropping sw fma would impact:

SPIR-V, which then starts generating the GLSL.std.450 extended instruction FMA. The problem there is that this instruction is (AFAICT) allowed to round intermediate products, but the OpenCL spec doesn't allow that. I'm not sure if drivers actually implement it as fused or not.
Arguably, lowering @llvm.fma to this instruction is a bug in LLVM, as @llvm.fma is specified to be fused without fast math flags.

Not all old R600 targets have FMA, I think this change would be breaking them. These are >10 years old GPUs at this point though.

I think r600 always had FMA; it's just not "fast" on all of them. In any case, the backend is obligated to implement llvm.fma correctly.

SPIR-V doesn't need __clc_sw_fma anymore if the SPV_KHR_fma extension is enabled. The extension is already implemented in llvm-spirv and will soon be implemented in the LLVM SPIR-V backend: #173057

libclc/CMakeLists.txt

arsenm

I think dropping the soft FMA is beyond the scope of this patch, but it is something I think should be done

arsenm · 2025-10-03T09:59:53Z

libclc/clc/lib/generic/math/clc_subnormal_config.cl

+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+_CLC_DEF bool __clc_fp64_subnormals_supported() {
+#ifdef CLC_SPIRV
+  // SPIR-V doesn't support llvm.canonicalize for now.


Can you just fix that instead of special casing it here? It's not difficult to implement

> Can you just fix that instead of special casing it here? It's not difficult to implement

done in ccf7a6e

arsenm · 2025-10-03T10:01:03Z

libclc/clc/lib/generic/math/clc_remquo.inc

+  x = __clc_soft_flush_denormal(x);
+  y = __clc_soft_flush_denormal(y);


I don't think this is necessary. In the rocm-device-libs version of this, I managed to delete the explicit canonicalizes

https://github.com/ROCm/llvm-project/blob/0e9e3946cb257d1ed7b119333db451805865b36b/amd/device-libs/ocml/src/remainderF_base.h#L47

See this series of patches:
ROCm@b3beb93
ROCm@9a7bc19
ROCm@e9198f7

Should just copy what these did

> https://github.com/ROCm/llvm-project/blob/0e9e3946cb257d1ed7b119333db451805865b36b/amd/device-libs/ocml/src/remainderF_base.h#L47
>
> See this series of patches: ROCm@b3beb93 ROCm@9a7bc19 ROCm@e9198f7
>
> Should just copy what these did

I tried porting both libclc's __clc_remquo and ocml's remquo2 to replace the Intel GPU implementation at https://github.com/intel/intel-graphics-compiler/blob/fc97dc482697b320667a52914f1225556f0856e8/IGC/BiFModule/Implementation/Math/remquo.cl#L12-L104; however, the ported code fails the OpenCL CTS test ./test_bruteforce remquo on Intel GPUs.
Can I copy the Intel GPU implementation to overwrite libclc's __clc_remquo?

arsenm · 2025-10-03T10:02:14Z

libclc/clc/include/clc/math/math.h

+  // Avoid calling __clc_fp32_subnormals_supported here: it uses
+  // llvm.canonicalize, which quiets sNaN.
+  return __builtin_fabsf(x) < 0x1p-149f
+             ? __builtin_elementwise_copysign(0.0f, x)
+             : x;


The name is now right, but the usage is not. You do not want to unconditionally force a denormal flush, pretty much anywhere.

I also believe most of the explicit flushes / canonicalizes in the math function implementations can be deleted with some shuffling around. I removed most of these in the rocm-device-libs implementations a few years ago. A few more remain but I believe they can also be avoided

arsenm · 2025-10-10T09:09:33Z

libclc/clc/include/clc/math/math.h

-  }
-  return x;
+_CLC_OVERLOAD _CLC_INLINE float __clc_soft_flush_denormal(float x) {
+  // Avoid calling __clc_fp32_subnormals_supported here: it uses


You might have less trouble just using canonicalize for now and trying to relax it later

> You might have less trouble just using canonicalize for now and trying to relax it later

do you mean reverting __clc_soft_flush_denormal to use __clc_fp32_subnormals_supported which uses llvm.canonicalize, or just replacing use of __clc_fp32_subnormals_supported with __builtin_elementwise_canonicalize?

Restored function __clc_flush_denormal_if_not_supported and its original implementation in 23d0ff7 and 34cd062

github-actions · 2026-01-26T04:51:19Z

✅ With the latest revision this PR passed the C/C++ code formatter.

wenju-he · 2026-01-26T04:52:29Z

> I think dropping the soft FMA is beyond the scope of this patch, but it is something I think should be done

restored clc_sw_fma in b61e32b

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

libclc/clc/lib/generic/SOURCES

libclc/clc/lib/generic/math/clc_subnormal_config.cl

arsenm · 2026-01-26T14:16:23Z

libclc/clc/include/clc/math/math.h

+static _CLC_INLINE float __clc_flush_denormal_if_not_supported(float x) {
  int ix = __clc_as_int(x);
  if (!__clc_fp32_subnormals_supported() && ((ix & EXPBITS_SP32) == 0) &&
      ((ix & MANTBITS_SP32) != 0)) {
    ix &= SIGNBIT_SP32;
    x = __clc_as_float(ix);
  }
  return x;
 }


This shouldn't do all of this bithacking. This is canonicalize, which at worst costs an extra instruction you might not have needed. With the pattern from #172998, you can have this conditional flush that doesn't have the cost of signaling nan quieting and can fold away without DAZ

changed to use snan check and __builtin_elementwise_canonicalize in 5eb6990.
Please review if it meets the expectation.
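As a rough illustration of the behavior being asked for (a conditional flush that leaves signaling NaNs unquieted), here is a host-side C model that works on raw binary32 bit patterns. It is only a sketch: `model_canonicalize` is a simplified, hypothetical stand-in for llvm.canonicalize, the `daz` flag stands in for the target's denormal mode, and none of this is the code from 5eb6990 or #172998.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXP_MASK  0x7f800000u /* binary32 exponent field */
#define MANT_MASK 0x007fffffu /* binary32 mantissa field */
#define SIGN_MASK 0x80000000u /* binary32 sign bit */
#define QUIET_BIT 0x00400000u /* top mantissa bit; set means quiet NaN */

/* NaN with the quiet bit clear (IEEE 754-2008 convention assumed). */
bool is_signaling_nan_bits(uint32_t bits) {
  return (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0 &&
         (bits & QUIET_BIT) == 0;
}

/* Zero exponent with a nonzero mantissa. */
bool is_subnormal_bits(uint32_t bits) {
  return (bits & EXP_MASK) == 0 && (bits & MANT_MASK) != 0;
}

/* Simplified model of canonicalize: quiets signaling NaNs and, when the
 * (hypothetical) daz flag is set, flushes subnormals to a signed zero. */
uint32_t model_canonicalize(uint32_t bits, bool daz) {
  if (is_signaling_nan_bits(bits))
    return bits | QUIET_BIT;
  if (daz && is_subnormal_bits(bits))
    return bits & SIGN_MASK;
  return bits;
}

/* Conditional flush that skips canonicalize for signaling NaNs, so they pass
 * through with their payload and signaling state intact. */
uint32_t flush_if_not_supported_bits(uint32_t bits, bool daz) {
  return is_signaling_nan_bits(bits) ? bits : model_canonicalize(bits, daz);
}
```

Without DAZ the model is the identity for every non-NaN input, which is the "folds away" property mentioned in the review.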

arsenm · 2026-01-26T14:16:49Z

libclc/clc/lib/generic/math/clc_subnormal_config.cl

+  // SPIR-V doesn't support llvm.canonicalize. Synthesize a subnormal by halving
+  // the smallest normal. If subnormals are not supported it will flush to +0.
+  float smallest_normal = 0x1p-126f;
+  float sub =


You don't need the multiply, and I think this logic should be inverted: denormal support is the base case and DAZ is the aberration. This is roughly what I did in device-libs:
is_daz() { return __builtin_isfpclass(__builtin_canonicalizef(0x1p-149f), __FPCLASS_POSZERO) }
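The same "DAZ is the exception" framing can be sketched in portable host C. Standard C before C23 has no canonicalize operation, so this sketch substitutes a run-time multiply by 1.0f for it; that substitution is an assumption of the example, not what the review recommends for libclc, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when a run-time operation on the smallest positive subnormal
 * yields +0.0, i.e. the environment flushes subnormals. */
bool is_daz(void) {
  volatile float tiny = 0x1p-149f; /* volatile: force a run-time operation */
  volatile float one = 1.0f;
  float result = tiny * one;       /* stand-in for canonicalize (assumption) */
  uint32_t bits;
  memcpy(&bits, &result, sizeof bits);
  return bits == 0;                /* bit pattern of +0.0f */
}

/* Support is the base case; it is simply the absence of DAZ. */
bool subnormals_supported(void) {
  return !is_daz();
}
```

On a host running with the default floating-point environment, `subnormals_supported()` is expected to return true.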

arsenm · 2026-01-26T14:19:41Z

libclc/clc/lib/generic/math/clc_subnormal_config.cl

+#ifdef cl_khr_fp64
+#pragma OPENCL EXTENSION cl_khr_fp64 : enable
+_CLC_DEF bool __clc_fp64_subnormals_supported() {
+  // SPIR-V doesn't support llvm.canonicalize. Synthesize a subnormal by halving


Fix SPIR-V; this is not a reasonable workaround. This intrinsic is not difficult to implement.

changed SPIRV path to use __clc_fabs(0x1p-149f) in 7ff4dcf

I've made an attempt to add the lowering in #178439

This PR is extracted from #157633. `-fdenormal-fp-math=dynamic` is required to defer denormal handling and should be used for libclc library compilation. Additionally, if the default ieee value is incompatible with the user code's denormal-fp-math setting, this mismatch prevents libclc functions from being inlined.

wenju-he requested review from Copilot and frasercrmck September 9, 2025 09:16

llvmbot added the libclc libclc OpenCL library label Sep 9, 2025

wenju-he requested a review from arsenm September 9, 2025 09:16

Copilot AI reviewed Sep 9, 2025

libclc/clc/lib/generic/math/clc_subnormal_config.cl (outdated)

libclc/clc/lib/generic/math/clc_subnormal_config.cl (outdated)

Apply suggestions from code review

96ec9dc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

arsenm reviewed Sep 9, 2025

wenju-he added 2 commits September 9, 2025 12:25

use __builtin_elementwise_canonicalize

d52fcdb

set -fdenormal-fp-math-f32=dynamic build flag globally

4608f77

wenju-he requested a review from arsenm September 9, 2025 10:55

arsenm reviewed Sep 9, 2025

rename __clc_flush_denormal_if_not_supported to __clc_soft_flush_denormal

3f665ce

wenju-he requested a review from arsenm September 10, 2025 23:03

arsenm reviewed Sep 24, 2025

Maetveis reviewed Oct 1, 2025

libclc/CMakeLists.txt (outdated)

wenju-he added 2 commits October 3, 2025 07:46

delete clc_sw_fma

7b290a2

-fdenormal-fp-math-f32 -> -fdenormal-fp-math

3da9705

wenju-he requested review from Maetveis and arsenm October 3, 2025 06:03

Maetveis reviewed Oct 3, 2025

libclc/CMakeLists.txt (outdated)

remove -Xclang before -fdenormal-fp-math=dynamic

7d21a1a

wenju-he requested a review from Maetveis October 3, 2025 07:05

arsenm reviewed Oct 3, 2025

wenju-he added 2 commits October 7, 2025 04:49

Revert "delete clc_sw_fma"

b61e32b

This reverts commit 7b290a2.

support SPIR-V: implement __clc_fp32_subnormals_supported without using llvm.canonicalize

ccf7a6e

arsenm reviewed Oct 10, 2025

wenju-he mentioned this pull request Jan 5, 2026

[libclc] Initial support for cross-compiling OpenCL libraries #174022

Merged

wenju-he added 4 commits January 26, 2026 05:36

revert change to __clc_soft_flush_denormal

23d0ff7

Merge branch 'main' into refine-__clc_fp32_subnormals_supported

ddb79ae

Merge branch 'main' into refine-__clc_fp32_subnormals_supported

3e2ac98

revert name __clc_soft_flush_denormal to __clc_flush_denormal_if_not_supported

34cd062

clang-format clc_subnormal_config.cl

b14455b

wenju-he requested a review from Copilot January 26, 2026 04:59

Copilot AI reviewed Jan 26, 2026

libclc/clc/lib/generic/SOURCES

libclc/clc/lib/generic/math/clc_subnormal_config.cl (outdated)

wenju-he changed the title ~~[libclc] Refine __clc_fp*_subnormals_supported and __clc_flush_denormal_if_not_supported~~ [libclc] Refine __clc_fp*_subnormals_supported Jan 26, 2026

update comment: Should be -> Expected

7757ba0

wenju-he requested a review from arsenm January 26, 2026 05:03

fix build: include clc_subnormal_config.h

fe2d4f6

arsenm reviewed Jan 26, 2026

wenju-he added 2 commits January 27, 2026 12:29

__clc_fp32_subnormals_supported: use __clc_fabs for SPIR-V and canonicalize otherwise

7ff4dcf

__clc_flush_denormal_if_not_supported: use __builtin_elementwise_canonicalize

5eb6990

wenju-he requested a review from arsenm January 27, 2026 11:32

wenju-he mentioned this pull request Feb 25, 2026

[libclc] Compile with -fdenormal-fp-math=dynamic #183262

Merged

Maetveis · Jan 28, 2026 (edited)