Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable FTZ/DAZ when compiling shared libraries by default. #80475

Merged
merged 5 commits into from
Apr 25, 2024

Conversation

jcranmer-intel
Copy link
Contributor

@jcranmer-intel jcranmer-intel commented Feb 2, 2024

This fixes #57589, and aligns Clang with the behavior of current versions of gcc. There is a new option, -mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and this flag is on by default if -ffast-math is present and -shared isn't.

This also partially reverts fa7cd54 in that it disables the attempt to set the IR denormal-fp-math attribute based on whether or not -ffast-math is applied as it is insufficiently reliable.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Feb 2, 2024
@llvmbot
Copy link
Member

llvmbot commented Feb 2, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Joshua Cranmer (jcranmer-intel)

Changes

This fixes #57589, and aligns Clang with the behavior of current versions of gcc. There is a new option, -mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and this flag is on by default if -ffast-math is present and -shared isn't.


Full diff: https://github.com/llvm/llvm-project/pull/80475.diff

5 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+5)
  • (modified) clang/docs/UsersManual.rst (+10-5)
  • (modified) clang/include/clang/Driver/Options.td (+5)
  • (modified) clang/lib/Driver/ToolChain.cpp (+13-2)
  • (modified) clang/test/Driver/linux-ld.c (+34)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index cd8a82f281f52..904e292a2eb4b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -117,6 +117,11 @@ C23 Feature Support
 Non-comprehensive list of changes in this release
 -------------------------------------------------
 
+* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable
+  flush-to-zero floating-point mode by default. This decision can be overridden
+  with use of ``-mdaz-ftz``. This behavior now matches GCC's behavior.
+  (`#57589 <https://github.com/llvm/llvm-project/issues/57589>`_)
+
 New Compiler Flags
 ------------------
 
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 7391e4cf3a9ae..257af3c8d4d39 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -1506,7 +1506,8 @@ floating point semantic models: precise (the default), strict, and fast.
 
    * ``-ffp-contract=fast``
 
-   Note: ``-ffast-math`` causes ``crtfastmath.o`` to be linked with code. See
+   Note: ``-ffast-math`` causes ``crtfastmath.o`` to be linked with code unless
+   ``-shared`` or ``-mno-daz-ftz`` is present. See
    :ref:`crtfastmath.o` for more details.
 
 .. option:: -fno-fast-math
@@ -1560,7 +1561,8 @@ floating point semantic models: precise (the default), strict, and fast.
      ``-ffp-contract``.
 
    Note: ``-fno-fast-math`` implies ``-fdenormal-fp-math=ieee``.
-   ``-fno-fast-math`` causes ``crtfastmath.o`` to not be linked with code.
+   ``-fno-fast-math`` causes ``crtfastmath.o`` to not be linked with code
+   unless ``-mdaz-ftz`` is present.
 
 .. option:: -fdenormal-fp-math=<value>
 
@@ -1907,10 +1909,13 @@ by using ``#pragma STDC FENV_ROUND`` with a value other than ``FE_DYNAMIC``.
 
 A note about ``crtfastmath.o``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``-ffast-math`` and ``-funsafe-math-optimizations`` cause ``crtfastmath.o`` to be
-automatically linked,  which adds a static constructor that sets the FTZ/DAZ
+``-ffast-math`` and ``-funsafe-math-optimizations`` without the ``-shared``
+option cause ``crtfastmath.o`` to be
+automatically linked, which adds a static constructor that sets the FTZ/DAZ
 bits in MXCSR, affecting not only the current compilation unit but all static
-and shared libraries included in the program.
+and shared libraries included in the program. This decision can be overridden
+by using either the flag ``-mdaz-ftz`` or ``-mno-daz-ftz`` to respectively
+link or not link ``crtfastmath.o``.
 
 .. _FLT_EVAL_METHOD:
 
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 73071a6648541..68f806d46c9e0 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2554,6 +2554,11 @@ defm protect_parens : BoolFOption<"protect-parens",
           "floating-point expressions are evaluated">,
   NegFlag<SetFalse>>;
 
+defm daz_ftz : SimpleMFlag<"daz-ftz",
+  "Globally set", "Do not globally set",
+  " the denormals-are-zero (DAZ) and flush-to-zero (FTZ) bits in the "
+  "floating-point control register on program startup.">;
+
 def ffor_scope : Flag<["-"], "ffor-scope">, Group<f_Group>;
 def fno_for_scope : Flag<["-"], "fno-for-scope">, Group<f_Group>;
 
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 388030592b483..5c003e153fa52 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1271,9 +1271,14 @@ void ToolChain::AddCCKextLibArgs(const ArgList &Args,
 
 bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args,
                                            std::string &Path) const {
+  // Don't implicitly link in mode-changing libraries in a shared library, since
+  // this can have very deleterious effects. See the various links from
+  // https://github.com/llvm/llvm-project/issues/57589 for more information.
+  bool Default = !Args.hasArg(options::OPT_shared);
+
   // Do not check for -fno-fast-math or -fno-unsafe-math when -Ofast passed
   // (to keep the linker options consistent with gcc and clang itself).
-  if (!isOptimizationLevelFast(Args)) {
+  if (Default && !isOptimizationLevelFast(Args)) {
     // Check if -ffast-math or -funsafe-math.
     Arg *A =
       Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math,
@@ -1282,8 +1287,14 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args,
 
     if (!A || A->getOption().getID() == options::OPT_fno_fast_math ||
         A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations)
-      return false;
+      Default = false;
   }
+
+  // Whatever decision came as a result of the above implicit settings, either
+  // -mdaz-ftz or -mno-daz-ftz is capable of overriding it.
+  if (!Args.hasFlag(options::OPT_mdaz_ftz, options::OPT_mno_daz_ftz, Default))
+    return false;
+
   // If crtfastmath.o exists add it to the arguments.
   Path = GetFilePath("crtfastmath.o");
   return (Path != "crtfastmath.o"); // Not found.
diff --git a/clang/test/Driver/linux-ld.c b/clang/test/Driver/linux-ld.c
index b8efd64cd91f0..137d84cca883a 100644
--- a/clang/test/Driver/linux-ld.c
+++ b/clang/test/Driver/linux-ld.c
@@ -1569,6 +1569,40 @@
 // RUN:        --gcc-toolchain="" \
 // RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
 // RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// Don't link crtfastmath.o with -shared
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -Ofast -shared \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// Check for effects of -mdaz-ftz
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared -mdaz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -mdaz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -mdaz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared -mno-daz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -mno-daz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -mno-daz-ftz \
+// RUN:        --gcc-toolchain="" \
+// RUN:        --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN:   | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
 // CHECK-CRTFASTMATH: usr/lib/gcc/x86_64-unknown-linux/10.2.0{{/|\\\\}}crtfastmath.o
 // CHECK-NOCRTFASTMATH-NOT: crtfastmath.o
 

@jyknight
Copy link
Member

jyknight commented Feb 2, 2024

I think there is a bit of a problematic interaction with getDenormalModeForType here. "-shared" is (should be) a flag used only for linking, but that function is calling isFastMathRuntimeAvailable to affect the default denormal math mode for compilation. That's not going to work.

Probably we should effectively revert the x86-linux parts of fa7cd54.

I wonder if, instead, we should just have -ffast-math always downgrade -fdenormal-fp-math=ieee to -fdenormal-fp-math=preserve-sign, under the rationale of "you asked for fast math, and preserve-sign mode might let the compiler generate faster code"?

@arsenm
Copy link
Contributor

arsenm commented Feb 5, 2024

I wonder if, instead, we should just have -ffast-math always downgrade -fdenormal-fp-math=ieee to -fdenormal-fp-math=preserve-sign, under the rationale of "you asked for fast math, and preserve-sign mode might let the compiler generate faster code"?

This could also use denormal-fp-math=dynamic. It is not always safe to run preserve-sign code under IEEE settings

@jyknight
Copy link
Member

jyknight commented Feb 5, 2024

It is not always safe to run preserve-sign code under IEEE settings

I can see that this is used in a bunch of optimization/constant-folding passes, but I don't have a feel for what the actual impact is:

  1. Which value allows generating the "fastest" math code -- disregarding correctness? I'd assume that "dynamic" is least optimizable, "ieee" in the middle, and "preserve-sign" is likely to generate the "fastest" code?
  2. What is the likely actual impact of choosing the wrong value? Ie, what does "it is not always safe" mean? Is this wrong beyond what is usually expected from -ffast-math optimizations?

I note that we're frequently getting it wrong in the other direction today, as there are many users who link in code compiled without -ffast-math (and thus, with -fdenormal-fp-math=ieee) into a binary compiled with -ffast-math -- which sets the fp control words to enable MXCSR_DAZ | MXCSR_FTZ. In addition, I think there's even more users who compile some TUs with -ffast-math, and link that into a binary compiled without.

@arsenm
Copy link
Contributor

arsenm commented Feb 5, 2024

  • Which value allows generating the "fastest" math code -- disregarding correctness? I'd assume that "dynamic" is least optimizable, "ieee" in the middle, and "preserve-sign" is likely to generate the "fastest" code?

This depends on the target and operations. For some functions, on modern hardware, the preserve-sign code will be worse for AMDGPU. In other cases it's better

What is the likely actual impact of choosing the wrong value? Ie, what does "it is not always safe" mean? Is this wrong beyond what is usually expected from -ffast-math optimizations?

The main example is the conversions performed between fcmp with 0 and llvm.is.fpclass. You'll get different results depending on whether the input is implicitly flushed in fcmp vs. not in the is.fpclass

@jcranmer-intel
Copy link
Contributor Author

I think there is a bit of a problematic interaction with getDenormalModeForType here. "-shared" is (should be) a flag used only for linking, but that function is calling isFastMathRuntimeAvailable to affect the default denormal math mode for compilation. That's not going to work.

The logic is already somewhat broken already: if -ffast-math isn't being added into the link line, or we're a shared library linked into a -ffast-math executable, we're not going to get the right value of denormal math mode. So it's a matter of which behavior is going to be the least likely to be incorrect in practice.

I wonder if, instead, we should just have -ffast-math always downgrade -fdenormal-fp-math=ieee to -fdenormal-fp-math=preserve-sign, under the rationale of "you asked for fast math, and preserve-sign mode might let the compiler generate faster code"?

One of the issues is that we have optimizations that are going to make assumptions about how denormals work in operations (particularly around being able to lower llvm.is.fpclass to fcmp). So you risk misoptimization if the value is wrong. Arguably, the only safe value is dynamic (compiler doesn't know what it's set to).

(We're wrong in another direction, which is that we don't attempt to set the flag to non-ieee for anything other x86. I tried looking up how the equivalents of FTZ/DAZ worked on other architectures and came to with the conclusion that the whole thing is a bundle of sadness that should have never been invented.)

@jyknight
Copy link
Member

jyknight commented Feb 5, 2024

You'll get different results depending on whether the input is implicitly flushed in fcmp vs. not in the is.fpclass

This sounds intuitively like the sort of semantics-breaking optimization which is expected from -ffast-math. If the only issues are things like getting a slightly-wrong answer from is.fpclass if you somehow pass a denormal to it, when you've disabled denormal handling...that just doesn't seem so bad.

Arguably, the only safe value is dynamic (compiler doesn't know what it's set to).

Yes, we could change the default to "dynamic" for all builds (except on platforms where FTZ/DAZ flags don't exist). Potentially doing so is justified regardless of -ffast-math, since users can of course use fesetenv or _mm_setcsr to enable these flags themselves. (Sidenote: "dynamic" isn't even documented).

So it's a matter of which behavior is going to be the least likely to be incorrect in practice.

So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under -ffast-math.

@jcranmer-intel
Copy link
Contributor Author

(Sidenote: "dynamic" isn't even documented).

It's not a selectable enum of the Clang -fdenormal-fp-math flag, but it is one for the LLVM function attribute denormal-fp-math.

@andykaylor
Copy link
Contributor

I don't know anything about how non-x86 targets implement DAZ/FTZ, but for x86-based targets, I think trying to make any assumptions about the setting is bound to be wrong. In theory, it's part of the floating-point environment and shouldn't be modified during execution unless fenv-access is enabled.

In practical terms, I can't see any reason that you would ever want a shared library to be able to control this. I don't like the idea of adding an option to maintain that behavior, and I think we should make it a high priority to stop doing it by default when a shared library is compiled with fast-math enabled.

More generally, if you're going to be mixing fast-math with non-fast-math within a program, you don't want FTZ/DAZ to be enabled just because some parts of the program are using fast-math, so the current behavior of linking with crtfastmath.o is fairly reckless.

@andykaylor
Copy link
Contributor

(Sidenote: "dynamic" isn't even documented).

It's not a selectable enum of the Clang -fdenormal-fp-math flag, but it is one for the LLVM function attribute denormal-fp-math.

This is also a bit of a problem. After inlining we can have instructions with fast-math disabled appearing in the middle of functions that were compiled with fast-math enabled. What happens to the "denormal-fp-math" attribute in such cases? I wanted to show this in compiler explorer, but "#pragma float_control()" doesn't update the "denormal-fp-math attribute.

https://godbolt.org/z/KEP3YEEe9

I may have mentioned a few times that I don't like function attributes controlling fast-math behaviors.

@arsenm
Copy link
Contributor

arsenm commented Feb 6, 2024

I may have mentioned a few times that I don't like function attributes controlling fast-math behaviors.

It doesn't control it, it's informative. You just get undefined behavior if you end up calling mismatched mode functions.

It does control it in the AMDGPU entry point function case, since we are more or less directly programming the initial register state on kernel launch in the compiler

@jyknight
Copy link
Member

jyknight commented Feb 6, 2024

So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under -ffast-math.

This is feeling like the best option to me, at this point. Easily implementable, and doesn't seem to make things significantly worse than they are today, since the denormal mode can't be controlled by compilation options -- so being wrong in both directions is going to happen, anyhow.

We'd simply delete Linux::getDefaultDenormalModeForType, which thus causes -ffast-math to simply not change the compiler's understanding of denormals handling -- like is already the case on everything except x86/x86_64 Linux.

Other more-involved changes, such as exposing "-fdenormal-fp-math=dynamic" in Clang, and potentially making it the default, could be done later if desired. I'd suggest that this change doesn't need to block on any such discussions.

@arsenm
Copy link
Contributor

arsenm commented Feb 6, 2024

So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under -ffast-math.

+1. There hasn't been a performance reason to use FTZ/DAZ since ~2011. Maybe there's still a power benefit? But in that case you could still explicitly request the flush separate from -ffast-math

@andykaylor
Copy link
Contributor

I may have mentioned a few times that I don't like function attributes controlling fast-math behaviors.

It doesn't control it, it's informative. You just get undefined behavior if you end up calling mismatched mode functions.

What I meant to say was that I don't like function attributes controlling compiler behavior related to fast-math options. I'd like to have the abilitity to switch things on and off at the instruction level. That's not really relevant to this PR though.

It does control it in the AMDGPU entry point function case, since we are more or less directly programming the initial register state on kernel launch in the compiler

Do you only set the register for kernel entries? Is the attribute ignored for other functions?

This is actually similar to how the Intel C/C++ compiler behaves when compiling hosted code. We insert code into the main() function to set MXCSR if that function was compiled with FTZ/DAZ enabled. I think this makes more sense than linking with crtfastmath.o, and it's a lot easier to explain the behavior to users.

@andykaylor
Copy link
Contributor

So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under -ffast-math.

+1. There hasn't been a performance reason to use FTZ/DAZ since ~2011. Maybe there's still a power benefit? But in that case you could still explicitly request the flush separate from -ffast-math

I don't think this is correct. I know there have been improvements, but there are still cases where setting ftz can have a significant performance impact.

@arsenm
Copy link
Contributor

arsenm commented Feb 7, 2024

Do you only set the register for kernel entries?

Yes, it's the pre-initialized state. Non kernels can't be arbitrarily invoked from the host

Is the attribute ignored for other functions?

No, it's an informative attribute about that the mode is. The compiler isn't trying to fix up the modes for you. If you enter the function with the wrong mode, you get what you get.

@jyknight
Copy link
Member

jyknight commented Feb 7, 2024

So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under -ffast-math.

+1. There hasn't been a performance reason to use FTZ/DAZ since ~2011. Maybe there's still a power benefit? But in that case you could still explicitly request the flush separate from -ffast-math

I don't think this is correct. I know there have been improvements, but there are still cases where setting ftz can have a significant performance impact.

Just to be clear: I'm not proposing entirely eliminating the "link against crtfastmath.o" behavior, when linking a binary with -ffast-math (though, separately from this review, that may be worth considering!). I only meant we should stop attempting to infer anything about -fdenormal-fp-math due to using -ffast-math. (as per the other paragraph in my last comment).

@andykaylor
Copy link
Contributor

Just to be clear: I'm not proposing entirely eliminating the "link against crtfastmath.o" behavior, when linking a binary with -ffast-math (though, separately from this review, that may be worth considering!). I only meant we should stop attempting to infer anything about -fdenormal-fp-math due to using -ffast-math. (as per the other paragraph in my last comment).

Are youe suggesting that we should continue linking against crtfastmath.o when it is available and fast-math is enabled but we should set the "denormal-fp-math" attributes to "ieee" or "dynamic"? I want to make sure I'm understanding you correctly.

Personally, I don't like linking with crtfastmath.o to get this behavior. I would prefer to insert code into whatever function(s), if any, we identify as a relevant entry point. That would require having the attribute set, at least for the entry function(s). Linking with crtfastmath.o depends on finding it, which I think makes it dependent on the GNU toolchain, and the static initializer is less visible to users debugging their program.

For x86-based CPU targets, we really shouldn't be making assumptions about the FTZ/DAZ state for any function where we aren't setting it, but if it's useful for other targets, we shouldn't take that away.

@jyknight
Copy link
Member

jyknight commented Feb 8, 2024

I'm proposing a simple change we can make now, in order to unblock this PR which at least gets rid of crtfastmath for shared libraries! I don't think this needs to be the end of the story; additional, more large-scale changes can be made afterwards...

Are youe suggesting that we should continue linking against crtfastmath.o when it is available and fast-math is enabled

I'm suggesting no additional changes regarding when to link crtfastmath vs what's present in this PR.

but we should set the "denormal-fp-math" attributes to "ieee" or "dynamic"? I want to make sure I'm understanding you correctly.

I'm suggesting that we modify Clang so that -ffast-math doesn't affect denormal-fp-math, by (as I mentioned before) removing the overload Linux::getDefaultDenormalModeForType. This makes Clang for Linux X86 and X86-64 work the same as every other OS/CPU combo.

Yes, this means we have the wrong mode, sometimes. But: that is not a new problem -- we use the wrong mode all the time already. And ISTM the issues that arise are within the expected bounds of -ffast-math wrongness.

For x86-based CPU targets, we really shouldn't be making assumptions about the FTZ/DAZ state for any function where we aren't setting it

If you'd make another PR proposing to expose -fdenormal-fp-math=dynamic to the Clang command-line, and making that the default for most targets, that seems potentially interesting to discuss. I don't really know the impact, and I also don't think it needs to be concluded before making the changes for this PR.

@andykaylor
Copy link
Contributor

I'm suggesting that we modify Clang so that -ffast-math doesn't affect denormal-fp-math, by (as I mentioned before) removing the overload Linux::getDefaultDenormalModeForType. This makes Clang for Linux X86 and X86-64 work the same as every other OS/CPU combo.

I don't see why the current denormal-fp-math setting behavior is a blocking issue for this change. Yes, compiling a shared library with ffast-math would lead to the "denormal-fp-math" attribute being set to a potentially misleading value, but that's already true for any file that is compiled with -ffast-math if the file containing the entry function isn't compiled with -ffast-math. I'd see fixing that as a separate issue.

I'd like to see this change land, but with the "-mdaz-ftz" option removed (because I don't think it's useful). That would fix the critical problem of fast-math infecting shared libraries with the ftz setting, and we could straighten out the other problems, which are relatively minor, afterwards.

@jyknight
Copy link
Member

jyknight commented Feb 8, 2024

I don't see why the current denormal-fp-math setting behavior is a blocking issue for this change

Because this PR as-is causes us to start parsing the "-shared" flag for compilation actions in order to determine which denormal-fp-math setting to use. Users should not pass that to compile actions, and if they do, it should result in a -Wunused-command-line-argument diagnostic, NOT a behavior change.

@jcranmer-intel
Copy link
Contributor Author

I'd like to see this change land, but with the "-mdaz-ftz" option removed (because I don't think it's useful). That would fix the critical problem of fast-math infecting shared libraries with the ftz setting, and we could straighten out the other problems, which are relatively minor, afterwards.

There is, without this change, no way to control whether or not crtfastmath.o is linked independent of all of the other fast-math options. The -mdaz-ftz option would at least add a flag to explicitly control the parameter (for the people who care), and we can then have discussions about different ways to effect setting DAZ/FTZ bits or what options imply -mdaz-ftz in future PRs. That alone makes it a worthy addition IMHO; the compatibility with gcc is another nice feature.

I'm suggesting that we modify Clang so that -ffast-math doesn't affect denormal-fp-math, by (as I mentioned before) removing the overload Linux::getDefaultDenormalModeForType. This makes Clang for Linux X86 and X86-64 work the same as every other OS/CPU combo.

That sounds to me ultimately like it should be a separate PR, although you're suggesting that should happen first?

@andykaylor
Copy link
Contributor

I'd like to see this change land, but with the "-mdaz-ftz" option removed (because I don't think it's useful). That would fix the critical problem of fast-math infecting shared libraries with the ftz setting, and we could straighten out the other problems, which are relatively minor, afterwards.

There is, without this change, no way to control whether or not crtfastmath.o is linked independent of all of the other fast-math options. The -mdaz-ftz option would at least add a flag to explicitly control the parameter (for the people who care), and we can then have discussions about different ways to effect setting DAZ/FTZ bits or what options imply -mdaz-ftz in future PRs. That alone makes it a worthy addition IMHO; the compatibility with gcc is another nice feature.

You can always link crtfastmath.o directly, of course. Ultimately, I don't think the compiler should ever be adding the crtfastmath.o file. I would prefer to insert code directly into the entry function as @arsenm indicated the AMDGPU backend does for kernels. That would then be controlled by the -fdenormal-fp-math option or something more explicitly linked to the entry function.

I don't want to add -mdaz-ftz because once we do we're kind of stuck with it. If you don't add it here, we'd continue the current behavior of linking with crtfastmath.o normally but we'd stop infecting shared libraries with it.

@andykaylor
Copy link
Contributor

I don't see why the current denormal-fp-math setting behavior is a blocking issue for this change

Because this PR as-is causes us to start parsing the "-shared" flag for compilation actions in order to determine which denormal-fp-math setting to use. Users should not pass that to compile actions, and if they do, it should result in a -Wunused-command-line-argument diagnostic, NOT a behavior change.

We're already parsing -nostartfiles and -nostdlib to determine the default setting. I don't see how looking at -shared makes it any worse.

I would agree that this whole model is broken. If I use -ffast-math to create a bunch of object files (including the one containing main()) and then I link without that option, crtfastmath.o doesn't get linked, even though we used "denorm-fp-math"="preserveSign" on every function.

I think we're in agreement that this needs to be fixed for x86-based targets. We just need to agree on how to do it. I think #57589 is a pretty egregious problem, and I'd like to see it fixed without delay, but given that it requires setting -ffast-math on your link line maybe it can wait for a proper fix.

@andykaylor
Copy link
Contributor

I just created #81204 to describe the linking versus compiling problem.

@jcranmer-intel
Copy link
Contributor Author

I'd like to see this change land, but with the "-mdaz-ftz" option removed (because I don't think it's useful). That would fix the critical problem of fast-math infecting shared libraries with the ftz setting, and we could straighten out the other problems, which are relatively minor, afterwards.

There is, without this change, no way to control whether or not crtfastmath.o is linked independent of all of the other fast-math options. The -mdaz-ftz option would at least add a flag to explicitly control the parameter (for the people who care), and we can then have discussions about different ways to effect setting DAZ/FTZ bits or what options imply -mdaz-ftz in future PRs. That alone makes it a worthy addition IMHO; the compatibility with gcc is another nice feature.

You can always link crtfastmath.o directly, of course. Ultimately, I don't think the compiler should ever be adding the crtfastmath.o file. I would prefer to insert code directly into the entry function as @arsenm indicated the AMDGPU backend does for kernels. That would then be controlled by the -fdenormal-fp-math option or something more explicitly linked to the entry function.

I don't want to add -mdaz-ftz because once we do we're kind of stuck with it. If you don't add it here, we'd continue the current behavior of linking with crtfastmath.o normally but we'd stop infecting shared libraries with it.

I don't think it's unreasonable to switch the logic of -mdaz-ftz from linking a file with a global initializer that sets the flags to making it emit the entry-point call to such a function instead, it still largely follows the same logic to me. And having a command line flag makes it easier for users to access rather than manually linking in a file located who-knows-where in the toolchain (although I suspect anyone who cares hard enough would rather just write the calls to set FTZ/DAZ than track it down from the toolchain).

@andykaylor
Copy link
Contributor

I don't think it's unreasonable to switch the logic of -mdaz-ftz from linking a file with a global initializer that sets the flags to making it emit the entry-point call to such a function instead, it still largely follows the same logic to me. And having a command line flag makes it easier for users to access rather than manually linking in a file located who-knows-where in the toolchain (although I suspect anyone who cares hard enough would rather just write the calls to set FTZ/DAZ than track it down from the toolchain).

I suppose there's also some portability value in supporting this gcc option. I guess I'd be OK with it.

@@ -117,6 +117,11 @@ C23 Feature Support
Non-comprehensive list of changes in this release
-------------------------------------------------

* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable
flush-to-zero floating-point mode by default. This decision can be overridden
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to revise this to clarify that this only refers to the use of -shared and -ffast-math on the command-line used to link the program.

@@ -117,6 +117,11 @@ C23 Feature Support
Non-comprehensive list of changes in this release
-------------------------------------------------

* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable
* Code compiled with ``-shared`` and either ``-ffast-math`` or ``-funsafe-math-optimizations`` will no longer enable

clang/test/Driver/linux-ld.c Outdated Show resolved Hide resolved
@jcranmer-intel
Copy link
Contributor Author

I did a thorough investigation into how the denormal-fp-math=preserve-sign,preserve-sign attribute affects the resulting IR for all of the SPEC benchmarks (which actually do run into subnormals), and the basic summary I found is that the only differences I found were that the FTZ/DAZ mode prevented a few inferences that are still true in denormal-flushing: things like x >= 0 implies x >= -1.

So overall, the practical effect of the denormal-fp-math attribute being set incorrectly doesn't appear to matter.

@arsenm
Copy link
Contributor

arsenm commented Feb 28, 2024

So overall, the practical effect of the denormal-fp-math attribute being set incorrectly doesn't appear to matter.

It matters more for AMDGPU, where we need to care because some instructions just don't respect denormals. We legalize some operations differently depending on the mode

@jcranmer-intel
Copy link
Contributor Author

It matters more for AMDGPU, where we need to care because some instructions just don't respect denormals. We legalize some operations differently depending on the mode

But the shared library stuff isn't an issue for AMDGPU, right?

@arsenm
Copy link
Contributor

arsenm commented Feb 29, 2024

But the shared library stuff isn't an issue for AMDGPU, right?

No, we don't support shared library linking yet

This fixes llvm#57589, and aligns
Clang with the behavior of current versions of gcc. There is a new option,
-mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and
this flag is on by default if -ffast-math is present and -shared isn't.
The guessing of whether or not FTZ/DAZ mode would be enabled is
insufficiently accurate.
@jcranmer-intel
Copy link
Contributor Author

Ping for review

Linux::getDefaultDenormalModeForType(const llvm::opt::ArgList &DriverArgs,
const JobAction &JA,
const llvm::fltSemantics *FPType) const {
switch (getTriple().getArch()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should shared libraries be using dynamic mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably. But we can't reliably tell from the individual file compile commands if we're going to be linked as a shared library or not, which is why I'm dropping this attempt to set it at all in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So getDefaultDenormalModeForType() in ToolChain.h should be returning DenormalMode::Dynamic() rather than DenormalMode::getIEEE(), right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shouldn't. This was discussed before: #80475 (comment) and surrounding comments.
But in short: doing so would regress normal code, only for the (arguable) benefit of code which is explicitly asks to play fast-and-loose with the rules via -ffast-math.

I think the change is good as-is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think that had been resolved, but I agree that this change is good as-is.

It's not just fast-math that can change ftz/daz though. Users can also modify MXCSR directly. In C, FENV_ACCESS needs to be enabled, but in practice I find that users would like to be able to do that without taking on the other penalties associated with enabling fenv access (such as generally diminished optimization). It would be nice if LLVM IR had a way to allow this.

Copy link
Member

@jyknight jyknight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final nit before you submit, though: please update the PR description and the release notes to also mention that x86 no longer modifies -fdenormal-fp-math based on -ffast-math.

// Don't implicitly link in mode-changing libraries in a shared library, since
// this can have very deleterious effects. See the various links from
// https://github.com/llvm/llvm-project/issues/57589 for more information.
bool Default = !Args.hasArg(options::OPT_shared);
Copy link
Member

@MaskRay MaskRay Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasArgNoClaim

We don't want this check to drop the warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument] warning for clang -shared -c a.c

@andykaylor
Copy link
Contributor

One final nit before you submit, though: please update the PR description and the release notes to also mention that x86 no longer modifies -fdenormal-fp-math based on -ffast-math.

The Clang User's Manual makes some contradictory statements about this. In one place it says, -fno-fast-math resets -fdenormal-fp-math=<value> to its target-dependent default, but just a few lines later it says "-fno-fast-math implies -fdenormal-fp-math=ieee" which isn't true for all targets. It says -fno-unsafe-math-optimizations implies '-fdenormal-fp-math=ieee` which, unfortunately, is currently be true.

The User's Manual also says that -ffp-model=fast sets denormal_fp_math to IEEE, which is currently true, but really shouldn't be.

I'm working on a PR to fix the fp-model handling, which I'll update after this PR lands. I can fix the other issues I've mentioned here in that patch. There are a few points that I am not quite clear on. I'm going to start a Discourse topic to summarize what we've agreed on here and discuss the things I'm not sure about.

@jcranmer-intel jcranmer-intel merged commit 63ecd2a into llvm:main Apr 25, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shared library compiled with -ffast-math modifies FPU state
6 participants