-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable FTZ/DAZ when compiling shared libraries by default. #80475
Conversation
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Joshua Cranmer (jcranmer-intel) ChangesThis fixes #57589, and aligns Clang with the behavior of current versions of gcc. There is a new option, -mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and this flag is on by default if -ffast-math is present and -shared isn't. Full diff: https://github.com/llvm/llvm-project/pull/80475.diff 5 Files Affected:
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index cd8a82f281f52..904e292a2eb4b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -117,6 +117,11 @@ C23 Feature Support
Non-comprehensive list of changes in this release
-------------------------------------------------
+* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable
+ flush-to-zero floating-point mode by default. This decision can be overridden
+ with use of ``-mdaz-ftz``. This behavior now matches GCC's behavior.
+ (`#57589 <https://github.com/llvm/llvm-project/issues/57589>`_)
+
New Compiler Flags
------------------
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index 7391e4cf3a9ae..257af3c8d4d39 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -1506,7 +1506,8 @@ floating point semantic models: precise (the default), strict, and fast.
* ``-ffp-contract=fast``
- Note: ``-ffast-math`` causes ``crtfastmath.o`` to be linked with code. See
+ Note: ``-ffast-math`` causes ``crtfastmath.o`` to be linked with code unless
+ ``-shared`` or ``-mno-daz-ftz`` is present. See
:ref:`crtfastmath.o` for more details.
.. option:: -fno-fast-math
@@ -1560,7 +1561,8 @@ floating point semantic models: precise (the default), strict, and fast.
``-ffp-contract``.
Note: ``-fno-fast-math`` implies ``-fdenormal-fp-math=ieee``.
- ``-fno-fast-math`` causes ``crtfastmath.o`` to not be linked with code.
+ ``-fno-fast-math`` causes ``crtfastmath.o`` to not be linked with code
+ unless ``-mdaz-ftz`` is present.
.. option:: -fdenormal-fp-math=<value>
@@ -1907,10 +1909,13 @@ by using ``#pragma STDC FENV_ROUND`` with a value other than ``FE_DYNAMIC``.
A note about ``crtfastmath.o``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-``-ffast-math`` and ``-funsafe-math-optimizations`` cause ``crtfastmath.o`` to be
-automatically linked, which adds a static constructor that sets the FTZ/DAZ
+``-ffast-math`` and ``-funsafe-math-optimizations`` without the ``-shared``
+option cause ``crtfastmath.o`` to be
+automatically linked, which adds a static constructor that sets the FTZ/DAZ
bits in MXCSR, affecting not only the current compilation unit but all static
-and shared libraries included in the program.
+and shared libraries included in the program. This decision can be overridden
+by using either the flag ``-mdaz-ftz`` or ``-mno-daz-ftz`` to respectively
+link or not link ``crtfastmath.o``.
.. _FLT_EVAL_METHOD:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 73071a6648541..68f806d46c9e0 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2554,6 +2554,11 @@ defm protect_parens : BoolFOption<"protect-parens",
"floating-point expressions are evaluated">,
NegFlag<SetFalse>>;
+defm daz_ftz : SimpleMFlag<"daz-ftz",
+ "Globally set", "Do not globally set",
+ " the denormals-are-zero (DAZ) and flush-to-zero (FTZ) bits in the "
+ "floating-point control register on program startup.">;
+
def ffor_scope : Flag<["-"], "ffor-scope">, Group<f_Group>;
def fno_for_scope : Flag<["-"], "fno-for-scope">, Group<f_Group>;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 388030592b483..5c003e153fa52 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -1271,9 +1271,14 @@ void ToolChain::AddCCKextLibArgs(const ArgList &Args,
bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args,
std::string &Path) const {
+ // Don't implicitly link in mode-changing libraries in a shared library, since
+ // this can have very deleterious effects. See the various links from
+ // https://github.com/llvm/llvm-project/issues/57589 for more information.
+ bool Default = !Args.hasArg(options::OPT_shared);
+
// Do not check for -fno-fast-math or -fno-unsafe-math when -Ofast passed
// (to keep the linker options consistent with gcc and clang itself).
- if (!isOptimizationLevelFast(Args)) {
+ if (Default && !isOptimizationLevelFast(Args)) {
// Check if -ffast-math or -funsafe-math.
Arg *A =
Args.getLastArg(options::OPT_ffast_math, options::OPT_fno_fast_math,
@@ -1282,8 +1287,14 @@ bool ToolChain::isFastMathRuntimeAvailable(const ArgList &Args,
if (!A || A->getOption().getID() == options::OPT_fno_fast_math ||
A->getOption().getID() == options::OPT_fno_unsafe_math_optimizations)
- return false;
+ Default = false;
}
+
+ // Whatever decision came as a result of the above implicit settings, either
+ // -mdaz-ftz or -mno-daz-ftz is capable of overriding it.
+ if (!Args.hasFlag(options::OPT_mdaz_ftz, options::OPT_mno_daz_ftz, Default))
+ return false;
+
// If crtfastmath.o exists add it to the arguments.
Path = GetFilePath("crtfastmath.o");
return (Path != "crtfastmath.o"); // Not found.
diff --git a/clang/test/Driver/linux-ld.c b/clang/test/Driver/linux-ld.c
index b8efd64cd91f0..137d84cca883a 100644
--- a/clang/test/Driver/linux-ld.c
+++ b/clang/test/Driver/linux-ld.c
@@ -1569,6 +1569,40 @@
// RUN: --gcc-toolchain="" \
// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// Don't link crtfastmath.o with -shared
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -Ofast -shared \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// Check for effects of -mdaz-ftz
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared -mdaz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -mdaz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -mdaz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-CRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -shared -mno-daz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -ffast-math -mno-daz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
+// RUN: %clang --target=x86_64-unknown-linux -no-pie -### %s -mno-daz-ftz \
+// RUN: --gcc-toolchain="" \
+// RUN: --sysroot=%S/Inputs/basic_linux_tree 2>&1 \
+// RUN: | FileCheck --check-prefix=CHECK-NOCRTFASTMATH %s
// CHECK-CRTFASTMATH: usr/lib/gcc/x86_64-unknown-linux/10.2.0{{/|\\\\}}crtfastmath.o
// CHECK-NOCRTFASTMATH-NOT: crtfastmath.o
|
I think there is a bit of a problematic interaction with getDenormalModeForType here. "-shared" is (should be) a flag used only for linking, but that function is calling isFastMathRuntimeAvailable to affect the default denormal math mode for compilation. That's not going to work. Probably we should effectively revert the x86-linux parts of fa7cd54. I wonder if, instead, we should just have |
This could also use denormal-fp-math=dynamic. It is not always safe to run preserve-sign code under IEEE settings |
I can see that this is used in a bunch of optimization/constant-folding passes, but I don't have a feel for what the actual impact is:
I note that we're frequently getting it wrong in the other direction today, as there are many users who link in code compiled without |
This depends on the target and operations. For some functions, on modern hardware, the preserve-sign code will be worse for AMDGPU. In other cases it's better
The main example is the conversions performed between fcmp with 0 and llvm.is.fpclass. You'll get different results depending on whether the input is implicitly flushed in fcmp vs. not in the is.fpclass |
The logic is already somewhat broken already: if
One of the issues is that we have optimizations that are going to make assumptions about how denormals work in operations (particularly around being able to lower (We're wrong in another direction, which is that we don't attempt to set the flag to non-ieee for anything other x86. I tried looking up how the equivalents of FTZ/DAZ worked on other architectures and came to with the conclusion that the whole thing is a bundle of sadness that should have never been invented.) |
This sounds intuitively like the sort of semantics-breaking optimization which is expected from
Yes, we could change the default to "dynamic" for all builds (except on platforms where FTZ/DAZ flags don't exist). Potentially doing so is justified regardless of
So, alternatively...we could just go with the simplest solution, and use "ieee" as the default even under |
It's not a selectable enum of the Clang |
I don't know anything about how non-x86 targets implement DAZ/FTZ, but for x86-based targets, I think trying to make any assumptions about the setting is bound to be wrong. In theory, it's part of the floating-point environment and shouldn't be modified during execution unless fenv-access is enabled. In practical terms, I can't see any reason that you would ever want a shared library to be able to control this. I don't like the idea of adding an option to maintain that behavior, and I think we should make it a high priority to stop doing it by default when a shared library is compiled with fast-math enabled. More generally, if you're going to be mixing fast-math with non-fast-math within a program, you don't want FTZ/DAZ to be enabled just because some parts of the program are using fast-math, so the current behavior of linking with crtfastmath.o is fairly reckless. |
This is also a bit of a problem. After inlining we can have instructions with fast-math disabled appearing in the middle of functions that were compiled with fast-math enabled. What happens to the "denormal-fp-math" attribute in such cases? I wanted to show this in compiler explorer, but "#pragma float_control()" doesn't update the "denormal-fp-math attribute. https://godbolt.org/z/KEP3YEEe9 I may have mentioned a few times that I don't like function attributes controlling fast-math behaviors. |
It doesn't control it, it's informative. You just get undefined behavior if you end up calling mismatched mode functions. It does control it in the AMDGPU entry point function case, since we are more or less directly programming the initial register state on kernel launch in the compiler |
This is feeling like the best option to me, at this point. Easily implementable, and doesn't seem to make things significantly worse than they are today, since the denormal mode can't be controlled by compilation options -- so being wrong in both directions is going to happen, anyhow. We'd simply delete Linux::getDefaultDenormalModeForType, which thus causes Other more-involved changes, such as exposing "-fdenormal-fp-math=dynamic" in Clang, and potentially making it the default, could be done later if desired. I'd suggest that this change doesn't need to block on any such discussions. |
+1. There hasn't been a performance reason to use FTZ/DAZ since ~2011. Maybe there's still a power benefit? But in that case you could still explicitly request the flush separate from -ffast-math |
What I meant to say was that I don't like function attributes controlling compiler behavior related to fast-math options. I'd like to have the abilitity to switch things on and off at the instruction level. That's not really relevant to this PR though.
Do you only set the register for kernel entries? Is the attribute ignored for other functions? This is actually similar to how the Intel C/C++ compiler behaves when compiling hosted code. We insert code into the main() function to set MXCSR if that function was compiled with FTZ/DAZ enabled. I think this makes more sense than linking with crtfastmath.o, and it's a lot easier to explain the behavior to users. |
I don't think this is correct. I know there have been improvements, but there are still cases where setting ftz can have a significant performance impact. |
Yes, it's the pre-initialized state. Non kernels can't be arbitrarily invoked from the host
No, it's an informative attribute about that the mode is. The compiler isn't trying to fix up the modes for you. If you enter the function with the wrong mode, you get what you get. |
Just to be clear: I'm not proposing entirely eliminating the "link against crtfastmath.o" behavior, when linking a binary with |
Are youe suggesting that we should continue linking against crtfastmath.o when it is available and fast-math is enabled but we should set the "denormal-fp-math" attributes to "ieee" or "dynamic"? I want to make sure I'm understanding you correctly. Personally, I don't like linking with crtfastmath.o to get this behavior. I would prefer to insert code into whatever function(s), if any, we identify as a relevant entry point. That would require having the attribute set, at least for the entry function(s). Linking with crtfastmath.o depends on finding it, which I think makes it dependent on the GNU toolchain, and the static initializer is less visible to users debugging their program. For x86-based CPU targets, we really shouldn't be making assumptions about the FTZ/DAZ state for any function where we aren't setting it, but if it's useful for other targets, we shouldn't take that away. |
I'm proposing a simple change we can make now, in order to unblock this PR which at least gets rid of crtfastmath for shared libraries! I don't think this needs to be the end of the story; additional, more large-scale changes can be made afterwards...
I'm suggesting no additional changes regarding when to link crtfastmath vs what's present in this PR.
I'm suggesting that we modify Clang so that Yes, this means we have the wrong mode, sometimes. But: that is not a new problem -- we use the wrong mode all the time already. And ISTM the issues that arise are within the expected bounds of
If you'd make another PR proposing to expose |
I don't see why the current denormal-fp-math setting behavior is a blocking issue for this change. Yes, compiling a shared library with ffast-math would lead to the "denormal-fp-math" attribute being set to a potentially misleading value, but that's already true for any file that is compiled with -ffast-math if the file containing the entry function isn't compiled with -ffast-math. I'd see fixing that as a separate issue. I'd like to see this change land, but with the "-mdaz-ftz" option removed (because I don't think it's useful). That would fix the critical problem of fast-math infecting shared libraries with the ftz setting, and we could straighten out the other problems, which are relatively minor, afterwards. |
Because this PR as-is causes us to start parsing the "-shared" flag for compilation actions in order to determine which denormal-fp-math setting to use. Users should not pass that to compile actions, and if they do, it should result in a |
There is, without this change, no way to control whether or not
That sounds to me ultimately like it should be a separate PR, although you're suggesting that should happen first? |
You can always link crtfastmath.o directly, of course. Ultimately, I don't think the compiler should ever be adding the crtfastmath.o file. I would prefer to insert code directly into the entry function as @arsenm indicated the AMDGPU backend does for kernels. That would then be controlled by the -fdenormal-fp-math option or something more explicitly linked to the entry function. I don't want to add -mdaz-ftz because once we do we're kind of stuck with it. If you don't add it here, we'd continue the current behavior of linking with crtfastmath.o normally but we'd stop infecting shared libraries with it. |
We're already parsing -nostartfiles and -nostdlib to determine the default setting. I don't see how looking at -shared makes it any worse. I would agree that this whole model is broken. If I use -ffast-math to create a bunch of object files (including the one containing main()) and then I link without that option, crtfastmath.o doesn't get linked, even though we used "denorm-fp-math"="preserveSign" on every function. I think we're in agreement that this needs to be fixed for x86-based targets. We just need to agree on how to do it. I think #57589 is a pretty egregious problem, and I'd like to see it fixed without delay, but given that it requires setting -ffast-math on your link line maybe it can wait for a proper fix. |
I just created #81204 to describe the linking versus compiling problem. |
I don't think it's unreasonable to switch the logic of |
I suppose there's also some portability value in supporting this gcc option. I guess I'd be OK with it. |
clang/docs/ReleaseNotes.rst
Outdated
@@ -117,6 +117,11 @@ C23 Feature Support | |||
Non-comprehensive list of changes in this release | |||
------------------------------------------------- | |||
|
|||
* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable | |||
flush-to-zero floating-point mode by default. This decision can be overridden |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to revise this to clarify that this only refers to the use of -shared and -ffast-math on the command-line used to link the program.
clang/docs/ReleaseNotes.rst
Outdated
@@ -117,6 +117,11 @@ C23 Feature Support | |||
Non-comprehensive list of changes in this release | |||
------------------------------------------------- | |||
|
|||
* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Code compiled with ``-shared`` and ``-ffast-math`` will no longer enable | |
* Code compiled with ``-shared`` and either ``-ffast-math`` or ``-funsafe-math-optimizations`` will no longer enable |
14e683e
to
4202a86
Compare
I did a thorough investigation into how the So overall, the practical effect of the |
It matters more for AMDGPU, where we need to care because some instructions just don't respect denormals. We legalize some operations differently depending on the mode |
But the shared library stuff isn't an issue for AMDGPU, right? |
No, we don't support shared library linking yet |
This fixes llvm#57589, and aligns Clang with the behavior of current versions of gcc. There is a new option, -mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and this flag is on by default if -ffast-math is present and -shared isn't.
The guessing of whether or not FTZ/DAZ mode would be enabled is insufficiently accurate.
4202a86
to
5b9cd16
Compare
Ping for review |
Linux::getDefaultDenormalModeForType(const llvm::opt::ArgList &DriverArgs, | ||
const JobAction &JA, | ||
const llvm::fltSemantics *FPType) const { | ||
switch (getTriple().getArch()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should shared libraries be using dynamic mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably. But we can't reliably tell from the individual file compile commands if we're going to be linked as a shared library or not, which is why I'm dropping this attempt to set it at all in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So getDefaultDenormalModeForType() in ToolChain.h should be returning DenormalMode::Dynamic() rather than DenormalMode::getIEEE(), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we shouldn't. This was discussed before: #80475 (comment) and surrounding comments.
But in short: doing so would regress normal code, only for the (arguable) benefit of code which is explicitly asks to play fast-and-loose with the rules via -ffast-math
.
I think the change is good as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think that had been resolved, but I agree that this change is good as-is.
It's not just fast-math that can change ftz/daz though. Users can also modify MXCSR directly. In C, FENV_ACCESS needs to be enabled, but in practice I find that users would like to be able to do that without taking on the other penalties associated with enabling fenv access (such as generally diminished optimization). It would be nice if LLVM IR had a way to allow this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One final nit before you submit, though: please update the PR description and the release notes to also mention that x86 no longer modifies -fdenormal-fp-math
based on -ffast-math
.
clang/lib/Driver/ToolChain.cpp
Outdated
// Don't implicitly link in mode-changing libraries in a shared library, since | ||
// this can have very deleterious effects. See the various links from | ||
// https://github.com/llvm/llvm-project/issues/57589 for more information. | ||
bool Default = !Args.hasArg(options::OPT_shared); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasArgNoClaim
We don't want this check to drop the warning: argument unused during compilation: '-shared' [-Wunused-command-line-argument]
warning for clang -shared -c a.c
The Clang User's Manual makes some contradictory statements about this. In one place it says, The User's Manual also says that I'm working on a PR to fix the fp-model handling, which I'll update after this PR lands. I can fix the other issues I've mentioned here in that patch. There are a few points that I am not quite clear on. I'm going to start a Discourse topic to summarize what we've agreed on here and discuss the things I'm not sure about. |
This fixes #57589, and aligns Clang with the behavior of current versions of gcc. There is a new option, -mdaz-ftz, to control the linking of the file that sets FTZ/DAZ on startup, and this flag is on by default if -ffast-math is present and -shared isn't.
This also partially reverts fa7cd54 in that it disables the attempt to set the IR denormal-fp-math attribute based on whether or not -ffast-math is applied as it is insufficiently reliable.