Implements the @nonans decorator to enable some LLVM vectorizations #31862
base: master
Conversation
Probably the best way to test this is like https://github.com/JuliaLang/julia/blob/master/test/llvmpasses/aliasscopes.jl or https://github.com/JuliaLang/julia/blob/master/test/llvmpasses/simdloop.ll
For me the big question is whether this has to be a function attribute, or if we can have it as a "local" attribute attached to the instructions. I would prefer it instruction-local, since I suspect that the function attribute will interact interestingly with inlining.
Can't this be turned on automatically by
I imagine this would be nice. I was trying to follow the "minimum viable pull request" path and didn't think about that. I'm sure there are further improvements to be made, though. LLVM can actually take other arguments like this, and clang sets them all, but I couldn't yet find any other optimizations that depend on these attributes apart from this one. So that's something else we might consider beyond this PR. Perhaps this meta node could actually take multiple arguments to control all these flags, and avoid a profusion of different specific meta nodes?
From what I could gather it really has to be a function attribute, and this is the relevant test in the LLVM code: https://github.com/llvm/llvm-project/blob/fd254e429ea103be8bab6271855c04919d33f9fb/llvm/lib/Analysis/IVDescriptors.cpp#L590 (link edited)
It should be already without this.
These are instruction flags; this function attribute is something separate, apparently very little used. Please check #31442 for the context of this PR.
This is for fmax and fmin, right? It seems we are not treating them in Line 67 in 208d99c
Yes, it's a different way to do the same thing so it is applicable to instructions just fine. |
@yuyichao Setting the fast flag for operations does indeed enable pretty much every optimization we might expect. The main contribution here is just noticing that for a few specific cases those flags are not sufficient to activate the LLVM loop vectorizer. Investigations on #31442 revealed that LLVM requires this function attribute to be set in addition for vectorization to work in this situation. This is what this PR does, and it seems to have finally enabled vectorization for functions like
That really seems like an LLVM bug, especially since

That is not to say we don't work around LLVM bugs, but all of those workarounds/fixes are not user visible. I don't think we should introduce a new feature just to work around an LLVM bug.

So it comes down to whether we should have this new feature. There was indeed talk about exposing more fine-grained fast-math flag control to the user. It would be nice if they could all be introduced at the same time, though, and AFAICT, function-wise this one is not much more important than the other ones (I find

Another issue with doing this is that we really don't need another one of these macros with completely different semantics...
This definitely looks like an LLVM quirk; I can see very few instances where these flags are used in ways that might impact things like optimization. Unfortunately it's still there in 9.0, and I guess we should find a way to deal with it. Clang seems mostly focused on controlling things from command-line arguments, and this attribute sits well with that API; that's probably why it was used even though it's unusual. Julia, on the other hand, seems to focus on fine control along the code without offering a lot of control over every single optimization. I personally find this very good, and I think what clang does seems more useful for debugging the optimizations than anything else, although I can also imagine this kind of control might be a life-saver sometimes.

My current proposal was just to illustrate a working solution; I didn't really mean to introduce a new macro. Activating this via

It might not even really require a meta node; maybe this could be passed as an argument to the

Although we might do all this, it does not actually seem necessary for #31442 and #30320, which is really my main goal. Also, controlling these function attributes doesn't seem to me to relate closely to increasing control over the instruction flags. These are all factors related to fast-math mode, but somehow orthogonal to each other, and I really don't feel capable of commenting on other feature requests related to fast-math mode anyway.

I hope I can just be given some lines along which I can continue to work on this solution, and I will gladly follow the path the community indicates to be the most productive use of my time. My proposals are then either 1. this PR the way it is, or 2. activate a similar meta node from
Just to raise this again: how do function attributes interact with Julia inlining? Part of the

So what happens when we inline a method that has a meta node for a function attribute, and what happens when we inline into a function that has this attribute? If a method relies on the semantics that we normally guarantee and we inline it into a context in which those semantics are relaxed, is that okay? In the case of

So if we can find a local solution that achieves the same thing, I would strongly prefer that. If the function attribute is the only way forward, we need to document clearly what the semantics are with respect to inlining, and give guidance on whether this is expected to work on generic code.
It sounds to me like Julia might have made some design decisions that are great despite having some conflicts with how LLVM was built. I certainly applaud striving for the best language we can build. If a proper fix to this whole thing actually involves submitting non-trivial patches to LLVM, though, this is definitely beyond my abilities at the moment. I don't suppose we would like to just give up on optimizing

Being able to explore all the potential of LLVM probably requires us having tools that will contradict these guaranteed semantics, at least while we don't figure out a way around it. Additionally, having fine control may also contradict the desire to minimize introducing language features such as new meta nodes with specific semantics. Maybe it's one of those "the great tragedy of science..." situations.

I cannot really see a good solution for this issue that works much differently than this one. I also don't see how this might be easily offered as a nice feature widely available to all Julia users. I still believe it might be nice to offer it as an obscure feature so we at least can have optimized versions of

Maybe we can just move it into a "non-recommended" module, put an underscore somewhere in the macro name, and maybe rename it to something more appropriate?
Maybe the best solution would really be to rely on the LLVM intrinsics
I've studied the code a bit more and wrote to the LLVM dev list. It looks like we really may just fix this upstream. We should probably close this PR once a concrete solution is available there.
@nlw0 It's been nearly a year since that last message. Has this been fixed upstream? |
The state, as I understand, is that they decided to do a large change in LLVM where flags such as fastmath will be tied to parameters, and not functions or instructions. They have started doing the changes already, although unfortunately I could not follow closely what was going on anymore. This sounds like a pretty major thing that will impact Julia in more ways, and I guess some Julia core developers must be aware of it? I'm not sure what version of LLVM will have this change. I imagine once Julia is changed to use this new LLVM version we will naturally get this specific issue solved. Function flags should become either optional or obsolete then. |
Codecov Report
@@ Coverage Diff @@
## master #31862 +/- ##
==========================================
- Coverage 81.31% 81.30% -0.02%
==========================================
Files 372 372
Lines 63023 62988 -35
==========================================
- Hits 51250 51214 -36
- Misses 11773 11774 +1
The LLVM loop vectorizer depends on some special conditions when applied to floating-point operations, which are often fulfilled by `@fastmath` activating fast-mode instructions. For a few specific operations, though, notably `<` and `>` for `minimum` or `maximum`, vectorization also depends on setting a function attribute called `"no-nans-fp-math"`, which Julia apparently doesn't currently do.

This patch introduces a Julia function decorator `@nonans` that works similarly to `@noinline` or `@polly`, introducing a meta node that allows us to set the attribute in `codegen.cpp`. This patch can be tested by running the following script, where the function does not get vectorized for floats unless the decorator is used.

One issue related to this PR is #27104. A follow-up PR can also address #31442 and maybe even revisit #30320.
Another conceivably related issue is #19418, but not really. It seems hard to find a test for the `@noinline` check currently performed at `julia/src/codegen.cpp` (Line 5617 in 6d02fe5).