Implements the @nonans decorator to enable some LLVM vectorizations #31862
base: master
Conversation
Probably the best way to test this is like https://github.com/JuliaLang/julia/blob/master/test/llvmpasses/aliasscopes.jl or https://github.com/JuliaLang/julia/blob/master/test/llvmpasses/simdloop.ll
For me the big question is whether this has to be a function attribute, or if we can have it as a "local" attribute attached to the instructions. I would prefer it instruction-local, since I suspect that the function attribute will interact interestingly with inlining.
Can't this be turned on automatically by
I imagine this would be nice. I was trying to follow the "minimum viable pull request" path and didn't think about that. I'm sure there are further improvements to be made, though. LLVM can actually take other arguments like this, and clang sets them all, but I couldn't yet find any other optimizations that depend on these attributes apart from this one. So that's something else we might consider beyond this PR. Perhaps this meta node could actually take multiple arguments to control all these flags, and avoid a profusion of different specific meta nodes?
From what I could gather it really has to be a function attribute, and this is the relevant test in the LLVM code: https://github.com/llvm/llvm-project/blob/fd254e429ea103be8bab6271855c04919d33f9fb/llvm/lib/Analysis/IVDescriptors.cpp#L590 (link edited)
It should be already without this.
These are instruction flags; this function attribute is something separate, apparently very little used. Please check #31442 for the context of this PR.
This is for fmax and fmin, right? It seems we are not treating them in Line 67 in 208d99c
Yes, it's a different way to do the same thing so it is applicable to instructions just fine. |
@yuyichao Setting the fast flag for operations does indeed enable pretty much every optimization we might expect. The main contribution here is just noticing that for a few specific cases those flags are not sufficient to activate the LLVM loop vectorizer. Investigations on #31442 revealed that LLVM requires this function attribute to be set in addition for vectorization to work in this situation. This is what this PR does, and it seems to have finally enabled vectorization for functions like
That really seems like an LLVM bug, especially since

That is not to say we don't work around LLVM bugs, but all of those workarounds/fixes are not user visible. I don't think we should introduce a new feature just to work around an LLVM bug.

So it comes down to whether we should have this new feature. There was indeed talk about exposing more fine-grained fast-math flag control to the user. It would be nice if they could all be introduced at the same time, though, and AFAICT, function-wise this one is not much more important than the other ones (I find

Another issue with doing this is that we really don't need another one of these macros with completely different semantics...
This definitely looks like an LLVM quirk; I can see very few instances where these flags are used in ways that might impact things like optimization. Unfortunately it's still there in 9.0, and I guess we should find a way to deal with it. Clang seems mostly focused on controlling things from command-line arguments, and this attribute sits well with that API; that's probably why it was used even though it's unusual. Julia, on the other hand, seems to focus on fine control along the code without offering a lot of control over every single optimization. I personally find this very good, and I think what clang does seems more useful for debugging the optimizations than anything else, although I can also imagine this kind of control might be a life-saver sometimes.

My current proposal was just to illustrate a working solution; I didn't really mean to introduce a new macro. Activating this via

It might not even really require a meta node; maybe this could be passed as an argument to the

Although we might do all this, it does not actually seem necessary for #31442 and #30320, which is really my main goal. Also, controlling these function attributes doesn't seem to me to relate closely to increasing control over the instruction flags. These are all factors related to fast-math mode, but somehow orthogonal to each other, and I really don't feel capable of commenting on other feature requests related to fast-math mode anyway.

I hope I can just be given some lines along which I can continue to work on this solution, and I will gladly follow the path the community indicates to be the most productive use of my time. My proposals are then either 1. this PR the way it is, or 2. activate a similar meta node from
Just to raise this again: how do function attributes interact with Julia inlining? Part of the

So what happens when we inline a method that has a meta node for a function attribute, and what happens when we inline into a function that has this attribute? If a method relies on the semantics that we normally guarantee and we inline it into a context in which those semantics are relaxed, is that okay? In the case of

So if we can find a local solution that achieves the same thing, I would strongly prefer that. If the function attribute is the only way forward, we need to document clearly what the semantics are with respect to inlining, and give guidance on whether this is expected to work on generic code.
It sounds to me like Julia might have made some design decisions that are great despite having some conflicts with how LLVM was built. I certainly applaud striving for the best language we can build. If a proper fix to this whole thing actually involves submitting non-trivial patches to LLVM, though, this is definitely beyond my abilities at the moment. I don't suppose we would like to just give up on optimizing

Being able to explore all the potential of LLVM probably requires us having tools that will contradict these guaranteed semantics, at least while we don't figure out a way around it. Additionally, having fine control may also contradict the desire to minimize introducing language features such as new meta nodes with specific semantics. Maybe it's one of those "the great tragedy of science..." situations.

I cannot really see a good solution for this issue that works much differently than this one. I also don't see how this might be easily offered as a nice feature widely available to all Julia users. I still believe it might be nice to offer it as an obscure feature so we at least can have optimized versions of

Maybe we can just move it into a "non-recommended" module, put an underscore somewhere in the macro name, and maybe rename it to something more appropriate?
Maybe the best solution would really be to rely on the LLVM intrinsics
I've studied the code a bit more and wrote to the LLVM dev list. It looks like we really may just fix this upstream. We should probably close this PR once a concrete solution is available there.
@nlw0 It's been nearly a year since that last message. Has this been fixed upstream? |
The state, as I understand, is that they decided to do a large change in LLVM where flags such as fastmath will be tied to parameters, and not functions or instructions. They have started doing the changes already, although unfortunately I could not follow closely what was going on anymore. This sounds like a pretty major thing that will impact Julia in more ways, and I guess some Julia core developers must be aware of it? I'm not sure what version of LLVM will have this change. I imagine once Julia is changed to use this new LLVM version we will naturally get this specific issue solved. Function flags should become either optional or obsolete then. |
Codecov Report
@@ Coverage Diff @@
## master #31862 +/- ##
==========================================
- Coverage 81.31% 81.30% -0.02%
==========================================
Files 372 372
Lines 63023 62988 -35
==========================================
- Hits 51250 51214 -36
- Misses 11773 11774 +1
The LLVM loop vectorizer depends on some special conditions when applied to floating-point operations, which are often fulfilled by `@fastmath` activating fast-mode instructions. For a few specific operations, though, notably `<` and `>` for `minimum` or `maximum`, vectorization also depends on setting a function attribute called `"no-nans-fp-math"`, which Julia apparently doesn't currently do.

This patch introduces a Julia function decorator `@nonans` that works similarly to `@noinline` or `@polly`, introducing a meta node that allows us to set the attribute in `codegen.cpp`. This patch can be tested by running the following script, where the function does not get vectorized for floats unless the decorator is used.

One issue related to this PR is #27104. A follow-up PR can also address #31442 and maybe even revisit #30320.
Another conceivably related issue is #19418, but not really. It seems hard to find a test for the `@noinline` check currently performed at `julia/src/codegen.cpp` (Line 5617 in 6d02fe5).