Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LLVM intrinsics for floor/ceil/trunc/abs. #8364

Merged
merged 1 commit into from
Dec 9, 2014

Conversation

ArchRobison
Copy link
Contributor

This PR adds intrinsics for floor/ceil/trunc. For this example:

function f(b,a)
    @simd for i=1:length(a)
        @inbounds b[i]=floor(a[i])
    end
end

with LLVM 3.5 I'm seeing an improvement of about 28x on a Haswell processor for Float32 arrays. For Float64 arrays, I'm getting about 4x improvement. Even with just LLVM 3.3, which can't vectorize floor, I'm seeing about 2.8x improvement for both Float32 and Float64.

The patch improves the "vector" floor from base/math.jl by about 1.4x (Float32) or 1.8x (Float64) using LLVM 3.3.

While writing this patch, I discovered that the tests do not test whether "vector" floor, ceil, and trunc work at all. I'm happy to write such tests. I'm just not sure whether they belong in test/math.jl or test/numbers.jl. Latter is where the scalar tests for floor, ceil, and trunc are.

@ViralBShah
Copy link
Member

Perhaps best to have all the tests related to the same function in the same place - in numbers.jl.

@JeffBezanson
Copy link
Member

Of course it's a bit silly to have a test/math.jl; everything is math :) Could be separated into specfun.jl and a couple other things maybe.

@ArchRobison
Copy link
Contributor Author

I added tests to numbers.jl. Is there a form of == that tests not only for numerical equality, but type equality? The tests that I added don't check the return type of vector trunc, round, etc.

@jakebolewski
Copy link
Member

There is the @inferred macro in base/test.jl.

@ArchRobison
Copy link
Contributor Author

I'd prefer to have something that I can just use in place of ==, so that I don't have have write twice as many lines of test (a line for value checking and a line for type checking). It's a feature that cuts across other tests, so is probably best not tangled with this PR. I'll leave my tests as is for now.

@eschnett
Copy link
Contributor

Would === check type equality as well?

If you're testing floating point numbers, don't you want to use isequal instead of ==?

@ArchRobison
Copy link
Contributor Author

Goldilocks would say the problem is a bear:

  • == is too lenient: it checks only numerical equality, not type equality.
  • === is too picky: it distinguishes arrays by address.
  • isequal is both too picky and too lenient: it distinguishes -0 from 0, but considers Float32[0,0,0] to be equal to Float64[0,0,0])

@JeffBezanson
Copy link
Member

It's not unheard-of to write something like

let ==(x,y) = isequal(x,y) && typeof(x)==typeof(y)
    @test foo() == bar()
end

@ArchRobison
Copy link
Contributor Author

What's the syntax for using "Base.==" from inside the local definition of ==?

@JeffBezanson
Copy link
Member

Ugh, this makes me wince, but I believe the only working syntax for that is Base.(:(==)).

@ArchRobison
Copy link
Contributor Author

Now I'm wincing less at what used earlier this morning:

let ==(x,y) = !(x!=y || typeof(x)!=typeof(y))

@vchuravy
Copy link
Member

I would rather alias it (or use another equivalence operator that isn't used yet)

let (x, y) = ==(x,y),  ==(x, y) = isequal(x,y) && typeof(x)  typeof(y)
   ...
end

@ArchRobison
Copy link
Contributor Author

I liked the alias idea. I liked using another equivalence operator. PR updated to "wiggle" out of the problem.

@simonbyrne
Copy link
Contributor

Fantastic, thanks for doing this: while you're at it, would it be possible to also add a rint or nearbyint function? (in case you're wondering, they only differ by how they raise floating-point exceptions). At the moment, there is no other way to get round-to-even semantics, which is required to implement @printf hex-float format.

@simonbyrne
Copy link
Contributor

Also, there is an llvm intrinsic for round as well (which was the slowest of the bunch in #5983).

@ArchRobison
Copy link
Contributor Author

I've added used of LLVM 3.4 (and later) fabs intrinsic. It generates slightly better code than what Julia currently generates. Sadly, LLVM 3.4 copysign generated slightly better scalar code, but is not vectorizable by LLVM, even up to my copy of LLVM trunk. So I recommend not using the LLVM intrinsic until LLVM can vectorize it.

Per remarks for #5983, what would be the Julia interface for usingrint and nearbyint?

@ArchRobison
Copy link
Contributor Author

I think I have a solution for round: use Julia instead of calling C :-)

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(x-y+x))
end

Please check this proof of correctness:

  • If x is +/- 0, infinity, or an integer, then x==y is true and the ifelse returns y.
  • If x is a Nan, then y is the same NaN value. The NaN propagates through the trunc(x-y+x).
  • Otherwise x is finite. x-y will be computed exactly and have the same fractional part as x. When we add the x-y to x, the resulting sum is exact. To see this, note that we need at most one extra bit for the carry-out of the addition, but the least-significant 1 in the fractions sum to 10 (binary), so we can toss the 0 to save a bit without loss of accuracy. If the fraction's absolute value was less than 0.5, then the sum will not cross into the next integer, otherwise it will.

With this definition of rounding, and LLVM trunc intrinsic support, I am seeing 10x improvement in an @simd loop using LLVM 3.5, and some improvement for scalar execution even with LLVM 3.3.

Of course you can cheat and do the proof by exhaustive search.

@ArchRobison
Copy link
Contributor Author

Here's a version with lower latency, since the 2*x can issue while trunc(x) is running.

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(2*x-y))
end

The rewrite of x-y+x as 2*x-y okay because:

  • The computation of 2*x is exact when y!=trunc(x).
  • The subtraction in 2*x-y is exact, because it must have the same result as x-y+x rounded to nearest, and the proof of the previous version showed that x-y+x is computed exactly.

@timholy
Copy link
Member

timholy commented Sep 19, 2014

That's very clever.

@simonbyrne
Copy link
Contributor

+Inf

The logic seems correct to me. The nice thing about Float32 is that you can also just check all the values:

function round(x::Float32)
    y = trunc(x)
    ifelse(x==y,y,trunc(2*x-y))
end

function check_round()
    for u in 0x0000_0000:0xffff_ffff
        x = reinterpret(Float32,u)
        isequal(round(x),Base.round(x)) || error("Invalid round: ", x)
    end
end

check_round()

which passes okay (and only takes 77 seconds on my underpowered laptop).

EDIT: Oops, I see you've done that already.

@StefanKarpinski
Copy link
Member

Very nice. I do love being able to check every value. It's hard to trust anything else – proofs can be wrong!

@ArchRobison
Copy link
Contributor Author

Thanks @simonbyrne for showing how to do the exhaustive proof in Julia. I has been using a C variant because I didn't know about reinterpret. For the record, the Intel compiler's code for vectorized roundf is about 1.26x faster than the Julia code on a Haswell processor that I tried, but the icc code requires copysign, which LLVM can't vectorize.

I'll update the pull request with the fast round.

@eschnett
Copy link
Contributor

LLVM can't vectorize copysign? That's... very surprising; copysign is a rather simple function that is very easy to vectorize. One hopes that this changes soon.

@JeffBezanson
Copy link
Member

I can never remember the differences between rint and nearbyint. Those names are just terrible.

@@ -1312,6 +1312,21 @@ for x = 2^24-10:2^24+10
@test iceil(y) == i
end

# rounding vectors
let ≈(x,y) = x==y || typeof(x)==typeof(y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean && instead of || here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be &&. The || is a mistake left over from my earlier != work-around.

@jwmerrill
Copy link

Aren't different rounding modes required for doing interval arithmetic correctly?

@ArchRobison
Copy link
Contributor Author

Different rounding modes are required for interval arithmetic, but it requires changing the mode frequently. E.g., a+b requires a "round up +" and a "round down +" operation. Multiplication gets trickier. So lexical scoping, or explicitly writing the operations with different names, seems more appropriate than having global state control.

@simonbyrne
Copy link
Contributor

At the risk of getting sidetracked: according to the standard, rounding mode handling doesn't have to be always dynamically scoped, there just has to be a way to set it so that it is dynamically scoped (precedence of these is also language-defined). C just happened to only implement the dynamic-mode interface, and other languages seem to have just copied that.

Add tests for vector trunc, round, floor, ceil.
Add fast algorithm for round.
@ArchRobison
Copy link
Contributor Author

Holiday clearance time 😃 I think this PR is in a good state to commit, and (as noted before) if ccall is improved, we can redo ceil_llvm, floor_llvm, trunc_llvm, sqrt_llvm, powi_llvm, abs_float, and copysign_float.

@simonbyrne
Copy link
Contributor

+1

JeffBezanson added a commit that referenced this pull request Dec 9, 2014
Add LLVM intrinsics for floor/ceil/trunc/abs.
@JeffBezanson JeffBezanson merged commit 15fafce into JuliaLang:master Dec 9, 2014
floor(x::Float64) = ccall((:floor, Base.libm_name), Float64, (Float64,), x)
function round(x::Float64)
y = trunc(x)
ifelse(x==y,y,trunc(2.0*x-y))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realise I should have said this beforehand, but won't 2.0*x overflow for large values of x?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 2.0*x can overlow. But in that case, x is so large that x==trunc(x), so the other arm of the ifelse is taken.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large values of x are already integers so x == y and y is returned. At least that's my analysis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course.

@ViralBShah
Copy link
Member

Sweet!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.