-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LLVM intrinsics for floor/ceil/trunc/abs. #8364
Conversation
Perhaps best to have all the tests related to the same function in the same place - in numbers.jl. |
Of course it's a bit silly to have a |
869b764
to
327f62e
Compare
I added tests to numbers.jl. Is there a form of |
There is the |
I'd prefer to have something that I can just use in place of |
Would If you're testing floating point numbers, don't you want to use |
Goldilocks would say the problem is a bear:
|
It's not unheard-of to write something like
|
What's the syntax for using "Base.==" from inside the local definition of |
Ugh, this makes me wince, but I believe the only working syntax for that is |
Now I'm wincing less at what used earlier this morning:
|
327f62e
to
3848b4c
Compare
I would rather alias it (or use another equivalence operator that isn't used yet) let ≈(x, y) = ==(x,y), ==(x, y) = isequal(x,y) && typeof(x) ≈ typeof(y)
...
end |
3848b4c
to
b389e26
Compare
|
Fantastic, thanks for doing this: while you're at it, would it be possible to also add a rint or nearbyint function? (in case you're wondering, they only differ by how they raise floating-point exceptions). At the moment, there is no other way to get round-to-even semantics, which is required to implement |
Also, there is an llvm intrinsic for |
b389e26
to
bfc600f
Compare
I've added used of LLVM 3.4 (and later) Per remarks for #5983, what would be the Julia interface for using |
I think I have a solution for
Please check this proof of correctness:
With this definition of rounding, and LLVM Of course you can cheat and do the proof by exhaustive search. |
Here's a version with lower latency, since the
The rewrite of
|
That's very clever. |
The logic seems correct to me. The nice thing about function round(x::Float32)
y = trunc(x)
ifelse(x==y,y,trunc(2*x-y))
end
function check_round()
for u in 0x0000_0000:0xffff_ffff
x = reinterpret(Float32,u)
isequal(round(x),Base.round(x)) || error("Invalid round: ", x)
end
end
check_round() which passes okay (and only takes 77 seconds on my underpowered laptop). EDIT: Oops, I see you've done that already. |
Very nice. I do love being able to check every value. It's hard to trust anything else – proofs can be wrong! |
Thanks @simonbyrne for showing how to do the exhaustive proof in Julia. I has been using a C variant because I didn't know about I'll update the pull request with the fast |
bfc600f
to
81c614a
Compare
LLVM can't vectorize |
I can never remember the differences between |
@@ -1312,6 +1312,21 @@ for x = 2^24-10:2^24+10 | |||
@test iceil(y) == i | |||
end | |||
|
|||
# rounding vectors | |||
let ≈(x,y) = x==y || typeof(x)==typeof(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean &&
instead of ||
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it should be &&
. The ||
is a mistake left over from my earlier !=
work-around.
Aren't different rounding modes required for doing interval arithmetic correctly? |
Different rounding modes are required for interval arithmetic, but it requires changing the mode frequently. E.g., a+b requires a "round up +" and a "round down +" operation. Multiplication gets trickier. So lexical scoping, or explicitly writing the operations with different names, seems more appropriate than having global state control. |
01101d7
to
7bba8a1
Compare
At the risk of getting sidetracked: according to the standard, rounding mode handling doesn't have to be always dynamically scoped, there just has to be a way to set it so that it is dynamically scoped (precedence of these is also language-defined). C just happened to only implement the dynamic-mode interface, and other languages seem to have just copied that. |
2ef98c5
to
0388647
Compare
96fc421
to
c06a3c0
Compare
4b07849
to
7e58dfa
Compare
7e58dfa
to
a5429b0
Compare
913d8fd
to
1f26721
Compare
1f26721
to
2a30d71
Compare
Add tests for vector trunc, round, floor, ceil. Add fast algorithm for round.
2a30d71
to
bde8f79
Compare
Holiday clearance time 😃 I think this PR is in a good state to commit, and (as noted before) if |
+1 |
Add LLVM intrinsics for floor/ceil/trunc/abs.
floor(x::Float64) = ccall((:floor, Base.libm_name), Float64, (Float64,), x) | ||
function round(x::Float64) | ||
y = trunc(x) | ||
ifelse(x==y,y,trunc(2.0*x-y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realise I should have said this beforehand, but won't 2.0*x
overflow for large values of x
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes 2.0*x
can overlow. But in that case, x
is so large that x==trunc(x)
, so the other arm of the ifelse
is taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large values of x
are already integers so x == y
and y
is returned. At least that's my analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course.
Sweet! |
This PR adds intrinsics for floor/ceil/trunc. For this example:
with LLVM 3.5 I'm seeing an improvement of about 28x on a Haswell processor for
Float32 arrays
. ForFloat64
arrays, I'm getting about 4x improvement. Even with just LLVM 3.3, which can't vectorizefloor
, I'm seeing about 2.8x improvement for bothFloat32
andFloat64
.The patch improves the "vector"
floor
frombase/math.jl
by about 1.4x (Float32
) or 1.8x (Float64
) using LLVM 3.3.While writing this patch, I discovered that the tests do not test whether "vector"
floor
,ceil
, andtrunc
work at all. I'm happy to write such tests. I'm just not sure whether they belong intest/math.jl
ortest/numbers.jl
. Latter is where the scalar tests forfloor
,ceil
, andtrunc
are.