-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Halide::round behave as documented #7012
Conversation
Lots of broken things |
Yeah, the breakages are pre-existing: I added testing for tie-breaking behavior for our GPU backends, and it turns out they're all inconsistent right now. I'll work through them. |
I think the llvm 16 breakages are unrelated though |
Adding @pranavb-ca for HVX, @vksnk for C backend |
Don't think so -- EDIT: or the LLVM nightlies on the buildbots got a broken pull maybe? Will rebuild them. |
At this point, all the buildbots should have rebuilt LLVM in place, so I think you have a real bug here |
On linux
|
still failures that seem like non-flakes :-/ |
Seems to be wasm float16 related. Does wasm support float16 at all? |
That doesn't explain (say) correctness_vector_reductions failing on 32-bit Windows (where we never test wasm): https://buildbot.halide-lang.org/master/#/builders/7/builds/386
Nope. |
Yeah, it doesn't explain all the failures. There's also something weird going on with vector_reductions, even though that should be unaffected. One bug at a time... |
FYI: on OSX, vector_reductions eventually fails with |
On osx-arm:
|
failing for |
Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU |
winbot1? If so I'm just take that one offline. It's stupid slow as well. |
Looks like win-worker-2. I noticed it flaking on AJ's PR as well: #6884 |
Urg, so we basically have winbot1, which is less flaky but stupid-slow, and winbot2 which is slightly less slow but flaky, and winbot3, which is reasonable but we'd prefer not to have a single winbot |
LGTM, let me pull into google3 for some torture testing |
FYI, I'm failing on arm32-android builds with |
Annoyingly, arm32 has a round-to-nearest-even instruction, but llvm doesn't seem to want to emit it. I'll just lower it to bitwise ops on that target. |
OK, with that last change in place, i now have a fairly large number of failures that all seem to be variants on recursion-too-deep ending with SIGSEGV or similar; stand by |
|
@vksnk -- there's an xtensa-related issue that is (hopefully) the final blocker here; LMK when you are back in the office. (Basically: we are missing a float32->uint32 vector reinterpret, or, better yet, a vector round-to-even) |
OK, I think we're good to go. |
I want to wait for LLVM on x86-32 to get fixed before we land this; a proposed fix is pending now, so hopefully only another day or so |
* Clean up some pointless code * Improve comment on Halide::round * Make Halide::round round to even as documented * Explicitly set the rounding mode in the C backend * Use rint on ptx, which is documented to round to even * round to even on win32 * the nvidia libdevice is buggy for doubles See https://reviews.llvm.org/D85236 * Add missing include to C output * Fix rounding in opencl * Don't test opencl with doubles if CLDoubles is not enabled * Work around hexagon issue * Don't try to emit roundeven on wasm * wasm doesn't support float16 * Add vectorizable lowering for round on platforms without roundeven * Use rint on metal for Halide::round * Make round an intrinsic * Constant-fold round in simplifier * d3d12 fix * Bounds of Call::round * Teach the mullapudi cost model about round * Handle PureIntrinsics of const args in bounds * scatter, undef, and require aren't pure * metal doesn't support doubles * More parens * Add missing return * Add vector versions of rint for wasm * Use nearbyint for wasm instead of rint * revert change to mangling * d3d12 doesn't like double input/output buffers * Lower round on arm-32 not-linux * Don't simplify lowering of round-to-nearest-ties-to-even in codegen * Fix infinite loop in round lowering on arm-32-notlinux * Take care to never revisit args in bounds call visitor * Remove defunct comment Co-authored-by: Steven Johnson <[email protected]>
99% of the time, Halide::round does what it is documented to do, and what our tests verify that it does: It rounds to the nearest integer, with ties rounding to even.
However, if you have unvectorized code, and have been messing with the rounding mode flags, then it can behave differently. For vectorized code (on x86 at least) it explicitly uses the round-to-nearest-even flag on the roundps instruction.
This PR bumps that to 100%, by explicitly making it round ties to even in all cases, matching our docs and tests.
The current behavior just gave me a nasty Heisenbug where a test was passing on my dev machine and failing in CI because the rounding mode flag was somehow not the default one in the CI process.
Note that this means that Halide::round continues to behave differently to std::round on ties. Another option here would be to make the Halide rounding functions match the C standard library, excluding those that have behavior dependent on the rounding mode. This has a few problems though. First, it would break a bunch of Halide code, including our tests. Second, std::round makes the dubious choice to round ties away from zero, which is the one rounding mode not supported by x86 vector instructions. It's slow to emulate. I don't think we want to hand users that foot-gun. Third, the only way to round ties to even with the C standard library is by first setting the rounding mode flag. So if you exclude all problematic C standard library rounding modes, you're just left with ceil and floor. There's no sane round-to-nearest function. In our C backend implementation of round I just explicitly set the rounding mode and call nearbyintf.
I also looked to other languages for inspiration and discovered that in GLSL, round handles ties in an implementation-defined way, so (╯°□°)╯︵ ┻━┻
Not 100% sure I've done the right thing for hexagon, as it seems to have code that ensures a nearbyint implementation is available. I'm hoping CI will tell me if I got it wrong.