Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Halide::round behave as documented #7012

Merged
merged 41 commits into from
Sep 26, 2022
Merged

Make Halide::round behave as documented #7012

merged 41 commits into from
Sep 26, 2022

Conversation

abadams
Copy link
Member

@abadams abadams commented Sep 13, 2022

99% of the time, Halide::round does what it is documented to do, and what our tests verify that it does: It rounds to the nearest integer, with ties rounding to even.

However, if you have unvectorized code, and have been messing with the rounding mode flags, then it can behave differently. For vectorized code (on x86 at least) it explicitly uses the round-to-nearest-even flag on the roundps instruction.

This PR bumps that to 100%, by explicitly making it round ties to even in all cases, matching our docs and tests.

The current behavior just gave me a nasty Heisenbug where a test was passing on my dev machine and failing in CI because the rounding mode flag was somehow not the default one in the CI process.

Note that this means that Halide::round continues to behave differently to std::round on ties. Another option here would be to make the Halide rounding functions match the C standard library, excluding those that have behavior dependent on the rounding mode. This has a few problems though. First, it would break a bunch of Halide code, including our tests. Second, std::round makes the dubious choice to round ties away from zero, which is the one rounding mode not supported by x86 vector instructions. It's slow to emulate. I don't think we want to hand users that foot-gun. Third, the only way to round ties to even with the C standard library is by first setting the rounding mode flag. So if you exclude all problematic C standard library rounding modes, you're just left with ceil and floor. There's no sane round-to-nearest function. In our C backend implementation of round I just explicitly set the rounding mode and call nearbyintf.

I also looked to other languages for inspiration and discovered that in GLSL, round handles ties in an implementation-defined way, so (╯°□°)╯︵ ┻━┻

Not 100% sure I've done the right thing for hexagon, as it seems to have code that ensures a nearbyint implementation is available. I'm hoping CI will tell me if I got it wrong.

@steven-johnson
Copy link
Contributor

Lots of broken things

@abadams
Copy link
Member Author

abadams commented Sep 14, 2022

Yeah, the breakages are pre-existing: I added testing for tie-breaking behavior for our GPU backends, and it turns out they're all inconsistent right now. I'll work through them.

@abadams
Copy link
Member Author

abadams commented Sep 14, 2022

I think the llvm 16 breakages are unrelated though

@steven-johnson
Copy link
Contributor

Adding @pranavb-ca for HVX, @vksnk for C backend

@steven-johnson
Copy link
Contributor

steven-johnson commented Sep 14, 2022

I think the llvm 16 breakages are unrelated though

Don't think so -- main tests cleanly for me on local x86-64-linux with a freshly built LLVM16

EDIT: or the LLVM nightlies on the buildbots got a broken pull maybe? Will rebuild them.

@steven-johnson
Copy link
Contributor

At this point, all the buildbots should have rebuilt LLVM in place, so I think you have a real bug here

@steven-johnson
Copy link
Contributor

On linux correctness_math:

dlopenbuf failed
undefined PLT symbol roundevenf (30) ./libhalide_kernels30.soterminate called after throwing an instance of 'Halide::RuntimeError'

@steven-johnson
Copy link
Contributor

still failures that seem like non-flakes :-/

@abadams
Copy link
Member Author

abadams commented Sep 15, 2022

Seems to be wasm float16 related. Does wasm support float16 at all?

@steven-johnson
Copy link
Contributor

Seems to be wasm float16 related.

That doesn't explain (say) correctness_vector_reductions failing on 32-bit Windows (where we never test wasm): https://buildbot.halide-lang.org/master/#/builders/7/builds/386

Does wasm support float16 at all?

Nope.

@abadams
Copy link
Member Author

abadams commented Sep 15, 2022

Yeah, it doesn't explain all the failures. There's also something weird going on with vector_reductions, even though that should be unaffected. One bug at a time...

@steven-johnson
Copy link
Contributor

FYI: on OSX, vector_reductions eventually fails with HalideJITMemoryManager: unable to find address for _roundevenf (when running without sse41 or other simd)... unfortunately the error gets swallowed by "unhandled exception"

@steven-johnson
Copy link
Contributor

On osx-arm:

FAILED TEST: vector_reductions
vector_reductions: Testing with target(arm-64-osx-jit)
Intrinsic name not mangled correctly for type arguments! Should be: llvm.aarch64.neon.uaddlp.v1i64.v2i32
ptr @llvm.aarch64.neon.uaddlp.i64.v2i32
Internal Error at /Users/halidenightly/build_bot/worker/halide-testbranch-main-llvm16-arm-64-osx-make/halide-source/src/CodeGen_LLVM.cpp:645 triggered by user code at : Condition failed: !verifyFunction(*function, &llvm::errs()): 

@steven-johnson
Copy link
Contributor

failing for host-d3d12compute :-/

@abadams
Copy link
Member Author

abadams commented Sep 22, 2022

Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU

@steven-johnson
Copy link
Contributor

Only remaining failure is one of the preexisting flakes on the windows builbot with the dodgy GPU

winbot1? If so I'm just take that one offline. It's stupid slow as well.

@abadams
Copy link
Member Author

abadams commented Sep 22, 2022

Looks like win-worker-2. I noticed it flaking on AJ's PR as well: #6884

@steven-johnson
Copy link
Contributor

Looks like win-worker-2. I noticed it flaking on AJ's PR as well: #6884

Urg, so we basically have winbot1, which is less flaky but stupid-slow, and winbot2 which is slightly less slow but flaky, and winbot3, which is reasonable but we'd prefer not to have a single winbot

@steven-johnson
Copy link
Contributor

LGTM, let me pull into google3 for some torture testing

@steven-johnson
Copy link
Contributor

LGTM, let me pull into google3 for some torture testing

FYI, I'm failing on arm32-android builds with ld.lld: error: undefined symbol: roundevenf

@abadams
Copy link
Member Author

abadams commented Sep 22, 2022

Annoyingly, arm32 has a round-to-nearest-even instruction, but llvm doesn't seem to want to emit it. I'll just lower it to bitwise ops on that target.

@steven-johnson
Copy link
Contributor

OK, with that last change in place, i now have a fairly large number of failures that all seem to be variants on recursion-too-deep ending with SIGSEGV or similar; stand by

@steven-johnson
Copy link
Contributor

#0  0x000055555ada3e9a in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#1  0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#2  0x000055555ae0095c in Halide::Internal::Simplify::visit(Halide::Internal::Load const*, Halide::Internal::Simplify::ExprInfo*) ()
#3  0x000055555ad95bc8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#4  0x000055555aeaf457 in Halide::Internal::Simplify::visit(Halide::Internal::Mul const*, Halide::Internal::Simplify::ExprInfo*) ()
#5  0x000055555ad95ae8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#6  0x000055555add37b0 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#7  0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#8  0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#9  0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#10 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#11 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#12 0x000055555adcf9dc in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#13 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#14 0x000055555aee8d55 in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#15 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide--Type <RET> for more, q to quit, c to continue without paging--
::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#16 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#17 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#18 0x000055555ada3edb in Halide::Internal::Simplify::visit(Halide::Internal::Add const*, Halide::Internal::Simplify::ExprInfo*) ()
#19 0x000055555ad95ab8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#20 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#21 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#22 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#23 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#24 0x000055555adda236 in Halide::Internal::Simplify::visit(Halide::Internal::Cast const*, Halide::Internal::Simplify::ExprInfo*) ()
#25 0x000055555ad95a91 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#26 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#27 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#28 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
#29 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#30 0x000055555adcf7d5 in Halide::Internal::Simplify::visit(Halide::Internal::Call const*, Halide::Internal::Simplify::ExprInfo*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#31 0x000055555ad95be8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#32 0x000055555aec0c78 in Halide::Internal::Simplify::visit(Halide::Internal::Reinterpret const*, Halide::Internal::Simplify::ExprInfo*) ()
#33 0x000055555ad95a9e in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#34 0x000055555aee8d7b in Halide::Internal::Simplify::visit(Halide::Internal::Sub const*, Halide::Internal::Simplify::ExprInfo*) ()
#35 0x000055555ad95ac8 in Halide::Expr Halide::Internal::VariadicVisitor<Halide::Internal::Simplify, Halide::Expr, Halide::Internal::Stmt>::dispatch_expr<Halide::Internal::Simplify::ExprInfo*&>(Halide::Internal::BaseExprNode const*, Halide::Internal::Simplify::ExprInfo*&) ()
#36 0x000055555ad94a88 in Halide::Internal::simplify(Halide::Expr const&, bool, Halide::Internal::Scope<Halide::Internal::Interval> const&, Halide::Internal::Scope<Halide::Internal::ModulusRemainder> const&) ()
#37 0x000055555aada86b in Halide::Internal::lower_round_to_nearest_ties_to_even(Halide::Expr const&) ()
#38 0x000055555aae6ea4 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#39 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#40 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#41 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#42 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#43 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#44 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#45 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#46 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#47 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#48 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#49 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
--Type <RET> for more, q to quit, c to continue without paging--
#50 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#51 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#52 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#53 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#54 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()
#55 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#56 0x000055555aab4c64 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Let const*) ()
#57 0x000055555aa9a52a in Halide::Internal::CodeGen_LLVM::codegen(Halide::Expr const&) ()
#58 0x000055555aae6eb3 in Halide::Internal::(anonymous namespace)::CodeGen_ARM::visit(Halide::Internal::Call const*) ()

...

#63127 0x000055555aab4d86 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::LetStmt const*) ()
#63128 0x000055555aa9ca73 in Halide::Internal::CodeGen_LLVM::codegen(Halide::Internal::Stmt const&) ()
#63129 0x000055555aab7e89 in Halide::Internal::CodeGen_LLVM::visit(Halide::Internal::Block const*) ()
#63130 0x000055555aa975c8 in Halide::Internal::CodeGen_LLVM::compile_func(Halide::Internal::LoweredFunc const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> > const&) ()
#63131 0x00007ffff7d33aa0 in ?? () from /usr/grte/v5/lib64/libc.so.6
#63132 0x00007fffffffb490 in ?? ()
#63133 0x0000000000000000 in ?? ()

@steven-johnson
Copy link
Contributor

@vksnk -- there's an xtensa-related issue that is (hopefully) the final blocker here; LMK when you are back in the office. (Basically: we are missing a float32->uint32 vector reinterpret, or, better yet, a vector round-to-even)

@steven-johnson
Copy link
Contributor

OK, I think we're good to go.

@steven-johnson
Copy link
Contributor

I want to wait for LLVM on x86-32 to get fixed before we land this; a proposed fix is pending now, so hopefully only another day or so

@abadams abadams merged commit d5a8118 into main Sep 26, 2022
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Clean up some pointless code

* Improve comment on Halide::round

* Make Halide::round round to even as documented

* Explicitly set the rounding mode in the C backend

* Use rint on ptx, which is documented to round to even

* round to even on win32

* the nvidia libdevice is buggy for doubles

See https://reviews.llvm.org/D85236

* Add missing include to C output

* Fix rounding in opencl

* Don't test opencl with doubles if CLDoubles is not enabled

* Work around hexagon issue

* Don't try to emit roundeven on wasm

* wasm doesn't support float16

* Add vectorizable lowering for round on platforms without roundeven

* Use rint on metal for Halide::round

* Make round an intrinsic

* Constant-fold round in simplifier

* d3d12 fix

* Bounds of Call::round

* Teach the mullapudi cost model about round

* Handle PureIntrinsics of const args in bounds

* scatter, undef, and require aren't pure

* metal doesn't support doubles

* More parens

* Add missing return

* Add vector versions of rint for wasm

* Use nearbyint for wasm instead of rint

* revert change to mangling

* d3d12 doesn't like double input/output buffers

* Lower round on arm-32 not-linux

* Don't simplify lowering of round-to-nearest-ties-to-even in codegen

* Fix infinite loop in round lowering on arm-32-notlinux

* Take care to never revisit args in bounds call visitor

* Remove defunct comment

Co-authored-by: Steven Johnson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants