Fix LLVM speed regression #651

syrusakbary · 2019-08-09T07:07:23Z

Canonicalizing all NaNs in the LLVM backend slowed performance down by 2x.

This PR fixes that by reverting the canonicalization.

@nlewycky

syrusakbary · 2019-08-09T07:07:33Z

bors try

bors · 2019-08-09T07:36:06Z

try

Build succeeded

syrusakbary · 2019-08-13T00:23:03Z

Previous timing:

wasmer run --backend=llvm --disable-cache --enable-simd   18.95s user 0.08s system 99% cpu 19.114 total

With the changes from this PR:

/Users/syrusakbary/Development/wasmer/target/release/wasmer run --backend=llv  14.21s user 0.03s system 99% cpu 14.263 total

nlewycky · 2019-08-13T00:32:20Z

bors try

bors · 2019-08-13T00:46:51Z

try

Build succeeded

bjfish · 2019-08-16T01:14:50Z

lib/llvm-backend/src/code.rs

@@ -399,6 +408,31 @@ fn canonicalize_nans(
    canonicalized
 }

+// Replaces any NaN with the canonical QNaN, otherwise leaves the value alone.


Style idea: maybe consider the opposite name: make_canonical_nan.
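For readers following the thread, the behaviour described by the comment in the diff ("replaces any NaN with the canonical QNaN, otherwise leaves the value alone") can be sketched on a plain scalar. This is only an illustration, not the code in lib/llvm-backend/src/code.rs, which emits LLVM IR rather than operating on Rust floats. The function name `canonicalize_f32` and the constant name are hypothetical, and the sketch assumes the positive canonical quiet NaN bit pattern `0x7fc0_0000` for f32.

```rust
/// Assumed canonical quiet NaN for f32: sign bit clear, exponent all ones,
/// and only the most significant mantissa bit set.
const CANONICAL_F32_NAN_BITS: u32 = 0x7fc0_0000;

/// Hypothetical scalar analogue of the canonicalization step:
/// any NaN (whatever its sign or payload) becomes the canonical quiet NaN,
/// and every non-NaN value is returned unchanged.
fn canonicalize_f32(x: f32) -> f32 {
    if x.is_nan() {
        f32::from_bits(CANONICAL_F32_NAN_BITS)
    } else {
        x
    }
}
```

With this sketch, a negative quiet NaN carrying a payload, such as `f32::from_bits(0xffc0_1234)`, maps to the bit pattern `0x7fc0_0000`, while ordinary values and infinities pass through untouched.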

penzn · 2019-08-16T17:32:54Z

Hi, I originally reported the regression in wasmerio/c-wasm-simd128-example#1. If I reset master to the commit right before 11f66d2, I see the timings from the article, but the timings from the branch in this PR are still similar to master. Is that just me, or are you seeing that too?

Hywan · 2019-10-31T10:08:51Z

@syrusakbary What's the status of this PR?

934: In LLVM backend, track which floats are guaranteed to be arithmetic, which makes the canonicalization a no-op. r=nlewycky a=nlewycky

# Description

This is a reimplementation of the patch in PR #651.

Extend state.rs ExtraInfo to track more information about floats. In addition to tracking whether the value has a pending canonicalization of NaNs, also track whether the value is known to be arithmetic (which includes infinities, regular values, and non-signalling NaNs (aka. "arithmetic NaNs" in the WebAssembly spec)). When the value is arithmetic, the correct sequence of operations to canonicalize the value is a no-op. Therefore, we create a lattice where pending+arithmetic=arithmetic.

Also, this extends the tracking to track all values, including non-SIMD integers. That's why there are more places where pending canonicalizations are applied.

Looking at c-wasm-simd128-example, this provides no performance change to the non-SIMD case (takes 58s on my noisy dev machine). The SIMD case drops from 46s to 29s.

# Review

- [ ] Add a short description of the change to the CHANGELOG.md file

Co-authored-by: Nick Lewycky <[email protected]>
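As a rough illustration of the "pending+arithmetic=arithmetic" rule in the description above, here is a minimal sketch of such a lattice. The type `FloatInfo`, its variants, and the methods `join` and `needs_canonicalization` are hypothetical names chosen for this example. The actual `ExtraInfo` in state.rs may be structured quite differently.

```rust
/// Hypothetical per-value tracking state (not the real `ExtraInfo`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FloatInfo {
    /// Nothing is known about the value and no canonicalization is owed.
    Unknown,
    /// A NaN canonicalization is still owed before the value is observed.
    PendingCanonicalization,
    /// The value is known to be arithmetic, so canonicalizing it is a no-op.
    Arithmetic,
}

impl FloatInfo {
    /// Combine two facts about the same value.
    /// `Arithmetic` absorbs `PendingCanonicalization`, matching the
    /// "pending+arithmetic=arithmetic" rule from the description.
    fn join(self, other: FloatInfo) -> FloatInfo {
        use FloatInfo::*;
        match (self, other) {
            (Arithmetic, _) | (_, Arithmetic) => Arithmetic,
            (PendingCanonicalization, _) | (_, PendingCanonicalization) => {
                PendingCanonicalization
            }
            (Unknown, Unknown) => Unknown,
        }
    }

    /// Only a pending, not-known-arithmetic value needs work emitted.
    fn needs_canonicalization(self) -> bool {
        self == FloatInfo::PendingCanonicalization
    }
}
```

Under these assumptions, joining a pending value with the knowledge that it is arithmetic yields `Arithmetic`, so no canonicalization instructions would need to be emitted for it.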

nlewycky · 2019-12-02T18:14:41Z

This PR is obsoleted by PR #883 and PR #934.

nlewycky · 2019-12-02T18:19:20Z

@penzn Yes, we noticed the same thing. It's a bit of a pain because LLVM doesn't say a lot about what happens to NaN values on operations, even ones as simple as an fadd instruction. It happened to work before because fadd lowers to an addition that happens to preserve NaNs. Unfortunately, while the work I did to try to bring performance back did help on many examples, it hasn't improved the c-wasm-simd128-example when built with SIMD. The primary difference there remains whether the LLVM autovectorizer manages to vectorize the fdiv instruction. That can be affected by many things. We've also changed the LLVM pass list to improve optimizations, which should improve the chance that we get the vectorized fdiv in the SIMD example, but unfortunately we still don't. I'd still like to figure out how to get guaranteed correctness while preserving the performance, especially since it should be possible given that the x86-64 instructions give us what we need.

penzn · 2019-12-03T20:38:39Z

@nlewycky I just rebuilt the benchmark with -munimplemented-simd128, which leads to a vector fdiv in the wasm module, and performance is the same as with scalar fdiv. It also looks relatively close to native (given that I keep caching on). Can you elaborate a bit on what you are seeing? I'd be happy to help.

Maybe we should open an issue for this.

CC @topperc

syrusakbary requested review from losfair, MarkMcCaskey and nlewycky as code owners August 9, 2019 07:07

bors bot added a commit that referenced this pull request Aug 9, 2019

Try #651:

cb35da5

syrusakbary mentioned this pull request Aug 9, 2019

Analysis of benchmark wasmerio/c-wasm-simd128-example#1

Open

nlewycky force-pushed the feature/llvm-nan-fix branch from 73ab8bf to 9303e5b Compare August 13, 2019 00:25

syrusakbary and others added 4 commits August 12, 2019 17:27

Revert 11f66d2

f21ff14

Skip canonicalization from LLVM

7f31bc0

Canonicalize NaN values once, track which stack values have already been canonicalized and don't canonicalize subsequent calculation results from them.

ffb70be

Turn tests back on.

12d5f6f

nlewycky force-pushed the feature/llvm-nan-fix branch from 9303e5b to 12d5f6f Compare August 13, 2019 00:32

bors bot added a commit that referenced this pull request Aug 13, 2019

Try #651:

11bd6a6

nlewycky added 3 commits August 12, 2019 18:06

Order no_f32_ncnan and no_f64_ncnan consistently.

c30a64a

Abs may only change the sign bit, it can't change whether a value is a canonical nan or not.

9af01a4

Merge branch 'master' into feature/llvm-nan-fix

1097f88

bjfish approved these changes Aug 16, 2019

bjfish reviewed Aug 16, 2019

nlewycky mentioned this pull request Nov 7, 2019

In LLVM backend, track which floats are guaranteed to be arithmetic, which makes the canonicalization a no-op. #934

Merged

1 task

nlewycky closed this Dec 2, 2019

epilys deleted the feature/llvm-nan-fix branch May 4, 2022 04:40
