
enable the SLP Vectorizer optimization pass by default #26594

Merged: 1 commit into master, May 6, 2018

Conversation

@KristofferC (Member, Author) commented Mar 23, 2018

The justification for this is that it seems to have a pretty much negligible impact on compilation time while yielding serious performance improvements for linear algebra and other operations on static arrays (#26398 (comment)).
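As an illustration (a minimal hypothetical sketch, not code from this PR), this is the kind of fully unrolled static-array arithmetic the SLP pass can pack into SIMD instructions:

```julia
using InteractiveUtils  # for @code_llvm
using StaticArrays

# Small fixed-size vectors: operations on them unroll into straight-line
# scalar IR, which the SLP vectorizer can fuse into <4 x double> SIMD ops.
a = SVector(1.0, 2.0, 3.0, 4.0)
b = SVector(5.0, 6.0, 7.0, 8.0)

# Hypothetical example kernel, purely for inspection.
axpy(x, y) = 2 .* x .+ y

# With SLP enabled, the emitted IR should contain packed fmul/fadd
# instructions instead of four independent scalar chains.
@code_llvm axpy(a, b)
```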

SLP Disabled:

Sysimg build time: User time (seconds): 323.47

Running tests for StaticArrays

@testinf      |    3      3
  3.388018 seconds (4.51 M allocations: 303.673 MiB, 4.26% gc time)
SVector       |   53     53
  1.610420 seconds (946.33 k allocations: 54.435 MiB, 1.03% gc time)
MVector       |   52     52
  0.899716 seconds (366.94 k allocations: 20.813 MiB, 0.69% gc time)
SMatrix       |   68     68
  2.104827 seconds (1.26 M allocations: 71.847 MiB, 0.78% gc time)
MMatrix       |   71     71
  1.987582 seconds (754.25 k allocations: 43.105 MiB, 0.68% gc time)
SArray        |   92     92
  6.809774 seconds (4.14 M allocations: 226.083 MiB, 0.88% gc time)
MArray        |  101    101
  5.667526 seconds (2.80 M allocations: 148.517 MiB, 0.76% gc time)
FieldVector   |   27     27
  1.362807 seconds (693.47 k allocations: 40.276 MiB, 1.09% gc time)
Scalar        |    8      8
  3.657003 seconds (1.80 M allocations: 103.858 MiB, 2.87% gc time)
SUnitRange    |   10     10
  0.139713 seconds (30.73 k allocations: 1.809 MiB)
SizedArray    |   49       1     50
  2.333507 seconds (1.05 M allocations: 60.101 MiB, 1.03% gc time)
SDiagonal     |   71     71
 13.601758 seconds (13.72 M allocations: 720.386 MiB, 2.94% gc time)
Custom types  |    2      2
  0.068887 seconds (9.19 k allocations: 583.006 KiB)
Core definitions and constructors |   57     57
  1.817816 seconds (446.50 k allocations: 27.372 MiB, 0.46% gc time)
AbstractArray interface |   54     54
  3.961401 seconds (1.79 M allocations: 100.007 MiB, 1.16% gc time)
Indexing      |   73     73
  6.983536 seconds (3.72 M allocations: 211.556 MiB, 2.49% gc time)
Map, reduce, mapreduce, broadcast |   67     67
  9.380575 seconds (8.80 M allocations: 481.281 MiB, 1.86% gc time)
Array math    |  121    121
  3.541558 seconds (2.94 M allocations: 166.468 MiB, 1.89% gc time)
Broadcast sizes |   30     30
Broadcast     |   77      12     89
  8.707391 seconds (4.76 M allocations: 274.434 MiB, 1.17% gc time)
Linear algebra |   86     86
  7.220394 seconds (3.60 M allocations: 205.664 MiB, 1.42% gc time)
Matrix multiplication |   61       1     62
 28.005206 seconds (32.25 M allocations: 1.397 GiB, 4.13% gc time)

SLP Enabled:

Sysimg build time: User time (seconds): 329.58

Running tests for StaticArrays

@testinf      |    3      3
  3.524660 seconds (4.51 M allocations: 304.837 MiB, 5.88% gc time)
SVector       |   53     53
  1.594506 seconds (947.30 k allocations: 54.376 MiB, 1.05% gc time)
MVector       |   52     52
  0.907068 seconds (367.29 k allocations: 20.782 MiB, 0.68% gc time)
SMatrix       |   68     68
  2.124502 seconds (1.26 M allocations: 71.770 MiB, 0.81% gc time)
MMatrix       |   71     71
  1.683213 seconds (754.89 k allocations: 43.175 MiB, 0.74% gc time)
SArray        |   92     92
  6.771967 seconds (4.15 M allocations: 225.854 MiB, 0.91% gc time)
MArray        |  101    101
  5.707790 seconds (2.80 M allocations: 148.675 MiB, 0.78% gc time)
FieldVector   |   27     27
  1.229272 seconds (694.42 k allocations: 40.199 MiB, 1.04% gc time)
Scalar        |    8      8
  3.683220 seconds (1.80 M allocations: 104.034 MiB, 2.84% gc time)
SUnitRange    |   10     10
  0.119789 seconds (30.79 k allocations: 1.793 MiB)
SizedArray    |   49       1     50
  2.187806 seconds (1.05 M allocations: 60.016 MiB, 0.97% gc time)
SDiagonal     |   71     71
 13.556212 seconds (13.73 M allocations: 720.349 MiB, 2.92% gc time)
Custom types  |    2      2
  0.064020 seconds (9.20 k allocations: 583.443 KiB)
Core definitions and constructors |   57     57
  1.797976 seconds (446.71 k allocations: 27.369 MiB, 0.52% gc time)
AbstractArray interface |   54     54
  3.770388 seconds (1.79 M allocations: 99.841 MiB, 1.12% gc time)
Indexing      |   73     73
  7.115772 seconds (3.73 M allocations: 211.120 MiB, 2.52% gc time)
Map, reduce, mapreduce, broadcast |   67     67
 10.392894 seconds (8.81 M allocations: 482.237 MiB, 1.76% gc time)
Array math    |  121    121
  3.884809 seconds (2.94 M allocations: 166.417 MiB, 1.86% gc time)
Broadcast sizes |   30     30
Broadcast     |   77      12     89
  9.460607 seconds (4.76 M allocations: 273.940 MiB, 1.15% gc time)
Linear algebra |   86     86
  7.504202 seconds (3.61 M allocations: 205.617 MiB, 1.40% gc time)
Matrix multiplication |   61       1     62
 28.755902 seconds (32.22 M allocations: 1.396 GiB, 4.07% gc time)

@KristofferC added the labels performance (Must go faster) and triage (This should be discussed on a triage call) on Mar 23, 2018
@KristofferC (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@KristofferC (Member, Author)

Meant to run this:

@nanosoldier runbenchmarks(ALL, vs = "@f0087141b79736beec7b5a2ee946d5c4ec257167")

@iamed2 (Contributor) commented Mar 23, 2018

Those test runs don't seem to show noticeable performance improvements. Is this working properly?

@KristofferC (Member, Author)

The test run is supposed to show compilation time.

@JeffBezanson removed the triage (This should be discussed on a triage call) label on Mar 23, 2018
@iamed2 (Contributor) commented Mar 23, 2018

Oh that makes sense, sorry.

@KristofferC (Member, Author)

@ararslan any idea about Nanosoldier?

@ararslan (Member)

Ugh, yeah. It's hitting JuliaWeb/HTTP.jl#220 again. I'll restart the server.

@ararslan (Member)

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC (Member, Author)

Any ideas about nanosoldier @ararslan?

@ararslan (Member)

Same error, but it looks like other requests have gotten through, so I'll just try again. GitHub.jl needs to be updated to work with the changes in HTTP.jl, which is why the error keeps happening.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC (Member, Author)

Perhaps needs a restart @ararslan?

@ararslan (Member)

I restarted the server; dunno if it will help, but worth a try.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@KristofferC (Member, Author)

I ran the benchmarks locally and got https://gist.github.com/KristofferC/c220cb87cda1f77654ed3f89edd5ec60 (filtering out the scalar results because they seemed like noise).

My computer was not completely idle while running them, though, so I'm not sure how reliable those numbers are. What I was mostly interested in were the tuple linear algebra benchmarks, which all seem to have gotten a significant boost. It would be nice to get a real nanosoldier run, though.
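(For anyone wanting to spot-check one of the tuple linear algebra cases locally, a minimal sketch using BenchmarkTools rather than the full BaseBenchmarks suite; the 4×4 size is just an illustrative choice:)

```julia
using BenchmarkTools, StaticArrays

A = @SMatrix rand(4, 4)
B = @SMatrix rand(4, 4)

# A 4x4 static matrix multiply is fully unrolled, so the SLP pass can
# vectorize the resulting straight-line multiply-add chains.
@btime $A * $B
```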

@ararslan (Member)

Other Nanosoldier runs have been getting through but for whatever reason this particular PR seems to hit the error from HTTP every time.

@andyferris (Member)

I definitely think we should do this, though I've been confused over time as to what happens when this is off (e.g. when I don't use -O3), since I can still see xmm and ymm registers and so on being used. Are there other parts of LLVM or the Julia compiler that would likely make that happen?
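(For reference, a generic way to inspect this, not from this thread: on x86-64, scalar floating-point math already goes through the SSE unit, so xmm registers appear even in unvectorized code; it is the packed instructions, e.g. vaddpd, that the vectorizer passes add.)

```julia
using InteractiveUtils  # for @code_native
using StaticArrays

a = SVector(1.0, 2.0, 3.0, 4.0)

# Compare the output at different optimization levels (e.g. starting
# Julia with -O1 versus -O3, where the SLP pass runs): scalar code
# uses vaddsd on xmm registers, while SLP-vectorized code should use
# the packed vaddpd.
@code_native a + a
```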

@ararslan (Member)

I modified the logging level in the server, so if this fails, hopefully we'll have a better understanding of why. Sorry for the noise here, Kristoffer, and thanks for your patience.

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@ararslan (Member)

I had to modify the server's local clone of HTTP to fix an UndefVarError. I'll submit a PR for that to HTTP, but in the meantime:

@nanosoldier runbenchmarks(ALL, vs="@f0087141b79736beec7b5a2ee946d5c4ec257167")

@ararslan (Member)

This PR hit the same error again, and unfortunately Nanosoldier is now down until further notice for on-site work.

@KristofferC (Member, Author)

@nanosoldier runbenchmarks(ALL, vs="@c12922eeea1afb59d05477698d408e4ff54ff7f1")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@ararslan (Member)

Glad a Nanosoldier run was able to go through finally. I wouldn't put too much weight on those results though, since the benchmarks hadn't been retuned before running.

@ararslan (Member)

Benchmarks have been retuned.

@nanosoldier runbenchmarks(ALL, vs="@c12922eeea1afb59d05477698d408e4ff54ff7f1")

@KristofferC (Member, Author) commented Mar 30, 2018

Maybe I should rebase this? I don't think it is running on the merge commit.

@ararslan (Member)

It should always be running on the merge commit unless there are branch conflicts.

@KristofferC (Member, Author)

Okay, but I'm quite sure it doesn't, because e.g. the memory regressions from #26435 (comment) also show up here, which means this commit includes that PR but not the one we compared against.

@ararslan (Member)

Oh, sorry, you're right: I don't think it checks out the merge commit with master if it's comparing against another specific commit. Then yes, it would be good to rebase this.

@KristofferC (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":vc/llvm6")

Let's try this.

@ararslan (Member)

Nanosoldier seems to be consistently hitting the IOError again.

@KristofferC (Member, Author)

JuliaWeb/GitHub.jl#108 maybe :/

@ararslan (Member)

Trying something out, testing in production. #yolo.

@nanosoldier runbenchmarks(ALL, vs=":vc/llvm6")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@ararslan (Member)

He lives!

@KristofferC changed the base branch from vc/llvm6 to master on April 18, 2018
@KristofferC (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vchuravy (Member) commented May 6, 2018

Rebase onto master, and then LGTM.

@KristofferC (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@vchuravy (Member) commented May 6, 2018

CI failures are all network failures.
