Faster view creation #19259

mbauman · 2016-11-08T00:00:14Z

This is a series of performance patches that I wrote a while ago but never had a chance to fully vet for performance impact. At a minimum, this fixes #19257. We really need better benchmarks for SubArray creation in nanosoldier; I don't think SubArray creation is really tested there.

The fourth commit is the most micro- of micro-optimizations, and it might be a little too cute… but I think there's good reason to do it beyond the micro-optimization. See the commit message for more details.

mbauman · 2016-11-08T00:02:44Z

May as well see what the soldier reports in any case: @nanosoldier runbenchmarks("ALL", vs = ":master")

nanosoldier · 2016-11-08T01:49:16Z

Your benchmark job has completed, but no benchmarks were actually executed. Perhaps your tag predicate contains misspelled tags? cc @jrevels

jrevels · 2016-11-08T02:01:04Z

(The bare keyword ALL runs every benchmark; the quoted string "ALL" runs only benchmarks tagged "ALL".)
@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-11-08T05:14:42Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

timholy · 2016-11-09T14:56:42Z

Nice! I like the direction, good catch. What kind of speedups are you seeing?

Aside from the test failures, might want to look into the lu regression. The rest I bet are spurious.

kmsquire · 2016-11-09T18:27:21Z

doc/devdocs/subarrays.rst

+	    parent::P
+	    indexes::I
+	    offset1::L    # for linear indexing and pointer, only stored when LinearFast
+	    stride1::L    # for linear indexing, only stored when LinearFast


tabs -> spaces

mbauman · 2016-11-10T19:09:46Z

I didn't push this sooner since I had a hard time constructing benchmarks that actually demonstrated that this was an improvement. The inlining changes were purely based upon simplifications to @code_typed/llvm, but I had a hard time measuring an impact there.

The reason I pushed it when I did was because I didn't want others to duplicate this work as they investigated #19257. That result seems to have been somewhat spurious, but we need more benchmarks here in any case. I'm not sure when I'll have time to test this further.

StefanKarpinski · 2016-11-11T17:39:19Z

So what's your call, @mbauman, merge or not? If you don't think it's a regression, I'd be in favor.

mbauman · 2016-11-11T20:08:45Z

Lemme drop the last commit here and propose it separately since it's breaking.

mbauman · 2017-01-02T22:45:50Z

I'd like to merge this for 0.6, but it'd be good to wait until JuliaCI/BaseBenchmarks.jl#54 makes it onto Nanosoldier for a final perf check.

tkelman · 2017-01-03T03:37:34Z

base/subarray.jl

-compute_offset1(parent, stride1::Integer, dims::Tuple{Int}, inds::Tuple{Colon}, I::Tuple) = compute_linindex(parent, I) - stride1*first(indices(parent, dims[1]))  # index-preserving case
-compute_offset1(parent, stride1::Integer, dims, inds, I::Tuple) = compute_linindex(parent, I) - stride1  # linear indexing starts with 1
+compute_offset1(parent, stride1::Integer, dims::Tuple{Int}, inds::Tuple{Colon}, I::Tuple) = (@_inline_meta; compute_linindex(parent, I) - stride1*first(indices(parent, dims[1])))  # index-preserving case
+compute_offset1(parent, stride1::Integer, dims, inds, I::Tuple) = (@_inline_meta; compute_linindex(parent, I) - stride1)  # linear indexing starts with 1


bit overly long lines here, wrap?
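For readers following the diff, here is a minimal sketch of what the second `compute_offset1` method computes and how a long one-liner with an inline hint could be wrapped. The name `offset1_sketch` is hypothetical and the code uses the public `@inline` macro rather than Base internals; it only assumes that parent linear index = offset1 + stride1 * i for the i-th element of the view.

```julia
# Hypothetical stand-in, not the actual Base.compute_offset1 method.
# If element i of the view lives at parent index offset1 + stride1*i,
# then for i = 1 we need offset1 = first_linindex - stride1.
@inline function offset1_sketch(first_linindex::Integer, stride1::Integer)
    # linear indexing starts with 1, so back off by one stride
    return first_linindex - stride1
end

parent = collect(1:10)
sub = view(parent, 3:2:9)          # elements at parent indices 3, 5, 7, 9
offset1 = offset1_sketch(3, 2)     # first parent index is 3, stride is 2

# Check the relation for every element of the view.
matches = all(sub[i] == parent[offset1 + 2 * i] for i in 1:length(sub))
println("offset1 = ", offset1, ", relation holds: ", matches)
```

Writing the method as a `function` block keeps each line short while still carrying the inline hint.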


mbauman · 2017-01-03T05:45:10Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

jrevels · 2017-01-03T14:18:30Z

Had to kick nanosoldier, retriggering:

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2017-01-03T18:46:15Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

mbauman · 2017-01-03T23:56:34Z

Hm, I cannot reproduce any of the largest regressions… in fact "sumeach_view","linspace(1.0,2.0,10000)" is nearly 2x faster with this branch on my computer (instead of 2x slower). It is pretty old, though, and doesn't have the same fancy SIMD registers that nanosoldier has. But there are many more performance gains — including a few constant-folds, which is pretty cool.

Since I can't reproduce the perf issues, I think this is as good as I can make it.

KristofferC · 2017-01-04T00:09:20Z

As another data point, on my laptop I'm getting a ~3x slowdown for the benchmark in the comment above. The LLVM codes are here: https://gist.github.com/KristofferC/1cf9e09d97289b521f494c9c68958043.

Edit: Removed some confusion...

mbauman · 2017-01-04T01:44:57Z

Thanks for checking Kristoffer. I'm seeing the same LLVM IR as you, but 75us on this branch compared to 130us on master. I have an old i5; OpenBLAS reports compiling for Nehalem, LLVM reports westmere.

On Jan 3, 2017, at 6:09 PM, Kristoffer Carlsson wrote:

As another data point, on my laptop I'm getting a ~3x slowdown. The LLVM codes are here: https://gist.github.com/KristofferC/1cf9e09d97289b521f494c9c68958043 and what seems suspicious is this call in the loop:

%22 = call i64 @jlsys_convert_52679(%jl_value_t* inttoptr (i64 139759635777136 to %jl_value_t*), double %21)

KristofferC · 2017-01-04T01:55:44Z

I get 115us on PR and ~45us on master. On 0.5 it is around 115us as well. This PR improves much more than it regresses though so while it is sometimes good to be greedy, perhaps this can just be merged?

StefanKarpinski · 2017-01-04T20:28:04Z

The failure is the current ongoing 32-bit Linux problem. I'm with @KristofferC on this.

JeffBezanson · 2017-01-04T20:46:56Z

Sounds good to me too.

KristofferC · 2017-01-05T13:09:20Z

@yuyichao postulated that the slowdown is due to LLVM generating bad native code on newer architectures, causing a partial register stall when converting an integer to double. I confirmed this by recompiling the sysimg for x86-64, which made the relevant benchmark almost 3x faster. (The same problem appears for the pi_sum benchmark, function pisum() at julia/test/perf/micro/perf.jl, line 84 in 30bf89f, which is 2x slower when compiling for the native architecture.)
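To make the integer-to-double conversion concrete, here is a small sketch modeled on the pi_sum microbenchmark. The function name `pisum_sketch` and its keyword-free default arguments are assumptions for illustration; the upstream body at that commit may differ in detail.

```julia
# Sketch modeled on the pi_sum microbenchmark; not copied from perf.jl.
function pisum_sketch(outer::Int = 500, inner::Int = 10_000)
    s = 0.0
    for _ in 1:outer
        s = 0.0                      # restart the partial sum on each outer pass
        for k in 1:inner
            # k*k is an Int; dividing 1.0 by it converts an integer to Float64
            # on every iteration of this hot loop.
            s += 1.0 / (k * k)
        end
    end
    return s
end

result = pisum_sketch()
println("partial sum of 1/k^2: ", result)
```

In a REPL, `@code_native pisum_sketch()` can be used to inspect the conversion that the compiler emits for the target architecture.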

vchuravy · 2017-01-05T14:36:15Z

In that light I would say we should merge this and tackle the code generation issue later.

timholy · 2017-03-05T13:43:08Z

I wonder if this has regressed?

master:

julia> A = rand(1000,1000,1);

julia> @benchmark view(A, :, :, 1) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  880 bytes
  allocs estimate:  40
  --------------
  minimum time:     14.049 μs (0.00% GC)
  median time:      14.456 μs (0.00% GC)
  mean time:        14.703 μs (0.00% GC)
  maximum time:     64.914 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

julia-0.5:

julia> A = rand(1000,1000,1);

julia> @benchmark view(A, :, :, 1) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  1
  --------------
  minimum time:     27.119 ns (0.00% GC)
  median time:      28.238 ns (0.00% GC)
  mean time:        35.656 ns (6.56% GC)
  maximum time:     1.478 μs (95.10% GC)
  --------------
  samples:          10000
  evals/sample:     995
  time tolerance:   5.00%
  memory tolerance: 1.00%

(I'd fix this myself except I'm up to my neck in another project right now.)

KristofferC · 2017-03-05T13:51:15Z

I get:

julia> A = rand(1000,1000,1);

julia> @btime view($A, :, :, 1)
 10.903 ns (1 allocation: 64 bytes)

KristofferC · 2017-03-05T13:52:09Z

Looks like the missed interpolation of $A was the problem

timholy · 2017-03-05T14:11:03Z

You are so right. Newbie error 😄. With it, they're the same speed on 0.5 and 0.6.
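The difference between the two measurements above can be reproduced with a short sketch. It assumes the BenchmarkTools.jl package is installed; `@belapsed` returns the minimum time in seconds, and the variable names `t_global` and `t_interp` are only illustrative.

```julia
using BenchmarkTools  # assumption: BenchmarkTools.jl is installed

A = rand(1000, 1000, 1)  # non-constant global, as in the REPL sessions above

# Without interpolation the benchmark refers to the untyped global `A`,
# so the measurement includes the cost of working with a global variable.
t_global = @belapsed view(A, :, :, 1)

# With `$A` the value is spliced into the benchmark expression,
# so the measurement reflects view creation itself.
t_interp = @belapsed view($A, :, :, 1)

println("global: ", t_global, " s, interpolated: ", t_interp, " s")
```

Exact timings depend on the machine and Julia version, so no expected numbers are given here.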


yuyichao mentioned this pull request Nov 8, 2016

Creating a view (SubArray) of a Vector allocates memory #19257

Closed

KristofferC added the potential benchmark and performance labels Nov 9, 2016

kmsquire reviewed Nov 9, 2016


mbauman force-pushed the mb/fastersubcreation branch from 006d861 to 13f8518 Compare November 11, 2016 20:10

mbauman force-pushed the mb/fastersubcreation branch from 13f8518 to 8b812ae Compare December 9, 2016 23:35

mbauman added this to the 0.6.0 milestone Jan 2, 2017

tkelman reviewed Jan 3, 2017


mbauman added 4 commits January 2, 2017 23:40

Ensure SubArray constructor inlines

e0c6330

Don't bother calculating the size of the SubArray during creation

627393a

All it needs is the dimensionality of the indexing result.

Deprecate three-argument SubArray constructor

f66bc82

Wrap long lines

410cd49

mbauman force-pushed the mb/fastersubcreation branch from 8b812ae to 410cd49 Compare January 3, 2017 05:44

KristofferC merged commit 20b704a into master Jan 5, 2017

tkelman deleted the mb/fastersubcreation branch January 5, 2017 14:48

pabloferz mentioned this pull request Jan 11, 2017

LLVM generates bad code on newer architercures #19976

Closed

mbauman mentioned this pull request Jan 12, 2017

Make LinSpace generic #18777

Merged

Sacha0 added the deprecation and needs news labels May 14, 2017

Sacha0 added a commit to Sacha0/julia that referenced this pull request May 14, 2017

Add NEWS.md entry for three-arg SubArray constructor deprecation (JuliaLang#19259).

952e8d9

tkelman removed the needs news label May 16, 2017

tkelman pushed a commit that referenced this pull request May 16, 2017

Add NEWS.md entry for three-arg SubArray constructor deprecation (#19259). (#21883)

66402ac

tkelman pushed a commit that referenced this pull request May 16, 2017

Add NEWS.md entry for three-arg SubArray constructor deprecation (#19259). (#21883) (cherry picked from commit 66402ac)

94f73eb

KristofferC removed the potential benchmark label Oct 8, 2018
