Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Colon and Range indexing performance regressions #14651

Closed
mbauman opened this issue Jan 12, 2016 · 3 comments
Closed

Colon and Range indexing performance regressions #14651

mbauman opened this issue Jan 12, 2016 · 3 comments
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@mbauman
Copy link
Member

mbauman commented Jan 12, 2016

Thanks to @KristofferC, I was able to chase down quite a few of the indexing performance regressions since 0.4.2. There are still a few perf tests remaining with large regressions, however:

Perf test name Min time relative to 0.4.2
"sumcolonFb ArrayLS" 2.54728
"sumcolonFb ArrayLF" 2.50111
"sumcolonFb ArrayLSLS" 2.36786
"sumrangeFb ArrayLSLS" 1.94905
"sumrangeFb ArrayLS" 1.55122
"sumrangeFb ArrayLF" 1.53517
"sumcolonIb ArrayLSLS" 1.3634
"sumrangeIb ArrayLSLS" 1.31532
"sumcolonIb ArrayLF" 1.25546
"sumrangeIb ArrayLS" 1.23067
"sumcolonIb ArrayLS" 1.22831
"sumcolonFb Array" 1.22608

(This is using LLVM 3.3, with #14650 applied)

I poked at these briefly, but I wasn't able to spot the problem immediately. I'm afraid I won't be able to spend much more time on this.

Quite a few of the array tests are significantly faster than 0.4.2; here's the whole distribution for the array tests:

screen shot 2016-01-11 at 8 36 34 pm

@mbauman mbauman added performance Must go faster regression Regression in behavior compared to a previous version labels Jan 12, 2016
@mbauman
Copy link
Member Author

mbauman commented Jan 15, 2016

There's something odd going on with inlining and codegen. Here's a somewhat reduced testcase:

using Benchmarks
function h1(A)
    dest = similar(A, (300,))
    h2(dest, A)
end
@inline function h2(dest, src)
    for i_2 = 1:size(dest,2)
        for i_1 = 1:size(dest,1)
            unsafe_setindex!(dest,i_1*i_2,i_1)
        end
    end
    dest
end

A = Array{Float32}(300,500)
dest = similar(A, (300,))

@show @benchmark similar(A, (300,))
@show @benchmark h2(dest, A)
@show @benchmark h1(A)

On 0.4.2, I see 250ns, 350ns, and 510ns, respectively. On master, I get 210ns, 350ns, and 660ns. Removing the inlining annotation on _unsafe_getindex! seems to be a sensible workaround (and may be the right thing to do on its own merits, too)… but it'd be nice to figure out what's happening here.

@blakejohnson
Copy link
Contributor

The timings for that script on my machine tell a different story. On 0.4.2 I see 145ns, 465ns, and 655ns... On master I see 150ns, 470ns, and 475ns.

@mbauman
Copy link
Member Author

mbauman commented Feb 6, 2016

These are all different on LLVM 3.7, and I'm not seeing anything too terribly drastic anymore. I'm closing this guy as being stale.

@mbauman mbauman closed this as completed Feb 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

2 participants