Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid introducing local variable (and GC frame store) in unsafe_setindex! #13461

Merged
merged 1 commit into from
Oct 7, 2015

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Oct 5, 2015

This is a workaround to avoid the generation of a GC frame store when unsafe_setindex! is called directly.

The problem seems to be that the type inference is not able to figure out the exit point of @inbounds return .... (Or more precisely, the second return is never reachable and the result is not used anyway....)

julia> Base.code_typed(Base.unsafe_setindex!, Tuple{Matrix{Float64}, Float64, Int})
1-element Array{Any,1}:
 :($(Expr(:lambda, Any[:A,:x,:i1,:(I::Real...)], Any[Any[Any[:A,Array{Float64,2},0],Any[:x,Float64,0],Any[:i1,Int64,0],Any[:I,Tuple{},0]],Any[],Any[],Any[:T]], :(begin  # array.jl, line 306:
        $(Expr(:boundscheck, false))
        return (Base.arrayset)(A::Array{Float64,2},x::Float64,i1::Int64)::Array{Float64,2}
        return $(Expr(:boundscheck, :(Base.pop)))
    end::Array{Float64,2}))))

The code above generates a pointer local variable and disables SIMD. There are many fixes/optimizations that we can do to avoid the store in the loop at multiple levels but I think this workaround should be the safest to backport to 0.4 with minimum side effect (not necessarily 0.4.0).

This is an alternative solution to what's in #13459.
It seems that A[:] = 0. still generates one allocation while fill! doesn't so #13459 might still be good to have (or we should find why is A[:] = 0. allocate and fix that in general). (edit: It's the splatting panelty, see #13461 (comment))

julia> @time fill!(A, 0.);
  0.008700 seconds (4 allocations: 160 bytes)

julia> @time A[:] = 0.;
  0.008251 seconds (5 allocations: 176 bytes)

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 5, 2015

@KristofferC @mbauman

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 5, 2015

................. The extra allocation is due to #13359 ......

@yuyichao yuyichao mentioned this pull request Oct 6, 2015
@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 6, 2015

Note that the code generated for this PR is still better than the one for #13463 since there's no gc frame at all.

@jakebolewski
Copy link
Member

@yuyichao it would be good to fix the underlying problem but this looks fine, the original code wasn't really idiomatic anyways.

@yuyichao yuyichao added the performance Must go faster label Oct 6, 2015
@mbauman
Copy link
Member

mbauman commented Oct 6, 2015

Interesting. I didn't realize there was a penalty for @inbounds return. This LGTM, but I agree it'd be nice to fix this properly… I'm pretty sure there are other places where this is used (perhaps just introduced by me?).

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 6, 2015

I did a git grep '@inbounds return' and this is the only place that otherwise only have one return statement. We should probably have the type inference removes unused local variables but that's much harder than this PR and will have a very small chance of breaking code with finalizers....

@mbauman
Copy link
Member

mbauman commented Oct 7, 2015

This is awesome. Looks like it speeds up the non-scalar array perf tests by 5-15%. I agree that we should get a better fix for this eventually, but this is great for now.

mbauman added a commit that referenced this pull request Oct 7, 2015
Avoid introducing local variable (and GC frame store) in `unsafe_setindex!`
@mbauman mbauman merged commit f17d5af into master Oct 7, 2015
@mbauman mbauman deleted the yyc/setindex-return branch October 7, 2015 22:13
yuyichao added a commit that referenced this pull request Oct 8, 2015
…r is

confused about the return point.

(cherry picked from commit bb247cf)
ref #13461
@tkelman
Copy link
Contributor

tkelman commented Oct 8, 2015

backported in adb832a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants