Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unnecessary allocations in broadcast!? #19608

Closed
stevengj opened this issue Dec 15, 2016 · 15 comments
Closed

unnecessary allocations in broadcast!? #19608

stevengj opened this issue Dec 15, 2016 · 15 comments
Labels
broadcast Applying a function over a collection performance Must go faster

Comments

@stevengj
Copy link
Member

The broadcast! function seems like it allocates memory, e.g. in the following example it apparently allocates about 16 bytes per loop iteration.

julia> function foo(x, n)
           for i = 1:n
               broadcast!(x -> 2x+1, x, x)
           end
           return x
       end
foo (generic function with 1 method)

julia> @time foo([0,0,0], 10^4); # warmup
  0.031009 seconds (26.09 k allocations: 904.224 KB)

julia> @time foo([0,0,0], 10^4);
  0.000213 seconds (10.01 k allocations: 156.531 KB)

It seems like this should be something that can be eliminated.

cc @pabloferz

@stevengj stevengj added performance Must go faster broadcast Applying a function over a collection labels Dec 15, 2016
@Sacha0
Copy link
Member

Sacha0 commented Dec 15, 2016

Strangely

julia> VERSION
v"0.6.0-dev.1557"

julia> using BenchmarkTools

julia> q = [0, 0, 0];

julia> @benchmark broadcast!(z -> 2z+1, $q, $q)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     997
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  0.00 bytes
  allocs estimate:  0
  minimum time:     21.00 ns (0.00% GC)
  median time:      21.00 ns (0.00% GC)
  mean time:        22.61 ns (0.00% GC)
  maximum time:     193.00 ns (0.00% GC)

Perhaps the allocation in the example above is elsewhere? Best!

@yuyichao
Copy link
Contributor

This is caused by the allocation of (x,) which will likely be a compile time constant if x is a constant.

@stevengj
Copy link
Member Author

So, basically we need some improved loop hoisting for the compiler to recognize that (x,) can be allocated outside the loop?

@stevengj
Copy link
Member Author

(I thought Julia didn't need to do a heap allocation for local tuples?)

@yuyichao
Copy link
Contributor

I doubt hoisting is the most useful optimization in this case (it certainly is useful) since it will breakdown if x is assigned to. The tuple isn't tuple since it is somehow passed to a _broadcast..... function that's not inlined.

@stevengj
Copy link
Member Author

So maybe we just need to add @inline somewhere?

@pabloferz
Copy link
Contributor

pabloferz commented Dec 15, 2016

So maybe we just need to add @inline somewhere?

That's most likely

@pabloferz
Copy link
Contributor

pabloferz commented Dec 15, 2016

This might be the cause

$(Expr(:meta, :noinline))
.

@stevengj
Copy link
Member Author

Why do we have the noinline there?

@tkelman
Copy link
Contributor

tkelman commented Dec 15, 2016

(useful tip - hit y when viewing a file and github will add the sha to the url, so the line of code you point to stays correct even if the file changes later:

$(Expr(:meta, :noinline))
)

@pabloferz
Copy link
Contributor

I don't know, but we could ask @timholy (522547b)

@tkelman Thanks for the tip!

@stevengj stevengj mentioned this issue Dec 16, 2016
5 tasks
@Sacha0
Copy link
Member

Sacha0 commented Dec 17, 2016

This comment might be the answer? Best!

@stevengj
Copy link
Member Author

Since that comment is for code that seems to have been deleted, maybe it is no longer relevant? Regardless, I just tried removing the noinline and adding @inline to all the _broadcast! methods, and it doesn't seem to remove the allocation, so this is not sufficient.

@pabloferz
Copy link
Contributor

pabloferz commented Dec 17, 2016

I think I have a fix. Will submit a PR (#19639)

@stevengj
Copy link
Member Author

Closed by #19639.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection performance Must go faster
Projects
None yet
Development

No branches or pull requests

5 participants