Slow loop fusion when multiplying a column vector with a row vector #20875
Comments
Note that the argument to |
It is not a scalar. Note that It is better to create a temp variable first. |
Oh, sorry, yes I miscounted the parentheses. |
Why close this if there's a real performance issue? The vectorized |
The output of |
@StefanKarpinski, it's not a real performance issue. The programmer has to decide whether it is worth it to allocate the extra storage to call the If you do |
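The trade-off described above can be made concrete by counting how often the elementwise function runs. The following is a minimal sketch, not the code from the original post: `counted_cis` and `calls` are hypothetical helpers introduced only for illustration, and the sizes (4 rows, 1000 columns) are arbitrary.

```julia
# Hypothetical call counter wrapped around cis, for illustration only.
const calls = Ref(0)
counted_cis(x) = (calls[] += 1; cis(x))

col = complex.(ones(4), ones(4))   # 4-element column vector
row = (1:1000)'                    # 1×1000 row vector

# Fused: one loop over the 4×1000 output, so the function runs once per output element.
calls[] = 0
fused = col .* counted_cis.(0.1 .* row)
n_fused = calls[]

# Unfused: a temporary 1×1000 array is allocated, and the function runs once per column.
calls[] = 0
tmp = counted_cis.(0.1 .* row)
unfused = col .* tmp
n_tmp = calls[]

println("fused calls: ", n_fused, ", calls with a temporary: ", n_tmp)
```

The fused form saves the temporary array but evaluates the function for every row of the result, so which version wins depends on how expensive the function is relative to the allocation.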
One really nice thing about the new (c)transpose wrappers is that you can just transpose a range without needing to expand it to a full vector, so the collect in |
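As a small illustration of the point about transpose wrappers, assuming a Julia 1.x release (where `'` on a range returns a lazy `Adjoint` wrapper), the row vector can be built and indexed without materializing the range:

```julia
using LinearAlgebra  # only needed here to refer to the Adjoint type by name

r  = 1:5
rt = r'              # lazy 1×5 wrapper around the range, no collect required

println(typeof(rt))
println(size(rt))
println(rt[1, 3])
println(parent(rt))  # the wrapped range itself
```

Broadcasting over `rt` treats it as a 1×5 row, so it can be combined with a column vector directly.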
In principle, the optimal thing would be something like
which allocates no temporary storage and also calls |
What about the Any-typed variables in the code_warntype output? I.e. |
That sort of optimization is what an |
@StefanKarpinski, you're right, a sufficiently clever compiler could hoist However, the repeated computations of |
Let's focus on the type inference issues then, since those should definitely not occur. Being clever enough to hoist repeated pure computations is out of scope for now. |
There isn't an inference issue, the red is showing variables that are not used. I wonder if #20853 is already filtering them. |
In version 0.5. I don't get those Any-typed variables using
Code for both:
@inbounds function test_perf6()
    rangeᵀ = (1:2000000)'
    steering_vectors = complex.(ones(4,11), ones(4,11))
    sum_signal = zeros(Complex{Float64}, 4, length(rangeᵀ))
    for i = 1:11
        for kc = 1:length(rangeᵀ)
            carrier_signal = cis((2 * pi * 1.023e6 / 4e6) * rangeᵀ[kc] + (40 * pi / 180))
            for ks = 1:4
                sum_signal[ks,kc] += steering_vectors[ks,i] * carrier_signal
            end
        end
    end
    return sum_signal
end |
After inlining, we can probably add a pass to attempt loop-hoisting (LICM). Inference is perfectly capable of doing that without the user being required to apply an incorrect |
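For readers unfamiliar with loop-invariant code motion, here is a hand-written sketch of the transformation such a pass would aim to perform. The function names `naive!` and `hoisted!` and the array sizes are hypothetical, and this says nothing about what the compiler actually does today.

```julia
# Naive: cis is recomputed in the inner loop although it does not depend on ks.
function naive!(out, weights, phases)
    for kc in axes(out, 2), ks in axes(out, 1)
        out[ks, kc] = weights[ks] * cis(phases[kc])
    end
    return out
end

# Hoisted by hand: the invariant cis call moves out of the inner loop.
function hoisted!(out, weights, phases)
    for kc in axes(out, 2)
        c = cis(phases[kc])          # computed once per column
        for ks in axes(out, 1)
            out[ks, kc] = weights[ks] * c
        end
    end
    return out
end

weights = complex.(ones(4), ones(4))
phases  = 0.1 .* (1:100)
a = naive!(zeros(ComplexF64, 4, 100), weights, phases)
b = hoisted!(zeros(ComplexF64, 4, 100), weights, phases)
println(a ≈ b)
```

Both versions produce the same matrix; the hoisted one calls `cis` once per column instead of once per element.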
That |
This is greatly improved on master (and #25377). That said, the LLVM IR looks like we're still computing
|
closing in favor of the example in #29285 (this one used a now-deprecated method of |
I am using Julia v0.6.0-pre.alpha.34
I run this simple function, which should be fast, because of loop fusion:
This results in:
However, when I remove the dot after `cis`, I get this result:
The memory consumption is reduced, but the fused code is 3x slower.