Switch to the fast broadcast implementation #716

Merged 12 commits on Apr 13, 2019

Conversation

YingboMa
Member

The @.. macro is the fast broadcast implementation. It assumes that none of the arrays are extruded, i.e. no argument has to be expanded along a size-1 dimension, so LLVM can always emit a runtime alias check and vectorize the loop.
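
As a rough illustration of what the non-extruded assumption buys (a sketch only, not the macro from this PR): when every array has the same axes, a broadcast expression reduces to one flat loop that the compiler can vectorize. The helper `fused_axpy!` below is a hypothetical name made up for this example.

```julia
# Hypothetical helper, not the PR's macro: a hand-written "fast broadcast"
# for out = a*x + y that assumes all arrays share the same axes.
function fused_axpy!(out::AbstractVector, a::Number, x::AbstractVector, y::AbstractVector)
    # eachindex throws a DimensionMismatch if the arrays' indices differ,
    # so an argument that would need extruding fails loudly here.
    @inbounds @simd for i in eachindex(out, x, y)
        out[i] = a * x[i] + y[i]
    end
    return out
end

x = collect(1.0:8.0)
y = ones(8)
out_loop = similar(x)
out_bc = similar(x)

fused_axpy!(out_loop, 2.0, x, y)
@. out_bc = 2.0 * x + y   # Base broadcast, which also handles size-1 dimensions
```

Both calls fill their output with the same values; the difference is only in how much shape bookkeeping happens before the loop runs.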

@YingboMa
Member Author

YingboMa commented Apr 12, 2019

julia> using OrdinaryDiffEq, BenchmarkTools

julia> using OrdinaryDiffEq: perform_step!, initialize!

julia> prob10 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(10),(0.0,10.0));

julia> prob100 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(100),(0.0,10.0));

julia> prob1000 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(1000),(0.0,10.0));

julia> integ10 = init(prob10, Tsit5()); integ100 = init(prob100, Tsit5()); integ1000 = init(prob1000, Tsit5());

julia> initialize!(integ10, integ10.cache); initialize!(integ100, integ100.cache); initialize!(integ1000, integ1000.cache);

julia> @btime perform_step!($integ10,   $integ10.cache, false) # PR
  251.051 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ100,  $integ100.cache, false) # PR
  645.738 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ1000, $integ1000.cache, false) # PR
  5.743 μs (0 allocations: 0 bytes)

julia> @btime perform_step!($integ10,   $integ10.cache, false) # Master
  220.204 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ100,  $integ100.cache, false) # Master
  620.471 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ1000, $integ1000.cache, false) # Master
  5.754 μs (0 allocations: 0 bytes)

Broadcast has a constant overhead on the order of 1-2 ns (there are 7 broadcasts in adaptive Tsit5). See below.
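
To make the "constant overhead" reading easier to check, here is a small sketch that takes the six timings quoted above and computes the absolute and relative gap between PR and master. These are single-run numbers copied from the transcript, so noise is likely of the same order as the gaps.

```julia
# Timings copied from the benchmark above, converted to nanoseconds.
sizes  = [10, 100, 1000]
pr     = [251.051, 645.738, 5743.0]
master = [220.204, 620.471, 5754.0]

abs_diff = pr .- master            # ns per perform_step! call
rel_diff = abs_diff ./ master      # fraction of the master timing

for (n, d, r) in zip(sizes, abs_diff, rel_diff)
    println("n = ", n, ": ", round(d, digits = 1), " ns (", round(100r, digits = 2), "%)")
end
```

The gap is a few tens of nanoseconds for n = 10 and n = 100 and slightly negative for n = 1000, so it does not grow with the array length.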

@codecov

codecov bot commented Apr 12, 2019

Codecov Report

Merging #716 into master will decrease coverage by 0.3%.
The diff coverage is 94.38%.

@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
- Coverage   72.31%   72.01%   -0.31%     
==========================================
  Files          93       93              
  Lines       29182    28808     -374     
==========================================
- Hits        21104    20747     -357     
+ Misses       8078     8061      -17

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/dense/interpolants.jl | 98.45% <ø> (-0.12%) | ⬇️ |
| src/OrdinaryDiffEq.jl | 100% <ø> (ø) | ⬆️ |
| ...rc/perform_step/general_rosenbrock_perform_step.jl | 0% <0%> (ø) | ⬆️ |
| src/perform_step/prk_perform_step.jl | 100% <100%> (ø) | ⬆️ |
| src/nlsolve/newton.jl | 94.11% <100%> (ø) | ⬆️ |
| src/perform_step/extrapolation_perform_step.jl | 96.58% <100%> (ø) | ⬆️ |
| src/perform_step/high_order_rk_perform_step.jl | 98.55% <100%> (-0.2%) | ⬇️ |
| src/nlsolve/utils.jl | 76.36% <100%> (ø) | ⬆️ |
| src/perform_step/exponential_rk_perform_step.jl | 94.92% <100%> (ø) | ⬆️ |
| src/dense/high_order_rk_addsteps.jl | 97.4% <100%> (-0.47%) | ⬇️ |
| ... and 40 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b521164...a445431. Read the comment docs.

@coveralls

coveralls commented Apr 12, 2019

Coverage increased (+5.6%) to 75.535% when pulling a445431 on myb/fastbc into b521164 on master.

@YingboMa
Member Author

With SciML/DiffEqBase.jl#204, I got

using OrdinaryDiffEq, BenchmarkTools
prob = ODEProblem((du, u, p, t)->copyto!(du, u),ones(1000),(0.0,10.0))
integ = init(prob, Tsit5());
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
#=
julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
  4.122 μs (0 allocations: 0 bytes)

julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
  4.129 μs (0 allocations: 0 bytes)
=#
integ = init(prob, Vern9());
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
#=
julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
  11.411 μs (0 allocations: 0 bytes)

julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
  20.662 μs (0 allocations: 0 bytes)
=#

I think that it is safe to say that there is no regression :-)

@YingboMa
Member Author

This PR needs SciML/DiffEqBase.jl#204

@@ -50,7 +50,7 @@ function qradd!(Q::AbstractMatrix, R::AbstractMatrix, v::AbstractVector, k::Int)
   @inbounds begin
     d = norm(v)
     R[k, k] = d
-    @. Q[:, k] = v / d
+    @.. @view(Q[:, k]) = v / d

oh lol. That might affect timings.
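
For context on the changed line in the diff above, a small sketch using only Base and the LinearAlgebra stdlib: with Base's `@.`, an indexed left-hand side such as `Q[:, k]` is written in place through a view, so spelling the view out gives the same result. Why the `@..` version needs the explicit `@view` is not stated in this thread; presumably the macro does not rewrite indexed left-hand sides itself, but that is an assumption.

```julia
using LinearAlgebra: norm

Q1 = zeros(3, 2)
Q2 = zeros(3, 2)
v = [3.0, 0.0, 4.0]
d = norm(v)
k = 2

@. Q1[:, k] = v / d            # Base dot-assignment: in-place write into column k
@view(Q2[:, k]) .= v ./ d      # the same write with the view spelled out
```

Both matrices end up with the normalized vector in column `k` and the other column untouched.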

@ChrisRackauckas
Member

Nord fails.

3 participants