Switch to the fast broadcast implementation #716

YingboMa · 2019-04-12T03:32:00Z

The @.. macro is the fast broadcast implementation. It assumes that no array is extruded (that is, no singleton dimension has to be expanded to match the output shape), so LLVM can always do a runtime alias check and vectorize.
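To make the "non-extruded" assumption concrete, here is a minimal sketch that uses only Base Julia. It is not the implementation of `@..`; the helper names `fused_update_loop!` and `fused_update_broadcast!` are hypothetical and only illustrate the kind of plain indexed loop that becomes possible once every array is known to share the same axes.

```julia
# Ordinary broadcast expands singleton dimensions to a common shape.
col = [1.0, 2.0]        # 2-element Vector
row = [10.0 20.0]       # 1×2 Matrix
expanded = col .+ row   # 2×2 Matrix: both arguments were expanded

# Hypothetical helper (not part of DiffEqBase or OrdinaryDiffEq).
# Assumption: all arrays share the same axes, so nothing needs expanding
# and every array can be indexed with the same linear index.
function fused_update_loop!(out, a, b, c)
    axes(out) == axes(a) == axes(b) == axes(c) ||
        throw(DimensionMismatch("all arrays must have identical axes"))
    @inbounds @simd for i in eachindex(out, a, b, c)
        out[i] = a[i] + 2.0 * b[i] - c[i]
    end
    return out
end

# The same update written with Base's general-purpose broadcast.
function fused_update_broadcast!(out, a, b, c)
    @. out = a + 2.0 * b - c
    return out
end

a = collect(1.0:10.0)
b = collect(11.0:20.0)
c = collect(21.0:30.0)
same_result = fused_update_loop!(similar(a), a, b, c) ==
              fused_update_broadcast!(similar(a), a, b, c)
```

Both functions compute the same values. The loop version simply refuses inputs whose shapes differ instead of expanding them.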

YingboMa · 2019-04-12T03:42:46Z

julia> using OrdinaryDiffEq, BenchmarkTools

julia> using OrdinaryDiffEq: perform_step!, initialize!

julia> prob10 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(10),(0.0,10.0));

julia> prob100 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(100),(0.0,10.0));

julia> prob1000 = ODEProblem((du, u, p, t)->copyto!(du, u),ones(1000),(0.0,10.0));

julia> integ10 = init(prob10, Tsit5()); integ100 = init(prob100, Tsit5()); integ1000 = init(prob1000, Tsit5());

julia> initialize!(integ10, integ10.cache); initialize!(integ100, integ100.cache); initialize!(integ1000, integ1000.cache);

julia> @btime perform_step!($integ10,   $integ10.cache, false) # PR
  251.051 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ100,  $integ100.cache, false) # PR
  645.738 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ1000, $integ1000.cache, false) # PR
  5.743 μs (0 allocations: 0 bytes)

julia> @btime perform_step!($integ10,   $integ10.cache, false) # Master
  220.204 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ100,  $integ100.cache, false) # Master
  620.471 ns (0 allocations: 0 bytes)

julia> @btime perform_step!($integ1000, $integ1000.cache, false) # Master
  5.754 μs (0 allocations: 0 bytes)

~~Broadcast has a constant overhead on the order of 1-2 ns (there are 7 broadcasts in adaptive Tsit5).~~ See below.

codecov · 2019-04-12T04:14:43Z

Codecov Report

Merging #716 into master will decrease coverage by 0.3%.
The diff coverage is 94.38%.

@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
- Coverage   72.31%   72.01%   -0.31%     
==========================================
  Files          93       93              
  Lines       29182    28808     -374     
==========================================
- Hits        21104    20747     -357     
+ Misses       8078     8061      -17

Impacted Files                                          Coverage Δ
src/dense/interpolants.jl                               98.45% <ø> (-0.12%) ⬇️
src/OrdinaryDiffEq.jl                                   100% <ø> (ø) ⬆️
...rc/perform_step/general_rosenbrock_perform_step.jl   0% <0%> (ø) ⬆️
src/perform_step/prk_perform_step.jl                    100% <100%> (ø) ⬆️
src/nlsolve/newton.jl                                   94.11% <100%> (ø) ⬆️
src/perform_step/extrapolation_perform_step.jl          96.58% <100%> (ø) ⬆️
src/perform_step/high_order_rk_perform_step.jl          98.55% <100%> (-0.2%) ⬇️
src/nlsolve/utils.jl                                    76.36% <100%> (ø) ⬆️
src/perform_step/exponential_rk_perform_step.jl         94.92% <100%> (ø) ⬆️
src/dense/high_order_rk_addsteps.jl                     97.4% <100%> (-0.47%) ⬇️
... and 40 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b521164...a445431.

coveralls · 2019-04-12T04:24:15Z

Coverage increased (+5.6%) to 75.535% when pulling a445431 on myb/fastbc into b521164 on master.

YingboMa · 2019-04-12T18:12:02Z

With SciML/DiffEqBase.jl#204, I got

using OrdinaryDiffEq, BenchmarkTools
prob = ODEProblem((du, u, p, t)->copyto!(du, u),ones(1000),(0.0,10.0))
integ = init(prob, Tsit5());
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
#=
julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
  4.122 μs (0 allocations: 0 bytes)

julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
  4.129 μs (0 allocations: 0 bytes)
=#
integ = init(prob, Vern9());
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
@btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
#=
julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # bc
  11.411 μs (0 allocations: 0 bytes)

julia> @btime OrdinaryDiffEq.perform_step!($integ, $(integ.cache)) # loop
  20.662 μs (0 allocations: 0 bytes)
=#

I think that it is safe to say that there is no regression :-)
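As a quick check of the numbers quoted above, the following sketch computes the loop/bc timing ratio for each solver. The timings are copied from the benchmark output in this comment; the container `timings` and the helper `loop_over_bc` are hypothetical names introduced only for this illustration.

```julia
# Timings copied from the benchmark output above, in microseconds.
timings = (
    Tsit5 = (bc = 4.122, loop = 4.129),
    Vern9 = (bc = 11.411, loop = 20.662),
)

# A ratio above 1 means the run labelled "loop" took longer than the run labelled "bc".
loop_over_bc(t) = t.loop / t.bc

for name in keys(timings)
    ratio = loop_over_bc(timings[name])
    println(name, ": loop/bc = ", round(ratio; digits = 3))
end
```

The ratio is about 1.00 for Tsit5 and about 1.81 for Vern9, so on these two measurements the broadcast version is never slower than the loop version.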

YingboMa · 2019-04-13T00:02:13Z

This PR needs SciML/DiffEqBase.jl#204

ChrisRackauckas · 2019-04-13T10:02:36Z

src/nlsolve/utils.jl

@@ -50,7 +50,7 @@ function qradd!(Q::AbstractMatrix, R::AbstractMatrix, v::AbstractVector, k::Int)
  @inbounds begin
    d = norm(v)
    R[k, k] = d
-    @. Q[:, k] = v / d
+    @.. @view(Q[:, k]) = v / d


oh lol. That might affect timings.
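For context on the changed line, here is a minimal sketch of writing a normalized vector into one column of a matrix with Base's dot-broadcast only. How `@..` treats an indexed left-hand side is not shown in this thread, so nothing below should be read as a description of `@..` itself; the variable names are illustrative.

```julia
using LinearAlgebra: norm

v = [3.0, 0.0, 4.0]
d = norm(v)
k = 1

Q1 = zeros(3, 2)
Q2 = zeros(3, 2)

# Indexed left-hand side of a dotted assignment: Base writes into Q1 in place.
Q1[:, k] .= v ./ d

# Explicit view on the left-hand side: also writes into Q2 in place.
view(Q2, :, k) .= v ./ d

# On the right-hand side, by contrast, Q1[:, k] allocates a copy of the column,
# so mutating that copy leaves Q1 untouched.
col_copy = Q1[:, k]
col_copy[1] = -1.0
column_unchanged = Q1[1, k] != -1.0
```

In Base both assignment forms fill column `k` of the matrix without allocating a temporary column. The explicit `view` just makes the in-place intent visible in the source.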

ChrisRackauckas · 2019-04-13T10:49:30Z

Nord fails.

Switch to the fast broadcast implementation

3aa1c1e

YingboMa requested review from devmotion and ChrisRackauckas April 12, 2019 03:32

YingboMa added 4 commits April 11, 2019 23:49

Fix RKN error

7e60e16

Fix Anderson nlsolve

d18fe25

Fix EPIRK errors

8e59076

Fix add step

2e48dbc

Fix Verns

4a372da

YingboMa added 3 commits April 12, 2019 17:17

Broadcast addsteps!

f3810c8

Fix addstep test errors

148c712

Fix DP8 dense output

c39210f

ChrisRackauckas reviewed Apr 13, 2019

Update low_order_rk_addsteps.jl

f4972a5

YingboMa added 2 commits April 13, 2019 18:28

Add no index tests

c9c42a9

Fix interpolants' high order derivative

a445431

ChrisRackauckas approved these changes Apr 13, 2019

ChrisRackauckas merged commit 2d04bba into master Apr 13, 2019

ChrisRackauckas deleted the myb/fastbc branch April 13, 2019 23:14
