Manually union split chunksize calculation #1536

Closed · wants to merge 4 commits

Conversation

Member

@ChrisRackauckas ChrisRackauckas commented Dec 9, 2021

using DifferentialEquations, SnoopCompile

function lorenz(du,u,p,t)
 du[1] = 10.0(u[2]-u[1])
 du[2] = u[1]*(28.0-u[3]) - u[2]
 du[3] = u[1]*u[2] - (8/3)*u[3]
end

u0 = [1.0;0.0;0.0]
tspan = (0.0,100.0)
prob = ODEProblem(lorenz,u0,tspan)
alg = Rodas5()
tinf = @snoopi_deep solve(prob,alg)

Before:

InferenceTimingNode: 1.524478/15.326828 on Core.Compiler.Timings.ROOT() with 4 direct children

julia> inference_triggers(tinf)
3-element Vector{InferenceTrigger}:
 Inference triggered to call (NamedTuple{(:chunk_size,)})(::Tuple{Val{3}}) from prepare_alg (C:\Users\accou\.julia\dev\OrdinaryDiffEq\src\alg_utils.jl:174) with specialization DiffEqBase.prepare_alg(::Rodas5{0, true, DefaultLinSolve, Val{:forward}}, ::Vector{Float64}, ::SciMLBase.NullParameters, ::ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem})
 Inference triggered to call DiffEqBase.solve_call(::ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::Rodas5{3, true, DefaultLinSolve, Val{:forward}}) from #solve_up#44 (C:\Users\accou\.julia\packages\DiffEqBase\b1nST\src\solve.jl:87) with specialization DiffEqBase.var"#solve_up#44"(::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(DiffEqBase.solve_up), ::ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, true, SciMLBase.NullParameters, ODEFunction{true, typeof(lorenz), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::Nothing, ::Vector{Float64}, ::SciMLBase.NullParameters, ::Rodas5{0, true, DefaultLinSolve, Val{:forward}})
 Inference triggered to call OrdinaryDiffEq.jacobian2W!(::Matrix{Float64}, ::LinearAlgebra.UniformScaling{Bool}, ::Float64, ::Matrix{Float64}, ::Bool) called from toplevel

After:

InferenceTimingNode: 3.082193/16.376914 on Core.Compiler.Timings.ROOT() with 2 direct children

julia> inference_triggers(tinf)
1-element Vector{InferenceTrigger}:
 Inference triggered to call OrdinaryDiffEq.jacobian2W!(::Matrix{Float64}, ::LinearAlgebra.UniformScaling{Bool}, ::Float64, ::Matrix{Float64}, ::Bool) called from toplevel

That's without the static array handling branch.

@ChrisRackauckas changed the title from "WIP: manually union split chunksize calculation" to "Manually union split chunksize calculation" on Dec 9, 2021
@ChrisRackauckas
Member Author

Compile time is not improved, but having everything be non-dynamic in its behavior is nice for other reasons...

Contributor

@chriselrod chriselrod left a comment


I think it might be worth refactoring things a bit.

Ideally, the choice of chunk size would be made at a point where types converge again, so that you don't ever actually have to deal with these unions or worry about max_methods.
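The "split at a point where types converge" idea can be sketched with a function barrier. This is a minimal illustration, not the PR's actual code: `FakeAlg`, `solve_with_chunk`, and `_solve` are hypothetical names standing in for the algorithm struct and solve entry point.

```julia
# Hypothetical sketch: the chunk-size branch picks a Val, but the dynamic
# dispatch happens exactly once, at the _solve call. _solve is compiled
# separately per concrete Val{N}, so its body never sees a Union, and the
# caller of solve_with_chunk only sees the converged return type.
struct FakeAlg{CS}
    chunk_size::Val{CS}
end

# Stand-in for the real solve; specialized once per Val{N}.
_solve(alg::FakeAlg{CS}) where {CS} = CS

function solve_with_chunk(n::Int)
    if n <= 4
        _solve(FakeAlg(Val(4)))
    else
        _solve(FakeAlg(Val(8)))
    end
end
```

Because both branches' `_solve` calls return an `Int`, the types converge again at the barrier and no Union escapes to the caller.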

elseif chunk_size > 4
cs = Val{4}()
remake(alg,chunk_size=cs)
else
Contributor

@chriselrod chriselrod Dec 9, 2021


Is it worth adding branches for 2 and 3? It'd cost some compile time, but I'm worried about possible runtime regressions for small problems.

This comment is also conditional on max_methods.
Currently, you're getting a union of 3 return types.
That breaks when we set max_methods<3 (e.g. =1).

My proposal (5 return types) breaks currently. Hence, the above comment on performing this split at a point of return type convergence.
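To make the return-type counting concrete, here is a minimal illustration (`pick3` and `pick5` are assumed names, not code from this PR) of how each branch adds a member to the inferred return-type Union, which is what downstream union splitting, limited by settings like `max_methods`, has to cope with.

```julia
# Three branches → a three-member return-type Union.
pick3(n) = n == 1 ? Val(1) : (n <= 4 ? Val(4) : Val(8))

# Five branches → a five-member Union, past the default split limit
# that the comment above is worried about.
pick5(n) = n == 1 ? Val(1) :
           n == 2 ? Val(2) :
           n == 3 ? Val(3) :
           n == 4 ? Val(4) : Val(8)

# Base.return_types shows the inferred Union for each.
Base.return_types(pick3, (Int,))
```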

Member

@YingboMa YingboMa left a comment


This would break PreallocationTools.jl, because it just defaults to pickchunksize: https://github.com/SciML/PreallocationTools.jl/blob/master/src/PreallocationTools.jl#L26

@ChrisRackauckas
Member Author

It can handle resizing to smaller chunksizes though.

@YingboMa
Member

YingboMa commented Dec 9, 2021

Did you run the script on the same computer? It looks like the inference time regressed.

@ChrisRackauckas
Member Author

This comment is also conditional on max_methods.
Currently, you're getting a union of 3 return types.
That breaks when we set max_methods<3 (e.g. =1).
My proposal (5 return types) breaks currently. Hence, the above comment on performing this split at a point of return type convergence.

Yeah, 1, 2, 3, 4, 8 would be best. 1 or 5 are the best choices; 3 is a bad one 😅

@ChrisRackauckas
Member Author

@static if max_methods == 3... 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣

Dear god the compiler team would hate me.

@ChrisRackauckas
Member Author

Did you run the script on the same computer? It looks like the inference time regressed.

Yeah, I mentioned that above.

Compile time is not improved, but having everything be non-dynamic in its behavior is nice for other reasons...

I don't know if there's a runtime hit too, so it's hard to tell whether this is a real PR or something mostly for testing.

@chriselrod
Contributor

I don't know if there's a runtime hit too, so it's hard to tell whether this is a real PR or something mostly for testing.

I think 4 and 8 will be good, especially with JuliaDiff/ForwardDiff.jl@d033d2a, which should guarantee that those chunk sizes SIMD well.
Although my testing suggests the explicit SIMD does not actually help the HCV Pumas model's runtime performance, while it hurts the compilation time.
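The intuition behind preferring 4 and 8 can be sketched in plain Julia. This is an illustration of the assumption that power-of-two partials tuples map cleanly onto SIMD lanes, not a claim about ForwardDiff's internals; `add_partials` is a made-up name.

```julia
# Elementwise add over fixed-length partials tuples. For N = 4 or 8 the
# compiler can lower this to full-width vector operations; an odd N like 3
# leaves a lane idle, which is the runtime concern for small problems.
add_partials(a::NTuple{N,Float64}, b::NTuple{N,Float64}) where {N} =
    ntuple(i -> a[i] + b[i], Val(N))

add_partials((1.0, 2.0, 3.0, 4.0), (0.5, 0.5, 0.5, 0.5))
```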

@YingboMa
Member

YingboMa commented Dec 9, 2021

Have you tried splitting on only 4 and 1?

src/alg_utils.jl (outdated review thread, resolved)
3 participants