Runtime dispatching in inner products #41

Closed
bclyons12 opened this issue Jun 15, 2023 · 4 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

@bclyons12
Member

Right now I'm getting lots of allocations when computing the inner products in TEQUILA. The computation goes through a fairly complicated chain of function calls, including several intermediary functions. @sjkelly suggested that it's related to JuliaLang/julia#15276. The performance implications of variable capture are described here.

As suggested by @sjkelly, I tried replacing the functions defined inside of other functions with anonymous functions. I also tried introducing let statements and using @closure from FastClosures.jl. None of this had any significant impact on performance. I didn't push any of the things I tried above since they didn't help, but I'm hoping to get some help identifying where I've gone wrong.
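For context, here is a minimal sketch of the captured-variable problem behind julia#15276 and the `let` workaround, adapted from the captured-variable example in the Julia performance tips. The names `abmult_boxed` and `abmult_let` are illustrative only and are not part of TEQUILA:

```julia
# `r` is captured by the closure *and* conditionally reassigned, so lowering
# stores it in a Core.Box and the closure body cannot infer its type.
function abmult_boxed(r::Int)
    if r < 0
        r = -r
    end
    f = x -> x * r
    return f
end

# Workaround: rebind `r` in a `let` block so the closure captures a fresh,
# never-reassigned local whose type (Int) is known. FastClosures.jl's
# `@closure` macro performs essentially this `let` rewrite automatically.
function abmult_let(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end
```

Running `@code_warntype abmult_boxed(3)` should report `r` as a `Core.Box`, while the closure returned by `abmult_let` stores `r` as a plain `Int`.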

Here's the simplest sequence I've found that demonstrates the problem:

```julia
using TEQUILA
shot = Shot(101, 31, 15, "TEQUILA/sample/g_chease_mxh_d3d");
surfaces = deepcopy(shot.surfaces);
@time shot_refit = Shot(shot.N, shot.M, shot.ρ, surfaces, shot;
                        P = shot.P, dP_dψ = shot.dP_dψ,
                        F_dF_dψ = shot.F_dF_dψ, Jt_R = shot.Jt_R, Jt = shot.Jt,
                        Pbnd = shot.Pbnd, Fbnd = shot.Fbnd, Ip_target = shot.Ip_target);
#   0.979654 seconds (787 allocations: 442.250 KiB)
@profview Shot(shot.N, shot.M, shot.ρ, surfaces, shot;
               P = shot.P, dP_dψ = shot.dP_dψ,
               F_dF_dψ = shot.F_dF_dψ, Jt_R = shot.Jt_R, Jt = shot.Jt,
               Pbnd = shot.Pbnd, Fbnd = shot.Fbnd, Ip_target = shot.Ip_target);
```

Here's the file that was used: g_chease_mxh_d3d.txt

Here's the profile:
[Screenshot: profiler flame graph, "Screen Shot 2023-06-14 at 8 47 14 PM"]

@sjkelly

sjkelly commented Jun 30, 2023

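This turns out to be a case of:
https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing
Julia does not always specialize on arguments of type `Function`: if a function argument is only passed through to another function rather than called, the method may not be specialized for it.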

I did a quick test by parameterizing the function definitions, e.g. changing `f(x::Function)` to `f(x::F) where {F<:Function}`, and the dynamic dispatch disappears from the profiling flame graph. Allocations drop slightly, but there is no noticeable performance difference.
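A minimal sketch of that change, assuming a hypothetical quadrature helper `quad` (these names are illustrative, not TEQUILA's actual functions). Because `integrate_generic` only passes `f` through without calling it, Julia's heuristic may compile a single instance for every `Function`; the type parameter in `integrate_spec` forces one specialized instance per function type:

```julia
# Hypothetical helper: calls `h` at each quadrature node and forms the
# weighted sum. `h` is called here, so this method is specialized on it.
quad(h, xs, ws) = sum(w * h(x) for (x, w) in zip(xs, ws))

# `f` is only passed through to `quad`, never called in this body, so Julia
# may reuse one compiled instance for all functions (dynamic dispatch at `quad`).
integrate_generic(f::Function, xs, ws) = quad(f, xs, ws)

# The type parameter `F` forces a separate, specialized instance per function type.
integrate_spec(f::F, xs, ws) where {F<:Function} = quad(f, xs, ws)
```

Both versions return the same result; only the compiled code differs. Note that `@code_typed` and `@code_warntype` always show specialized code, so they may not reveal the missing specialization, which is why it showed up in the flame graph instead.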

However, there is also a union split or dynamic dispatch that goes undetected in this `limits` function, which returns `SVector`s of two different lengths:
https://github.com/ProjectTorreyPines/FiniteElementHermite.jl/blob/69475066261ad6cb46060da2d90fbef153dd5256/src/hermite.jl#L379-L390
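To illustrate the kind of instability described, here is a hypothetical stand-in for `limits` (not the actual FiniteElementHermite.jl code), with plain tuples standing in for `SVector`s so the sketch needs no packages. Because the tuple length depends on a runtime value, inference can only assign a non-concrete return type, which callers must then handle by union splitting or dynamic dispatch:

```julia
# Hypothetical: boundary nodes touch one element, interior nodes touch two,
# so the number of returned indices depends on the runtime value of `k`.
function limits_unstable(k::Int, n::Int)
    if k == 1
        return (1, 2)              # first node: one element to the right
    elseif k == n
        return (n - 1, n)          # last node: one element to the left
    else
        return (k - 1, k, k + 1)   # interior node: elements on both sides
    end
end

# Ask inference for the return type given only the argument *types*.
rt = only(Base.return_types(limits_unstable, (Int, Int)))
println(rt)  # a non-concrete type covering both tuple lengths
```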

@bclyons12
Member Author

@sjkelly Thanks! Learned something new. I'll play with this to see how it affects performance, and I'll see if I can eliminate that type instability in `limits()`.

@lstagner

For the `limits` function you can dispatch on a value type (`Val`) so that each length gets its own compiled method.
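A minimal sketch of that suggestion, assuming the caller knows at compile time how many indices it needs. The names `limits_val`, `interior_limits`, and `boundary_limits` are hypothetical, and tuples again stand in for `SVector`s:

```julia
# The length N is a type parameter, so each N gets its own compiled method
# and the return type NTuple{N,Int} is concrete.
limits_val(k0::Int, ::Val{N}) where {N} = ntuple(i -> k0 + i - 1, Val(N))

# Callers pass a literal Val, so method selection is resolved at compile time.
interior_limits(k::Int) = limits_val(k - 1, Val(3))  # (k-1, k, k+1)
boundary_limits(k::Int) = limits_val(k, Val(2))      # (k, k+1)

# Inferred return type for a fixed length.
rt = only(Base.return_types(limits_val, (Int, Val{3})))
```

If the length is only known at runtime, writing `limits_val(k, Val(n))` just moves the dynamic dispatch to where the `Val` is constructed, so this pays off when the length is fixed by the calling code or the dispatch happens once outside the hot loop.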

@bclyons12
Member Author

This appears to have been fixed by avoiding the non-specialization issue described at https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing . In addition, the new quadrature routines replace most of the function evaluations with sums over predefined arrays, so this is no longer an issue.
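A minimal sketch of the precomputed-array idea, assuming a two-point Gauss-Legendre rule on [-1, 1]. The names `tabulate` and `inner_product` are hypothetical and do not reflect TEQUILA's actual quadrature routines. Each function is evaluated once at the quadrature nodes during setup, so the inner product itself is a weighted sum over arrays with no higher-order function calls to specialize:

```julia
# Evaluate f once at every quadrature node (setup cost, paid once).
tabulate(f, xq::AbstractVector) = f.(xq)

# Inner product <u, v> ≈ Σ_q w_q u(x_q) v(x_q), using only precomputed arrays.
function inner_product(uq::AbstractVector, vq::AbstractVector, wq::AbstractVector)
    s = zero(promote_type(eltype(uq), eltype(vq), eltype(wq)))
    for q in eachindex(uq, vq, wq)  # throws if the arrays' indices differ
        s += wq[q] * uq[q] * vq[q]
    end
    return s
end

# Two-point Gauss-Legendre rule on [-1, 1] (exact for polynomials up to degree 3).
xq = [-1 / sqrt(3), 1 / sqrt(3)]
wq = [1.0, 1.0]
```

With `uq = tabulate(x -> x, xq)`, `inner_product(uq, uq, wq)` approximates the integral of x² over [-1, 1], which is 2/3.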

This issue was closed.