Add option to not reorder loops? #459

johnbcoughlin · 2023-01-11T21:56:10Z

As noted in #126 and in my own usage of the library, compile time is roughly factorial in the number of nested loops. For many applications, it is not too hard to guess a loop ordering that should be good enough. For example, consider this set of 6 nested loops:

    for λx in 1:Kx, λy in 1:Ky, λvx in 1:Kvx, λvy in 1:Kvy
        for αx in 1:Nx, αv in 1:Nv
            ax = as_x[λx]
            ay = as_y[λy]
            avx = as_vx[λvx]
            avy = as_vy[λvy]

            # Do some work
        end
    end

The indices λx, λy etc. are larger strides than the indices αx, αv. So we probably shouldn't even consider loop orders where the alphas are outside. In terms of ordering the outer loops, I think that a human should be able to do a pretty good job just based on the way the data is laid out in memory. It would be nice to be able to take advantage of the automatic unrolling and vectorization capabilities of LV without incurring a compile time blowup.

Is this a feasible suggestion based on how the library is structured right now?

The text was updated successfully, but these errors were encountered:

chriselrod · 2023-01-12T03:33:42Z

I think that a human should be able to do a pretty good job just based on the way the data is laid out in memory.

That is a function of type. E.g., Adjointand PermutedDimsArray.
People do often get it wrong. "Fastest dimensions on the inside" is not always fastest.

I agree. LoppModels (the rewrite) of avoiding the factorial behavior.

There is #398, but I didn't actually notice it helping for some reason. It's worth looking more into.

Is this a feasible suggestion based on how the library is structured right now?

#398 is simple enough, and avoids the factorial algorithm.
Avoiding reordering at all is more difficult, because LV forgets what the original order actually was.

chriselrod · 2023-01-12T03:37:00Z

Maybe the logic to actually skip the factorial behavior is broken in #398.

johnbcoughlin · 2023-01-12T03:58:36Z

I am doing some more investigation into my slow compile times, and I think that it may actually have more to do with LLVM being quite slow to deal with high-dimensional arrays than with LV--the compile time for my generated function is extremely slow even with no @turbo annotation.

chriselrod · 2023-01-12T16:35:28Z

I am doing some more investigation into my slow compile times, and I think that it may actually have more to do with LLVM being quite slow to deal with high-dimensional arrays than with LV--the compile time for my generated function is extremely slow even with no @turbo annotation.

Are you generating tons of unrolled code?
If you blame LLVM, I suggest starting Julia with

JULIA_LLVM_ARGS="-time-passes" julia

It may also be useful to look into the Julia compiler itself.
If you build Julia from source, you can enable timings:
https://github.com/JuliaLang/julia/blob/b07484ca39a963b49fe31b8d5d2ceee4864f1737/src/options.h#L82
Knowing more about where time is spent in the compiler might help figure out how how to reduce it.

johnbcoughlin closed this as completed Jan 12, 2023

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to not reorder loops? #459

Add option to not reorder loops? #459

johnbcoughlin commented Jan 11, 2023

chriselrod commented Jan 12, 2023

chriselrod commented Jan 12, 2023

johnbcoughlin commented Jan 12, 2023

chriselrod commented Jan 12, 2023 •

edited

Loading

Add option to not reorder loops? #459

Add option to not reorder loops? #459

Comments

johnbcoughlin commented Jan 11, 2023

chriselrod commented Jan 12, 2023

chriselrod commented Jan 12, 2023

johnbcoughlin commented Jan 12, 2023

chriselrod commented Jan 12, 2023 • edited Loading

chriselrod commented Jan 12, 2023 •

edited

Loading