
WIP: shorten compile time with nospecialize #76

Closed · timholy wants to merge 1 commit

Conversation

@timholy (Contributor) commented Mar 16, 2020

This branch is, for the moment, mostly in service of JuliaLang/julia#35131. Steps to reproduce: after `dev`ing this branch, do

```julia
julia> using LoopVectorization

shell> pwd
/home/tim/.julia/dev/LoopVectorization/test

julia> include("broadcast.jl")
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Float32, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Float64, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Int32, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Int64, 7)
Test Summary: | Pass  Total
broadcast     |   74     74
Test.DefaultTestSet("broadcast", Any[], 74, false)

julia> include("/home/tim/src/julia/nspecializations.jl")   # just if you want a quick count
allmethods (generic function with 1 method)

julia> nspecializations(first(methods(LoopVectorization.abstractparameters)))
33
```
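The helper file itself isn't shown in the thread; on recent Julia (1.10+, where `Base.specializations` exists) an equivalent counter might look like this (a sketch, not the original `nspecializations.jl`):

```julia
# Hypothetical reimplementation; the original nspecializations.jl is not
# shown in the thread. Base.specializations(m::Method) iterates the
# MethodInstances that have been created for m, so counting them is a
# one-liner on Julia 1.10+.
nspecializations(m::Method) = count(Returns(true), Base.specializations(m))

# Summing over all methods of a function gives a per-function total:
totalspecializations(f) = sum(nspecializations, methods(f))
```

A falling count after adding `@nospecialize` annotations is the signal this PR is after.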

To be clear, the point of `abstractparameters` is to let one write certain methods without a `where` clause and still be able to access the type parameters. I am assuming that having the `where` will force specialization regardless of any `@nospecialize`. (If that's not correct, I'd rather get rid of `abstractparameters` and go back to the way they were before.) So it's ironic that `abstractparameters` itself seems to get inferred for many different input types.
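The distinction can be sketched as follows, with hypothetical function names (this is not the actual LoopVectorization code):

```julia
# The where clause introduces a static parameter T into the method
# signature; the assumption above is that this forces the compiler to
# specialize on T even though x is marked @nospecialize:
eltype_where(@nospecialize(x::AbstractArray{T})) where {T} = T

# Reading the parameters off the run-time type instead avoids the static
# parameter; this is the role described for abstractparameters
# (its real signature is guessed here):
params_of(@nospecialize(T::Type)) = T.parameters
eltype_dynamic(@nospecialize(x)) = params_of(typeof(x))[1]

eltype_dynamic([1.0, 2.0])  # Float64, without a static type parameter
```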

@chriselrod (Member)

Running the test suite prints a lot of timings. How noisy are Travis runs?

I can also run the tests locally to compare.

@timholy (Contributor, Author) commented Mar 16, 2020

Do you mean, can one use the time as a guide to progress on this issue? Or might I have left a stray `@show` or something in a previous PR? I get a lot of output too.

@timholy (Contributor, Author) commented Mar 16, 2020

Actually, this might already be helping: if memory serves, it used to take ~205 s for me to run the whole test suite, and this run came out at 177 s. Of course that's completely uncontrolled (what else was my machine doing at the same time? was I plugged into power? did packages have to be compiled first?), but it's something to keep an eye on.

Travis has a heterogeneous collection of machines, I think, so comparing run-to-run timings on Travis is probably not very accurate.

@chriselrod (Member) commented Mar 16, 2020

Yes, I was wondering if we could use Travis to track progress.

  • The latest run on master (Julia 1.1, Linux): 687 seconds
  • This PR (Julia 1.1, Linux): 666 seconds

How many other builds were ongoing / competing for CI's resources, etc?
I could do more controlled runs on a home desktop.

@timholy (Contributor, Author) commented Mar 16, 2020

Here are the results of a more controlled experiment where I ran each twice (timings in seconds):

  • master:
    • 213.179568
    • 211.746881
  • this PR:
    • 212.729866
    • 214.747268

I don't know what happened with that one fast run, but so far this isn't helping. Of course, if we've identified a crack in the control of specializations (which I suspect we have), the conclusion might change.

These tests were done on Julia 1.3.

@chriselrod (Member)

Would it be better to ditch the precompile file when timing this? That file contains a huge list of methods from the broadcast file that I don't want it specializing on, such as specific static sizes or patterns of true/false in LowDimArray.
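For context, a package's precompile file is typically a list of `precompile` directives (often generated by a tool such as SnoopCompile), and each directive pins one concrete specialization, so overly specific entries there work against `@nospecialize`. A hypothetical entry of the kind being described (not copied from the package):

```julia
# Hypothetical directive; the real file's entries are not shown in the
# thread. Each such call forces one concrete specialization to be
# compiled, which is exactly what this PR is trying to reduce.
precompile(Base.broadcast, (typeof(+), Vector{Float64}, Vector{Float64}))
```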

On a couple-day-old build of Julia master, with LoopVectorization master I got:

```
255.619275 seconds (743.55 M allocations: 32.386 GiB, 2.51% gc time)
255.793652 seconds (743.55 M allocations: 32.386 GiB, 2.61% gc time)
255.626683 seconds (743.55 M allocations: 32.386 GiB, 2.59% gc time)
```

This PR, after merging master:

```
253.579559 seconds (739.90 M allocations: 32.215 GiB, 2.64% gc time)
253.793647 seconds (739.90 M allocations: 32.215 GiB, 2.61% gc time)
252.885642 seconds (739.90 M allocations: 32.215 GiB, 2.59% gc time)
```

I've disabled all turboing on this desktop, i.e., it runs at the same speed regardless of how many cores are under load or how long it has been working. (Clocks still vary as a function of the kind of work, sse/avx/avx512/idle, but that should be consistent within a given workload.) That does mean the max speed I have it configured to run at is actually 0.2 GHz slower than the stock max single-core boost.

There's still a fair bit of noise, but this PR does already look a bit faster on my computer. It seems to allocate a little less memory.

@timholy (Contributor, Author) commented Mar 18, 2020

Great observation. This PR is very much partial, so it might get even better. Still, I have a sense that something isn't right, so I'll wait a bit and see if Jeff discovers anything else.

@timholy (Contributor, Author) commented Apr 14, 2020

Superseded by #97

@timholy closed this Apr 14, 2020