
WIP: shorten compile time with nospecialize #76

Closed · timholy wants to merge 1 commit

Conversation

@timholy (Contributor) commented Mar 16, 2020

This branch is, for the moment, mostly in service of JuliaLang/julia#35131. Steps to reproduce: after `dev`ing this branch, do

```julia
julia> using LoopVectorization

shell> pwd
/home/tim/.julia/dev/LoopVectorization/test

julia> include("broadcast.jl")
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Float32, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Float64, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Int32, 7)
(T, #= /home/tim/.julia/dev/LoopVectorization/test/broadcast.jl:7 =# @__LINE__()) = (Int64, 7)
Test Summary: | Pass  Total
broadcast     |   74     74
Test.DefaultTestSet("broadcast", Any[], 74, false)

julia> include("/home/tim/src/julia/nspecializations.jl")   # just if you want a quick count
allmethods (generic function with 1 method)

julia> nspecializations(first(methods(LoopVectorization.abstractparameters)))
33
```
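The helper file itself isn't shown in the thread; on recent Julia (1.10+, where `Base.specializations` exists) an equivalent counter might look like this (a sketch, not the original `nspecializations.jl`):

```julia
# Hypothetical reimplementation; the original nspecializations.jl is not
# shown in the thread. Base.specializations(m::Method) iterates the
# MethodInstances that have been created for m, so counting them is a
# one-liner on Julia 1.10+.
nspecializations(m::Method) = count(Returns(true), Base.specializations(m))

# Summing over all methods of a function gives a per-function total:
totalspecializations(f) = sum(nspecializations, methods(f))
```

A falling count after adding `@nospecialize` annotations is the signal this PR is after.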

To be clear, the point of `abstractparameters` is to let one write certain methods without a `where` clause and still be able to access the type parameters. I am assuming that having the `where` will force specialization regardless of any `@nospecialize`. (If that's not correct, I'd rather get rid of `abstractparameters` and go back to the way they were before.) So it's ironic that `abstractparameters` itself seems to get inferred for many different input types.
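The distinction can be sketched as follows, with hypothetical function names (this is not the actual LoopVectorization code):

```julia
# The where clause introduces a static parameter T into the method
# signature; the assumption above is that this forces the compiler to
# specialize on T even though x is marked @nospecialize:
eltype_where(@nospecialize(x::AbstractArray{T})) where {T} = T

# Reading the parameters off the run-time type instead avoids the static
# parameter; this is the role described for abstractparameters
# (its real signature is guessed here):
params_of(@nospecialize(T::Type)) = T.parameters
eltype_dynamic(@nospecialize(x)) = params_of(typeof(x))[1]

eltype_dynamic([1.0, 2.0])  # Float64, without a static type parameter
```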

@chriselrod (Member)

Running the test suite prints a lot of timings. How noisy are Travis runs?

I can also run the tests locally to compare.

@timholy (Contributor, Author) commented Mar 16, 2020

Do you mean, can one use the time as a guide to progress on this issue? Or might I have left a stray `@show` or something in a previous PR? I get a lot of output too.

@timholy (Contributor, Author) commented Mar 16, 2020

Actually, this might already be helping: if memory serves, it used to take ~205 s for me to run the whole test suite, and this run came out at 177 s. Of course that's completely uncontrolled (what else was my machine doing at the same time? was I plugged into power? did packages have to be compiled first?), but it's something to keep an eye on.

Travis has a heterogeneous collection of machines, I think, so comparing run-to-run timings on Travis is probably not very accurate.

@chriselrod (Member) commented Mar 16, 2020

Yes, I was wondering if we could use Travis to track progress.

  • The latest run on master (Julia 1.1, Linux): 687 seconds
  • This PR (Julia 1.1, Linux): 666 seconds

How many other builds were ongoing / competing for CI's resources, etc?
I could do more controlled runs on a home desktop.

@timholy (Contributor, Author) commented Mar 16, 2020

Here are the results of a more controlled experiment where I ran each twice (timings in seconds):

  • master:
    • 213.179568
    • 211.746881
  • this PR:
    • 212.729866
    • 214.747268

I don't know what happened with that one fast run, but so far this isn't helping. Of course, if we've identified a crack in the control of specializations (which I suspect we have), the conclusion might change.

These tests were done on Julia 1.3.

@chriselrod (Member)

Would it be better to ditch the precompile file when timing this? That file contains a huge list of methods from the broadcast file that I don't want it specializing on, such as specific static sizes or patterns of true/false in LowDimArray.
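For context, a package's precompile file is typically a list of `precompile` directives (often generated by a tool such as SnoopCompile), and each directive pins one concrete specialization, so overly specific entries there work against `@nospecialize`. A hypothetical entry of the kind being described (not copied from the package):

```julia
# Hypothetical directive; the real file's entries are not shown in the
# thread. Each such call forces one concrete specialization to be
# compiled, which is exactly what this PR is trying to reduce.
precompile(Base.broadcast, (typeof(+), Vector{Float64}, Vector{Float64}))
```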

On a couple-day-old build of Julia master, with LoopVectorization master I got:

```
255.619275 seconds (743.55 M allocations: 32.386 GiB, 2.51% gc time)
255.793652 seconds (743.55 M allocations: 32.386 GiB, 2.61% gc time)
255.626683 seconds (743.55 M allocations: 32.386 GiB, 2.59% gc time)
```

This PR, after merging master:

```
253.579559 seconds (739.90 M allocations: 32.215 GiB, 2.64% gc time)
253.793647 seconds (739.90 M allocations: 32.215 GiB, 2.61% gc time)
252.885642 seconds (739.90 M allocations: 32.215 GiB, 2.59% gc time)
```

I've disabled all turboing on this desktop, i.e., it runs at the same speed regardless of how many cores are under load or how long it has been working. (Clocks still vary as a function of the kind of work, sse/avx/avx512/idle, but that should be consistent within a given workload.) That does mean the max speed I have it configured to run at is actually 0.2 GHz slower than the stock max single-core boost.

There's still a fair bit of noise, but this PR does already look a bit faster on my computer. It seems to allocate a little less memory.

@timholy (Contributor, Author) commented Mar 18, 2020

Great observation. This PR is very much partial, so it might get even better. Still, I have a sense that something isn't right, so I'll wait a bit and see if Jeff discovers anything else.

@timholy (Contributor, Author) commented Apr 14, 2020

Superseded by #97

@timholy closed this Apr 14, 2020