Packages take too long to load #7280
Dup of #4373. I know everybody is frustrated, myself included.
One possibility might be to precompile more code that we know common packages need.
@JeffBezanson, should we interpret your comment as meaning that you think there may be room for substantially faster loading without the need for a Julia analog of …
Yes. Something in there is not scaling well. I have some old performance data where Winston used to load in ~5 seconds, and now it takes 9 seconds on the same machine.

It is possible that compiling top-level expressions is an issue. The vast majority of top-level expressions should not require native code gen (e.g. they just add definitions), but we compile a lot of them out of pure fear that somebody will benchmark something inside one.

There is also a large amount of type diversity that is making it harder for the compiler to converge. My favorite example is that the package manager code uses 17 different types of Dicts (when I last counted, about a year ago). I don't know how to address this problem.

Julia itself could also probably start up faster. Of course it's gotten much better, but we're still looking at ~0.5 seconds. I suspect a lot of time is spent deserializing the system image data, but this deserves a close look now that the "first 90%" of startup time has been solved.

We also spend a lot of time in the LLVM optimizer (~17% of Winston loading in my year-ago measurement), which may not be worth it for lots of code, but this is tough to solve. Type inference also spends too much time analyzing itself, due to every type in the system streaming through it. It's very hard to avoid this without pessimizing user code. We might need some special cases for …
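The "type diversity" cost Jeff describes can be illustrated with a small, hypothetical sketch: every distinct concrete `Dict{K,V}` parameterization that flows through a generic function makes the compiler infer and compile a fresh specialization, so N distinct Dict types mean N separate inference and codegen passes for the same source method.

```julia
# Hypothetical illustration of specialization cost from Dict type diversity.
# One generic method, but each concrete Dict type triggers its own
# inferred and compiled specialization.
total(d::Dict) = sum(values(d))

d1 = Dict{String,Int}("a" => 1, "b" => 2)
d2 = Dict{Symbol,Int}(:a => 1, :b => 2)
d3 = Dict{String,Float64}("a" => 1.0, "b" => 2.0)

total(d1)  # compiles total(::Dict{String,Int})
total(d2)  # compiles total(::Dict{Symbol,Int})
total(d3)  # compiles total(::Dict{String,Float64})
```

Collapsing such call sites onto fewer concrete container types (e.g. normalizing keys to one type) is one way a package like the package manager could reduce the number of specializations the compiler has to converge on.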
I think you're talking about addressing the problem in general, but we could stop using such specifically typed dictionaries in the package manager.
I would be sad to see such a major piece of the implementation move from Julia to C.
A lot of the "type diversity" comes from having too many types generally. For example, all of …
I like the idea of studying it. AFAICT, as reported by …
Isn't it the case that caching the compiled image somewhere is an easier solution? (As we discussed in #4373; @JeffBezanson, thanks for pointing to that issue.) I think it is perfectly acceptable if a package compiles as fast as Julia Base itself.
I think both caching and speeding things up are worthwhile. But if we can make loading faster without caching, it will help package authors iterate faster, for example. Also, caching is not as easy as it sounds --- it is very hard to know if some existing native code is consistent with all the currently loaded definitions. Furthermore, many packages like to make run-time decisions during loading, effectively making their code impossible to cache.
Explicit caching is also an option – i.e. making package caching opt-in. Major packages would clearly do so to improve their load times. That would make it clearer that run-time decisions are either going to be cached or have to happen after loading the cached code.
OK, I basically got complete backtracing on … I'd be happy to post the data somewhere, if others want to analyze it too.
In the meantime, here's a gist. These are the triggering lines, sorted in alphabetical order. (The sampling interval was 10 ms, just to avoid overfilling the profile buffer.)
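For readers wanting to reproduce a load-time profile like this one, Julia's built-in sampling profiler can do it; the sketch below uses current syntax (in recent versions the profiler lives in the `Profile` stdlib) and `Winston` stands in for whichever package you are measuring. This is a generic recipe, not necessarily the exact invocation used above.

```julia
using Profile

# Sample every 10 ms so a multi-second load doesn't overflow the buffer.
Profile.init(delay = 0.01)

# @eval runs the `using` at global scope so it can be wrapped by @profile.
@profile @eval using Winston

# Flat report: one row per line of code, sorted by sample count.
Profile.print(format = :flat, sortedby = :count)
```

Sorting the flat output makes the "triggering lines" (the lines actually on the stack when samples were taken) easy to compare across runs.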
It would be good if that result is accurate. It will probably be easier to optimize the front-end than inference.jl. One promising option is to port the front-end to Chicken Scheme, which many people seem to feel is a great Scheme-to-C compiler.
Cough, JuliaParser, cough.
Although Chicken may be good too.
Julia is not fast enough for our needs :-P
Agreed that interpretation is key. FYI, here is what I did: compile the debug version of Julia, then …
For some reason, this works beautifully on one machine (a Sandy Bridge Xeon E5-2650 system running CentOS 6.4), but the C lookups are nearly useless on my laptop (an i7-640LM running Kubuntu 14.04). That's part of why I'm offering to post the data, if you want it. You can extract the triggering line by doing something like this: …
Jeff, since 76% of the samples were collected in …
It's parsing and lowering. Generally, lowering takes a bit longer than parsing. The only difficulty with caching this is that the cache for one file has to be invalidated if it uses a macro defined in another file that changes. Jameson started implementing this: #5061
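The invalidation rule described here (one file's cache must die when a macro it uses changes upstream) can be sketched as a content-addressed cache: key each file's lowered form on its own bytes plus the bytes of every file that provides a macro it uses. All names below are hypothetical; this is just the general idea in current Julia syntax, not how #5061 is implemented.

```julia
using SHA  # SHA is a Julia standard library in recent versions

# Key a file on its contents plus the contents of its macro-providing deps,
# so editing an upstream macro definition changes the key and misses the cache.
cache_key(file::String, macro_deps::Vector{String}) =
    bytes2hex(sha256(join(read.([file; macro_deps], String))))

const LOWERED_CACHE = Dict{String,Any}()

function parsed_with_cache(file::String, macro_deps::Vector{String})
    key = cache_key(file, macro_deps)
    get!(LOWERED_CACHE, key) do
        # Cache miss: redo the expensive parse (and, in a real system, lowering).
        Meta.parseall(read(file, String))
    end
end
```

A real implementation would persist the cache to disk and discover the macro-dependency edges automatically, but the keying scheme is the part that makes stale entries impossible rather than merely unlikely.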
Cutting the loading time to half or one third of the current status would make many packages much nicer to use.
Ugh. That's a tough one. I suppose having to explicitly declare one's dependencies doesn't count as a solution. Do we even have an analog of …
If significant time is spent on LLVM optimizer passes, would it be practical to add a switch to julia to disable all or some of them? The issue I have with developing Gadfly is having to reload it and draw one plot over and over. So if I tweak something, it takes something like 40 seconds to see the result. Since I'm only drawing one plot, I imagine some of that optimization is counter-productive.
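For what it's worth, later Julia releases did grow switches along these lines; assuming a reasonably recent build (these flags are from the current `julia --help`, not the version discussed in this thread), optimization can be dialed down per session:

```shell
# Run with minimal LLVM optimization -- useful when iterating on code
# where compile time dominates run time.
julia -O0 script.jl

# Go further: compile as little as possible, falling back to slower
# execution paths where it can.
julia -O0 --compile=min script.jl
```

The trade-off is exactly the one discussed above: the resulting code runs slower, which is fine for a one-shot edit/reload/plot loop but wrong for benchmarks.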
Gadfly is too slow; ggplot2 or R's base plotting is better in this respect. Because of that, I don't use Julia to draw plots.
I know this has been discussed many times, and support for (cached) precompiled packages is planned. However, I think it is useful to have a dedicated issue for this.
The long loading time of packages is becoming worse as packages grow. Many important packages take seconds to load. This has already caused unnecessary tension between having sophisticated packages and reducing package loading time.
I measured the time of loading some important packages (on a quite decent Mac Pro with an i7 CPU and 16 GB RAM). Results are below:
For comparison, packages in Python are much bigger but load much faster. For example, `scipy` takes about 70 ms to load, while `matplotlib` takes about 92 ms.

I think it is time that we seriously consider this, and have the facility for package pre-compiling and caching as soon as possible.