
Packages take too long to load #7280

Closed · lindahua opened this issue Jun 16, 2014 · 24 comments
Labels: packages (Package management and loading)

Comments

@lindahua
Contributor

I know this has been discussed many times, and support for (cached) precompiled packages is planned. However, I think it is useful to have a dedicated issue for it.

Package load times are getting worse as packages grow: many important packages take seconds to load. This has already created unnecessary tension between building sophisticated packages and keeping load times down.

I measured the load times of some important packages (on a reasonably fast Mac Pro with an i7 CPU and 16 GB RAM). Results are below:

Package Name    Load time (seconds)
StatsBase        0.92
Graphs           1.04
Distributions    3.07
DataFrames       4.26
PyPlot           7.15
Winston          8.19
Gadfly          16.50

For comparison, Python packages are much bigger but load much faster: scipy takes about 70 ms to load, and matplotlib about 92 ms.

I think it is time we seriously consider this and provide a facility for package precompilation and caching as soon as possible.
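
The post does not say exactly how these timings were taken; a minimal way to reproduce this kind of measurement in a fresh session (the package name is only an example) is something like:

# Rough load-time measurement; run in a fresh julia session.
t0 = time()
import StatsBase
println("load time: ", time() - t0, " seconds")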

@JeffBezanson
Member

dup of #4373

I know everybody is frustrated, myself included.
Conditional dependencies should help Gadfly, which loads slowly mostly because it depends on almost every other major package. Other than that, we need more analysis. It is possible we are hitting some bad critical paths deep in the system.

@StefanKarpinski
Member

One possibility might be to precompile more code that we know common packages need.

@timholy
Member

timholy commented Jun 16, 2014

@JeffBezanson, should we interpret your comment as meaning that you think there may be room for substantially faster loading without the need for a julia analog of *.pyc files?

@JeffBezanson
Member

Yes. Something in there is not scaling well. I have some old performance data where Winston used to load in ~5 seconds, and now it takes 9 seconds on the same machine.

It is possible that compiling top-level expressions is an issue. The vast majority of top-level expressions should not require native code gen (e.g. they just add definitions), but we compile a lot of them out of pure fear that somebody will benchmark something inside one.
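
To make that distinction concrete, here is a hedged sketch with made-up definitions: nothing in the first group needs native code, while the last form actually computes at load time.

module Example

# Definition-only top-level expressions: these just register bindings and
# methods; compiling each of them to machine code before evaluating it is
# wasted work.
const SCALE = 2.5
f(x) = SCALE * x
g(x) = f(x) + 1

# A top-level expression that does real work at load time; this is where
# compilation (or at least a fast interpreter) actually matters.
const TABLE = [g(i) for i in 1:1000]

end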

There is also a large amount of type diversity that is making it harder for the compiler to converge. My favorite example is that the package manager code uses 17 different types of Dicts (when I last counted about a year ago). I don't know how to address this problem.

Julia itself could also probably start up faster. Of course it's gotten much better, but we're still looking at ~0.5 second. I suspect a lot of time is spent deserializing the system image data, but this deserves a close look now that the "first 90%" of startup time has been solved.

We also spend a lot of time in the LLVM optimizer (~17% of Winston loading in my year-ago measurement), which may not be worth it for lots of code, but this is tough to solve.

Type inference also spends too much time analyzing itself, due to every type in the system streaming through it. It's very hard to avoid this without pessimizing user code. We might need some special cases for inference.jl, or just rewrite in C, which will also make bootstrapping faster and easier.

@StefanKarpinski
Member

My favorite example is that the package manager code uses 17 different types of Dicts (when I last counted about a year ago). I don't know how to address this problem.

I think you're talking about addressing the problem in general, but we could stop using such specifically typed dictionaries in the package manager.

@StefanKarpinski
Member

Type inference also spends too much time analyzing itself, due to every type in the system streaming through it. It's very hard to avoid this without pessimizing user code. We might need some special cases for inference.jl, or just rewrite in C, which will also make bootstrapping faster and easier.

I would be sad to see such a major piece of the implementation move from Julia to C.

@JeffBezanson
Member

A lot of the "type diversity" comes from having too many types generally. For example, all of ASCIIString, UTF8String, and ByteString are used for Dict keys. That's 9 types of basic String=>String Dicts for no real reason (and if you throw in String, 16 types). Of course not all of these are used, but that's the space the type entropy is trying to occupy.
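
A hedged illustration of that combinatorial blow-up (the 0.3-era string types no longer exist, so present-day types stand in; the point is only that each key/value combination is a distinct Dict type):

# Each concrete combination is its own Dict type, and every method called on
# it (setindex!, get, haskey, ...) gets specialized separately.
d1 = Dict{String,String}()
d2 = Dict{String,SubString{String}}()
d3 = Dict{Symbol,String}()

println(typeof(d1) == typeof(d2))   # false: different types, separate specializations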

@timholy
Member

timholy commented Jun 16, 2014

I like the idea of studying it. AFAICT, as reported by ProfileView.view(C=true), all the time is spent inside C code. At least on Linux, the ability to get useful lookups from C instruction pointers (ips) is so limited that it's hard to study: when I do @profile reload("DataFrames"), the only useful hits are in jl_load (big surprise) and jl_expand, and I'd wager >90% of the ips are not usefully decoded. But basically it looks like all the time is in libjulia.so, even if I delete sys.so.
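
For anyone trying to reproduce this on a current Julia, a minimal sketch (reload was removed after 0.6, so a first-time load in a fresh session stands in; the C keyword pulls C frames into the text report, analogous to ProfileView.view(C=true)):

using Profile                    # a standard library on current Julia

@profile @eval using DataFrames  # profile the package load; @eval keeps `using` legal here
Profile.print(C = true)          # include C frames in the report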

@lindahua
Contributor Author

Isn't it the case that caching the compiled image somewhere is an easier solution? (as we discussed in #4373, @JeffBezanson, thanks for pointing to that issue).

I think it is pretty acceptable if a package compiles as fast as the Julia Base itself.

@JeffBezanson
Member

I think both caching and speeding things up are worthwhile. But if we can make loading faster without caching, it will help package authors iterate faster for example. Also, caching is not as easy as it sounds --- it is very hard to know if some existing native code is consistent with all the currently loaded definitions. Furthermore, many packages like to make run-time decisions during loading, effectively making their code impossible to cache.
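
A sketch of the load-time-decision pattern being described (module name and backend choice are made up for illustration): the package body itself inspects the environment, so a cache built on one machine would bake in the wrong answer on another.

module PlotThing   # hypothetical package

# Decision made while the module loads: whether a drawing backend is available
# determines which methods get defined.
const HAVE_CAIRO = try
    @eval import Cairo
    true
catch
    false
end

if HAVE_CAIRO
    draw(x) = "drawing with Cairo: $x"
else
    draw(x) = "text fallback: $x"
end

end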

@StefanKarpinski
Member

Explicit caching is also an option – i.e. making package caching opt-in. Major packages would clearly do so to improve their load times. That would make it clearer that run-time decisions are either going to be cached or have to happen after loading the cached code.
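
For context: explicit opt-in caching is roughly the shape precompilation later took in Julia 0.4, with a package declaring itself cache-safe and moving run-time decisions into __init__, which runs each time the cached image is loaded (names below are illustrative).

__precompile__(true)    # Julia 0.4-0.6 opt-in; precompilation later became the default

module PlotThing        # hypothetical package

const BACKEND = Ref("")

# __init__ runs on every load of the cached module, so per-machine decisions
# still happen at run time instead of being frozen into the cache.
function __init__()
    BACKEND[] = get(ENV, "PLOTTHING_BACKEND", "default")
end

end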

@timholy
Member

timholy commented Jun 16, 2014

OK, I basically got complete backtraces for @profile reload("DataFrames"). It looks like the large majority of the time is being spent in flisp.c, in apply_cl. EDIT: there are three hotspots inside that function, but it's not easy to tell which ones because we can't look up source-code lines in C code (all we get are instruction-pointer offsets).

I'd be happy to post the data somewhere, if others want to analyze it too.

@timholy
Member

timholy commented Jun 16, 2014

In the meantime here's a gist. These are the triggering lines, sorted in alphabetical order. (Sampling interval was 10ms, just to avoid overfilling the profile buffer.) inference.jl accounts for almost none of the time (5 out of 876 triggers).

@JeffBezanson
Member

It would be good news if that result is accurate; it will probably be easier to optimize the front-end than inference.jl. One promising option is to port the front-end to Chicken Scheme, which many people seem to feel is a great Scheme-to-C compiler.

@StefanKarpinski
Member

Cough, JuliaParser, cough.

@StefanKarpinski
Member

Although Chicken may be good too.

@JeffBezanson
Member

Julia is not fast enough for our needs :-P

@timholy
Member

timholy commented Jun 16, 2014

Agreed that interpretation is key. FYI here is what I did: compile the debug version of Julia, then

Profile.init(10^7, 0.01)       # buffer of 10^7 ips, 10 ms sampling interval
using StatsBase                # loaded ahead of time, outside the profiled region
@profile reload("DataFrames")  # I chose DataFrames for its lack of macros
ips, lidict = Profile.retrieve()               # raw instruction pointers + line-info lookup
using HDF5, JLD
@save "/tmp/reloaddataframes.jld" ips lidict

For some reason, this works beautifully on one machine (a SandyBridge Xeon E5-2650 system running CentOS 6.4), but the C lookups are nearly useless on my laptop (i7-640LM running Kubuntu 14.04). That's part of why I'm offering to post the data, if you want it.

You can extract the triggering line by doing something like this:

ends = find(ips .== 0)          # zeros separate individual backtraces in the profile buffer
starts = [4,ends[1:end-1].+4]   # on my machine there are 3 ips for the signal handler, etc
for i = 1:length(starts); println(lidict[ips[starts[i]]]); end   # print the triggering line of each sample

@timholy
Member

timholy commented Jun 19, 2014

Jeff, since 76% of the samples were collected in flisp.c, let me ask: does the lisp code only run during the lowering step? Rather than having to deal with the full complexity of caching an .so file, what about caching just the lowered representation? Seems like it might provide a 4x speed bump, which would be quite noticeable, and it's something that I (naively) imagine might not take many lines of code.
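
A rough sketch of what caching just the lowered form could look like, using 0.3-era names (expand was the lowering entry point then, and serialize/deserialize lived in Base); this is conceptual only, and invalidation is ignored:

# Lower each top-level expression once and serialize the result, so a later
# load can deserialize and evaluate instead of re-running the flisp front-end.
function cache_lowered(exprs, cachefile)
    lowered = [expand(ex) for ex in exprs]
    open(io -> serialize(io, lowered), cachefile, "w")
end

function load_cached(cachefile)
    for ex in open(deserialize, cachefile)
        eval(Main, ex)
    end
end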

@JeffBezanson
Member

It's parsing and lowering. Generally lowering takes a bit longer than parsing. The only difficulty with caching this is that the cache for one file has to be invalidated if it uses a macro defined in another file that changes.

Jameson started implementing this: #5061
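
A small example of why macros force that invalidation (file names hypothetical): the macro is expanded during lowering, so a cached lowered form embeds the expansion itself rather than a reference to the macro.

# macros.jl
macro double(x)
    :( 2 * $(esc(x)) )
end

# user.jl: after lowering, the cached form of `f` literally contains `2 * x`,
# not a call to @double; if @double later changes to return `3 * x`, a stale
# cache of user.jl silently keeps the old expansion.
f(x) = @double(x) + 1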

@lindahua
Contributor Author

Cutting the loading time to a half or a third of what it is now would make many packages much nicer to use.

@timholy
Member

timholy commented Jun 19, 2014

The only difficulty with caching this is that the cache for one file has to be invalidated if it uses a macro defined in another file that changes.

Ugh. That's a tough one. I suppose having to explicitly declare one's dependencies doesn't count as a solution. Do we even have an analog of @which that works for macros? Even worse, what happens when the user renames the file containing the macro?

@dcjones
Contributor

dcjones commented Jun 21, 2014

If significant time is spent in LLVM optimizer passes, would it be practical to add a switch to julia to disable all or some of them? The issue I have when developing Gadfly is having to reload it and draw one plot over and over: if I tweak something, it takes something like 40 seconds to see the result. Since I'm only drawing one plot, I imagine some of that optimization is counter-productive.

@harryprince

harryprince commented Mar 11, 2018

Gadfly is too slow; ggplot2 or R's base plotting are better than this. Because of that, I don't use Julia to draw plots.
