static compile part 3 (modules) #8656
…e also methods, and minor corrections
…eserializing (see #8652 for cause). note this causes more nuisance ambiguity warnings
I can't believe nobody's commented on this yet. I am so so excited about this development. After Julia core startup time got so much better it really emphasized how long module load time was. I often avoid restarting my Julia session just so I won't need to reload modules, so this is going to be a huge workflow improvement. Thanks! |
This is awesome! It will dramatically improve the JuliaBox experience too. |
Awesome, I did not know that the development of static compilation of packages was this far along. This is going to be a huge selling point for 0.4. |
Is it possible to have this backported to 0.3? It's crazy, but thought I'd ask anyways. |
It's not crazy at all. This was actually developed some months ago in a previous, rejected (and admittedly much less interesting) pull request and I just rebased it recently and then made it a bit more powerful (and thus much more useful). I wouldn't be surprised if it backports cleanly, but that would require getting this merged soon in 0.4 so that it can start seeing some testing (and to get me to start coding the user interface parts) |
I've been occupied by other things and hadn't even noticed this. Really fantastic---certainly one of the most important developments around. I don't really have the expertise to review this, unfortunately, but hopefully we can get this merged soon. |
So what do you need for this? Testing? |
unrelated, but perhaps worth noting that if #8008 is merged first, it will probably not be possible to backport this. since that will force me to simplify this PR, thus making it incompatible.
also, it should perhaps be noted here that, at least initially, this will add a strong dependence on the exact sys.ji image, so that any recompilation will force recompilation of all dependencies. thus it is not useful for people developing code in base – although that also means it will be advantageous to think about actually implementing #5155. this would not have been practical a year ago, but with the rapid growth in packages and julia release versions, this is now a sufficient condition to be extremely useful.
@StefanKarpinski testing isn't a bad thing, although since the interface is pretty raw right now, most of the preconditions are tested later by assertions instead making this a little harder to use. I need Jeff's approval for this to actually merge it and start on the next part. The final user interface will look pretty much nothing like the current interface.
If anyone wants to try it, the test interface looks something like the following:
using FixedPointNumbers
ccall(:jl_save_new_module, Any, (Ptr{Uint8}, Any), "FixedPointNumbers_cache.jlc", FixedPointNumbers)
ccall(:jl_restore_new_module, Any, (Ptr{Uint8},), "FixedPointNumbers_cache.jlc")
FixedPointNumbers.__init__()
In the final version, the interface will look something like the following:
@inline module FixedPointNumbers # declares that this module may be cached
import OtherModulesThatAlsoDeclareStaticCompile #declares dependency on OtherM...
include("other_file_to_include.jl") # declares dependency on other_fi...
end
note: I don't know what to call the macro, but it will behave much like the existing inline macro (adding some metadata to the module Expr), so I've temporarily borrowed the name, to be refined in the PR that actually implements some of that content.
$ ./julia --build $JULIA_HOME/../lib/julia/FixedPointNumbers \
    -J sys.ji ~/.julia/v0.4/FixedPointNumbers/src/FixedPointNumbers.jl
julia> using FixedPointNumbers
# looks for FixedPointNumbers in path
# success -> looks in cache for something that matches
# success -> verify preconditions
# failure
# load FixedPointNumbers.jl using above command line, then start again
open question: what preconditions should be validated on the path before loading? weakest is to just always use the cache when available (e.g. *.jlc becomes a valid file extension that Julia will prefer over *.jl when it is found first in the path). this is nice because it adds no requirements to the filesystem. it also interacts somewhat nicely with possible future fully static compilation options, in that julia could emit objects with a *.so filename, and seamlessly patch itself together via dlload callbacks. on the other end of the range of possibilities, it can easily record something about the files that it loaded (timestamp, hash, content) and then decide whether to use the *.jlc file or reject it. but i've left this question for the very bottom because I don't want it to impact this PR. This open question has no impact on this PR, and is precisely why I want to separate this into multiple PRs. I suspect I will implement option A, then wrap it in option B as the default, but allowing the user to force the usage of option A where desired. |
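The two ends of that range (always use the cache, or validate it first) can be illustrated with a minimal C sketch. This is not code from the PR: `cache_is_fresh` and `should_use_cache` are hypothetical names, and the timestamp comparison is only one of the checks mentioned above (a hash or content comparison would slot into the same place).

```c
#include <sys/stat.h>

/* Hypothetical helper, not part of Julia: returns 1 when the cache file
 * exists and is at least as new as the source file, 0 otherwise. */
static int cache_is_fresh(const char *source_path, const char *cache_path)
{
    struct stat src, cache;
    if (stat(source_path, &src) != 0)
        return 0;                /* no source to compare against */
    if (stat(cache_path, &cache) != 0)
        return 0;                /* no cache file yet */
    return cache.st_mtime >= src.st_mtime;
}

/* Option A: use the cache whenever it exists.
 * Option B (the default here): also require the freshness check to pass.
 * `force_option_a` models the user override described in the comment. */
static int should_use_cache(const char *source_path, const char *cache_path,
                            int force_option_a)
{
    struct stat cache;
    if (stat(cache_path, &cache) != 0)
        return 0;                /* nothing to load under either option */
    if (force_option_a)
        return 1;
    return cache_is_fresh(source_path, cache_path);
}
```

A caller that gets 0 back would fall through to the "failure" branch above: rebuild the cache from the `.jl` source, then try again.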
Since currently we recompile every package each time we load it, having to recompile a package when the julia build has changed is certainly not a substantial barrier. Obviously you don't want to rebuild all packages after finishing a julia build, because that would enormously increase the time needed to build core julia. Just rebuild packages on-demand (related to your last question). One question: this is module-by-module, not file-by-file? So if I'm working on a big package with ~20 files, a single change to any of them forces a recompile of the entire module? Presuming the answer is "yes, and it would be really hard to implement file-by-file," my suspicion is that developers should be able to split large projects into multiple modules and achieve gains that way. So again no major barrier, I'm just seeking clarification. Regarding your question at the end about loading the precompiled modules (which I agree is a separate issue): I think we basically have to implement the more complex version. Otherwise people developing packages will be perennially forgetting to delete the old |
yes, julia does scope by the module, not by the file. although, I could patch up vtjnash/Speed.jl to accelerate the line-by-line cache, that only helps if you are editing some module at the end. I suspect doing something with obviously, yes, you can't rebuild all packages after a rebuild, since you may not even know where they are located. one of my next steps is to handle that. |
Between #4600 and this, we could maybe just begin to encourage using more submodules when structuring big modules. I'm not sure if that would cut down on this compilation time, but it might. |
unfortunately, while the serializer work (this PR) generalizes quite well to handling arbitrary submodules, the preconditions I am planning to use for the next PR don't generalize so easily. that means that it will be relatively trivial to add support for conditional submodule caching, but hard to be more general for embedded submodules. although, i'm not discouraging #4600 |
Seeing no objections to this, I'll merge and start working on the next part |
Glad to hear it! |
Sweet! What can I do with this? |
Do you know how long it takes to generate the cache file for GTK and its dependencies? Curious about how long |
It would be nice to set a flag to a package so it doesn't precompile. |
This PR is missing a description of what the change actually does. All I see is that it prepares us for more stuff in the future, and that it doesn't have an api yet, etc. Ok, but what does it do? |
emitting the cache file is pretty negligible in time cost. i didn't measure it directly, however, other than observing that the command
primarily it makes the "mode" of the serializer more explicit: then it uses that mode to enable the creation of another "mode" that is essentially equivalent to MODE_AST ( another benefit is that the serializer is now (nearly) reentrant. however, it would require allocating the global state on the stack ( Line 18 in e3a74ee
this is likely to be opt-in, at least at first |
@@ -260,6 +270,16 @@ static void jl_update_all_fptrs()
    delayed_fptrs = NULL;
}

static int is_submodule(jl_module_t *parent, jl_module_t *child)
I would have used the other argument order: is child a submodule of parent?
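For readers without the diff at hand, a check of this kind can be sketched as a walk up the parent chain. The struct below is a toy stand-in, not `jl_module_t`, and the sketch assumes the root module is its own parent and that a module counts as a submodule of itself. Neither assumption is confirmed by the lines quoted here. It keeps the `(parent, child)` order from the hunk, which the review comment suggests swapping.

```c
#include <stddef.h>

/* Toy stand-in for jl_module_t; only the parent link matters here. */
typedef struct toy_module {
    const char *name;
    struct toy_module *parent;   /* assumption: the root module is its own parent */
} toy_module_t;

/* Returns 1 if `child` is `parent` itself or is nested anywhere below it,
 * 0 otherwise. */
static int toy_is_submodule(const toy_module_t *parent, const toy_module_t *child)
{
    for (;;) {
        if (child == NULL)
            return 0;            /* broken chain: treat as not a submodule */
        if (child == parent)
            return 1;
        if (child->parent == child)
            return 0;            /* reached the root without finding parent */
        child = child->parent;
    }
}
```

With both parameters of the same type, nothing stops a caller from passing them in the wrong order, which is presumably why the argument order is worth settling.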
This change primarily seems to introduce two functions, |
@@ -135,29 +135,29 @@ void parse_opts(int *argcp, char ***argvp)
        case 'h':
            printf("%s%s", usage, opts);
            exit(0);
        case 'c':
this file is just indentation (whitespace) fixes
I described how to use them in an earlier comment (#8656 (comment)). However, once the "official" interface is merged, these functions will no longer be DLLEXPORT – that is just for convenience, to make it possible to perform incremental testing at the REPL |
see bd205a0 for added comments. I'm not sure that I can meaningfully add them here |
I suspect this misses the main point of the question. The time to generate the cache file won't be dominated by serialization and I/O, it will be dominated by parsing and lowering all the My presumption is that the cache will be regenerated only when the user says So in practice I bet users will eventually learn to become annoyed 😄 by slow responses the first time after a |
@vtjnash, what are the odds of using this feature to shorten build times of julia itself? In particular, if someone is working on files that load after |
perhaps: you could either try to set this up to save a |
Shrinking Base would be good anyway. |
I'm a bit worried that this may be trying to be too clever and is going to cause a lot of confusion and brittleness when used. Of course, I don't really understand how this is expected to work since there's been no explanation of that provided, just vague indications that code will be compiled and cached. I suspect that giving the user explicit control will be a bit easier to understand and use: a |
the user can't know what files and state some random library three levels deep might need to restore. this transfers the burden for controlling what code can be cached from the user to the library authors. libraries would need to have a flag to their code that says "Yes, I'll restore any necessary external state in my your proposal is also workable, and roughly equivalent to adding code to the of the two options, i think it is much better for the library authors to be able to control this action than to expect the users to be able to make this decision
yep. it's just an even bigger question then of what moves out, and how. i think we may even want to split base into its own repository at some point, so that even for binary distributions, we can provide a fully modifiable environment with all the change-tracking and github PR excellence. |
Could you please write up an email or something where you explain what the strategy here actually is? |
I don't know what the final strategy will look like in terms of user interactions. ideally, it would be completely transparent to the user and easy for a library author to enable. This is just the framework for experimenting with various proposals. hence also why I wanted to split the technical content here from the PR implementing the documented interface(s). the primary open question is how we should tie modules to files, since currently we don't. |
At this stage the UI is not the issue yet. The issue is the design of the underlying mechanism. For example, an important tidbit I've gleaned so far is that generic functions referenced at the top level of a module will be copied. This implies we are introducing a new operation of "separating" a module that has semantic implications. I know this might not be the final form, but these are exactly the issues we should be discussing. |
I don't think that occurs very often. However, I could add a |
The point is that we need to elucidate and think about all such behaviors --- what other things like that are in here? Do we want to define a notion of what objects are "owned" by a module, and those get serialized and everything else is saved as references? Maybe every generic function should be officially owned by one module or another, i.e. add a |
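The ownership rule floated in this comment can be sketched as a single decision: copy what the module being saved owns, and store everything else as a reference. This is an illustration only. The types, the `owner` field, and `how_to_save` are all hypothetical, and this is not how the PR decides what to copy.

```c
#include <stddef.h>

/* Hypothetical toy types; the `owner` field is the idea under discussion,
 * not an existing field in Julia's runtime. */
typedef struct { const char *name; } owner_module_t;
typedef struct {
    const char *name;
    const owner_module_t *owner;   /* the one module that owns this function */
} owned_function_t;

typedef enum { SAVE_BY_VALUE, SAVE_BY_REFERENCE } save_kind_t;

/* Functions owned by the module being saved are serialized in full;
 * anything owned elsewhere is recorded as a reference to resolve on load. */
static save_kind_t how_to_save(const owned_function_t *f,
                               const owner_module_t *saving)
{
    return f->owner == saving ? SAVE_BY_VALUE : SAVE_BY_REFERENCE;
}
```

Under this rule, a generic function that a module only references at top level would be saved as a reference rather than copied, which is the semantic question raised earlier in the thread.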
How does this work with caching the output of staged functions? The ArrayView / Generator changes will make them pretty pervasive. Having a module own a generic function seems like something we should consider. Perhaps it is useful in other areas (re-compiling dependent functions)? It seems like the only way to restrict the extensibility of methods in the future (and implement something like Dylan's |
It's worth keeping in mind how much overlap there is between this and distributed computing. Many of the same issues of ownership and serialization come up in both.
|
Along those lines, it would also be great if we could modularize this serializer a bit and reuse much of the same code to tackle some of the performance issues raised in #7893. |
This prepares the serializer to be able to handle incremental loading (aka module caching). Currently there is no interface to it exposed to the user (other than the raw interface I demonstrate below for testing). Static compile part 4 to add the user-friendly interface will be developed shortly, but I wanted to go ahead and get this merged first since that is a separate task. Part 4 is expected to be easy technically but has more UI questions to answer.
note: this would have been fully compatible with the current serializer, except that I changed the deser_tag hash-table (with indexes from 0 to 255) into an array.
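Because the tag indexes run from 0 to 255, a one-byte tag can index a fixed array directly instead of going through a hash table. A minimal sketch of that layout follows. The names `toy_deser_tag`, `toy_register_tag` and `toy_lookup_tag` are hypothetical, and the real table in the serializer may differ in element type and initialization.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_TAGS 256

/* Hypothetical table: the index is the one-byte tag read from the stream,
 * the value is the object that tag stands for (NULL if unassigned).
 * Static storage is zero-initialized, so all slots start as NULL. */
static const void *toy_deser_tag[NUM_TAGS];

/* A uint8_t index cannot fall outside 0..255, so no bounds check is needed. */
static void toy_register_tag(uint8_t tag, const void *value)
{
    toy_deser_tag[tag] = value;
}

static const void *toy_lookup_tag(uint8_t tag)
{
    return toy_deser_tag[tag];
}
```

Lookup becomes a single indexed load with no hashing or collision handling.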