-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
automatic recompilation of stale cache files #12458
Conversation
Looks great. Could we restrict the verbose info output to interactive sessions only? |
@jakebolewski, sure I guess that seems reasonable. However, I think that may end up suppressing some of the ...Okay, what I can do is to have |
So glad this looks to be ready for 0.4. Thank you, @stevengj |
This is game changing! |
Yes, this really puts the icing on the cake. 0.4 is awesome. |
AppVeyor got confused (assigned two jobs the same number). AFAICT the build can not be restarted from here, so you'll have to force-push a new SHA1 hash if you want to trigger a new build. (also Travis is having issues with their Mac cluster, so those are backed up right now too) |
@vtjnash, I already pushed a couple of patches since the initial AppVeyor failure, and the same failure happened again. |
I had to manually increment the AppVeyor "next build" number to un-stick it. Builds should be working again now, however. |
We have all green now! |
LGTM |
I'll merge by the end of the day if there are no objections. |
// are include depenencies | ||
void jl_serialize_dependency_list(ios_t *s) | ||
{ | ||
size_t total_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation again
(rebased to squash commits, merge with #12448, fix indentation) |
automatic recompilation of stale cache files
Hooray!! |
used via `include`. It has no effect outside of compilation. | ||
``` | ||
""" | ||
include_dependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newly added functions do not appear to get correctly populated by doc/genstdlib.jl
. I think you need to add an opening signature for this somewhere in the RST docs for it to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand the new doc system for the manual. @one-more-minute, can you clarify what we are supposed to do? I'd really rather just put the documentation inline in loading.jl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the docs inline should work fine. As with the rest of the docstrings, genstdlib.jl
will look for .. function:: include_dependency(...)
and splice the doc string it finds there, so if that doesn't exist already it needs to be added (or there wouldn't be any way to know where the doc string should go).
When I have more free time I'm going to put up a bunch of examples of how to work with this, which will hopefully make it clearer.
I'm already seeing timestamps cause issues here, editing packages and not seeing the corresponding |
I have to wonder if there's some hybrid hashing/timestamp solution to this. |
What would that look like? Check time stamp first, then check hash if equal? Doesn't seem like that would be all that beneficial over checking just the hashes. Maybe opt in to "I work across multiple machines therefore need this to be more careful" somehow? Is there some way to make this user-configurable? I'm trying to override the |
end | ||
modules, files = cache_dependencies(io) | ||
for f in files | ||
if mtime(f) > cachefile_mtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe instead of checking whether mtime(f) > cachefile_mtime
this should be mtime(f) != cachefile_mtime
to assure that the cache is in sync with the file content. Otherwise if you change machines and one of your clocks is skewed a file change might happen "in the past".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache_dependencies
returns all of the included files, and compiling a .ji
doesn't modify the mtime
for each of those included files, so I suspect this would result in recompiling every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. Somehow I took it that we store all those mtimes and if one of them changes we should recompile. We are clearly not doing that, sorry for the noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With #12559, now we do store all the mtimes.
"env.Decider('MD5-timestamp'): as of SCons 0.98, you can set the Decider function on an environment. MD5-timestamp says if the timestamp matches, don't bother re-MD5ing the file. This can give huge speedups. See the man page for info." (quoting https://bitbucket.org/scons/scons/wiki/GoFastButton) |
The discussion in #12259 seemed to indicate that a checksum would not be too expensive. If you want one, the steps would be:
I would check the timestamp before checking the checksum, mainly because it gives a little extra protection against the unlikely event of checksum collisions yielding false negatives. (I'm assuming we would use a checksum optimized for speed here, like CRC32, rather than a cryptographically secure checksum.) As @pao says, we can also store the timestamp in the file and check whether it matches, not just whether it is |
We could store the |
CRC32 is great for bit error detection, but I'm not sure about possibly large intentional changes (this site notes that On a reasonably large scons-built project, I've never run into an MD5 hash collision causing a miscompile. |
Instead of adding a hashing algorithm to base we could also use https://libgit2.github.com/libgit2/#HEAD/group/odb/git_odb_hashfile |
Remember, this is going to be done every time a module is loaded.
|
The need for a hash is now significantly lower, I don't think it's immediately pressing. I'm having trouble getting |
Does git maybe hash the file including some implied header or something? |
@tkelman A CRC-32 in C is very fast, and not at all much code (Intel even has an assembly instruction for a CRC-32 step). It would be easy to add it to the julia core, and not link against anything else. |
We also already have murmurhash easily available from either C or Julia, worth benchmarking and thinking about hash size/collision likelihood tradeoffs there. Unless I find a way to hit any more trouble I don't think we need to worry about it for now. Will let you know if my programmatically generated |
What are you doing with programmatically generating Julia code? Sounds interesting. |
Murmurhash3 would be good too. I hope Julia has more success than the Brain in taking over the world! |
This fixes #12259 by automating the recompilation of stale cache files whenever you
require
a module (e.g. byusing
) and a stale cache file is found. Currently, staleness is judged by timestamps; other mechanisms like checksums are left to future PRs.For example, here is a typical session after I had done
Base.compile(:PyPlot)
and then updated a file in Compat:This patch supersedes #12445.
Like #12445, it adds a list of the dependency files to the
.ji
file, with a newinclude_dependency(path)
function to manually supply non-include
dependencies for a module.(Also as discussed in #12445, I changed the serialization metadata to use bigendian rather than littleendian storage, for ease of reading back in via
ntoh
.)Once #12448 lands, it will be easy to add other information about the dependencies (e.g. checksums) if that is judged necessary in the future, but I think it is better to develop that incrementally. (Merging a hash/checksum algorithm is a nontrivial task that is best left to a separate PR.)
Note that you still have to manually
Base.compile
a module at least once—this PR only automates the recompilation of the image. A separate PR can implement a@cacheable
tag (or whatever) to allow modules to opt-in for automatic initial compilation.