
Faster incremental sysimg rebuilds #40414

Closed
wants to merge 2 commits into from

Conversation

@Keno (Member) commented Apr 9, 2021

Faster incremental sysimg rebuilds

Recent improvements in precompilation have improved compile-time
issues like time-to-first-plot (TTFP) quite significantly. However,
it is still significantly faster to just build a system image, in
which case TTFP is basically instant. The difference is primarily
due to us not being able to store native code in .ji files, as well
as invalidations of previously loaded code requiring recompilation.
In the long term these issues can be overcome, but in the short
term, I think we should try to leverage system images more heavily,
since they already basically solve the problem. I believe the
reasons people aren't really using system images are threefold:

  1. System images take a long time to compile
  2. The system image workflow is pretty manual
  3. System images don't play well with pkg updates etc.

Thus, my evil plan to improve the situation is:

  1. Make system images build faster
  2. Add an autoload annotation to Project.toml files.
    If present, Julia will hash the manifest and look
    for a matching system image in ~/.julia/sysimages.
  3. Make system images build even faster

The idea is that for the standard workflow, where people
just use plain `julia` with the default environment or
`julia --project`, system images would just be loaded
automatically, thus reducing the barrier to entry.
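
To make step 2 a bit more concrete, here is a minimal sketch of what that lookup could look like. The function name and the choice of SHA-256 over the manifest contents are illustrative assumptions, not part of this PR:

```julia
using SHA  # stdlib

# Hypothetical sketch: hash the active manifest's contents and look for
# a prebuilt sysimage under ~/.julia/sysimages with a matching name.
function find_autoload_sysimage(manifest_path::AbstractString)
    isfile(manifest_path) || return nothing
    h = bytes2hex(open(sha256, manifest_path))
    candidate = joinpath(homedir(), ".julia", "sysimages", h * ".so")
    return isfile(candidate) ? candidate : nothing
end
```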

In the initial version, there is no automatic rebuild
of these system images - they would still be built manually
with PkgCompiler, but at least the loading side would
be automatic and hopefully the build will be fast enough
that people will actually be willing to wait.
Eventually the rebuild could also be automatic
(maybe even in the background).

The major drawback of this plan is that system images will
start with all packages already loaded (even if their
bindings aren't present in Main). This will require some
workflow adjustments. I think it'll probably turn out fine,
but it's worth highlighting.

This PR is step 1 in this direction. It provides the ability
to rebuild system images much faster. The key observation
is that most of the time in a sysimage build is spent in LLVM
generating native code (serializing Julia's data structures
is quite fast). Thus, if we can re-use the code already
generated for the system image we're currently running, we'll
save a fair amount of time.

Unfortunately, this is not 100% straightforward, since a number
of places assumed that no linking happens. This PR hacks around
that, but it is not a particularly satisfying long-term solution.
That said, it should work fine, and I think it's worth doing, so
that we can explore the workflow adjustments that would rely on this.

With that said, here's how to use this (at the low level; of
course, PkgCompiler would just handle this):

```shell
$ mkdir chained
$ time ./usr/bin/julia --sysimage-native-code=chained --sysimage=usr/lib/julia/sys.so --output-o chained/chained.o.a -e 'Base.__init_build();'
real	0m9.633s
user	0m8.613s
sys	0m1.020s
$ cd chained
$ cp ../usr/lib/julia/sys-o.a . # Get the -o.a from the old sysimage
$ ar x sys-o.a # Extract it into text.o and data.o
$ rm data.o # rm the serialized sysimg data
$ mv text.o text-old.o
$ llvm-objcopy --remove-section .data.jl.sysimg_link text-old.o # rm the link between the native code and the old sysimg data
$ ar x chained.o.a # Extract new sysimage files
$ gcc -shared -o chained.so text.o data.o text-old.o # Link everything
$ ../julia --sysimage=chained.so
```

As can be seen, regenerating the system image took about 9s (the
subsequent commands aren't timed here, but take less than a second total).
This compares very favorably with a non-chained sysimg rebuild:

```shell
$ time ./usr/bin/julia --sysimage=usr/lib/julia/sys.so --output-o nonchained.o.a -e 'Base.__init_build();'

real	2m42.667s
user	2m39.211s
sys	0m3.452s
```

Of course, if you do load additional packages, the extra code
does still need to be compiled, so e.g. building a system image
for `Plots` goes from 3 minutes to 1 minute (building all of Plots,
plus everything in Base that got invalidated). That is still all in
LLVM though; it should be relatively straightforward to
multithread that after this PR (since linking the sysimg
in multiple pieces is allowed). That part is not implemented
yet, though.

@Keno force-pushed the kf/fastsysimg branch 3 times, most recently from c2d1343 to 86c959e on April 10, 2021 01:00

@NHDaly (Member) commented Apr 12, 2021

Amazing! Thank you for tackling this, @Keno! It sounds very exciting!

Regarding speeding up PackageCompiler: I've asked in the past about why PackageCompiler currently ends up compiling everything to native code twice, round-tripping through a text file to record the compilations: JuliaLang/PackageCompiler.jl#486. Just wanted to float this past you in case you hadn't seen it. Kristoffer provided a good explanation, but it seems like something that could be improved with a bit of design work.

@Keno (Member, Author) commented Apr 12, 2021

This can basically address that as well. You can build the whole sysimg without precompiles, dump out your precompiles with that image, and then use this mechanism to build a chained sysimg, as before, much faster.

@Keno (Member, Author) commented Apr 13, 2021

@timholy one application that comes to mind is speeding up development of Base itself. Could we have a mode of Revise where it serializes what needs to be Revised in some easy-to-load format, and then use this to quickly update an existing system image (without Revise itself showing up in the system image)?

@kshyatt added the building (Build system, or building Julia or its dependencies) and performance (Must go faster) labels Apr 13, 2021
Keno added a commit that referenced this pull request Apr 13, 2021:
This is the second part of the plan described in #40414
(though complementary to the PR itself). In particular, this
PR makes it possible to quickly replace a system image during
initial startup. This is done by adding a hook early in the
startup sequence (after the system image, but before any
dependent libraries are initialized) for Julia to look at
the specified project file and decide to load a different
sysimage instead.

In the current version of the PR, this works as follows:
 - If the `--autoload` argument is specified, julia will hash
   the contents of the currently active project's manifest path.
 - If a corresponding .so is found in `~/.julia/sysimages`, it will
   load that sysimage instead.
 - If not, a warning is generated and loading proceeds as usual,
   but before any user code is run, Julia will `require` any
   dependencies specified in the Project.toml.

The third point is there so that, independent of whether or not
the system image is found, the environment upon transfer of
control to the user is always the same (e.g. a package may
have type-pirated a method, which should be available independent
of whether the user ever explicitly did `using`).
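
As a rough sketch of that fallback (the helper name is hypothetical; in the PR this happens inside the startup path rather than through a function like this):

```julia
using TOML  # stdlib

# Hypothetical sketch: whether or not a matching sysimage was found,
# `require` every dependency listed in the active Project.toml before
# user code runs, so the resulting environment is always the same.
function require_project_deps(project_path::AbstractString)
    project = TOML.parsefile(project_path)
    for (name, uuid) in get(project, "deps", Dict{String,Any}())
        Base.require(Base.PkgId(Base.UUID(uuid), name))
    end
end
```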

This is highly incomplete. In particular, the scheme to find
the system image needs to take preferences into account and should
probably exclude any packages that are `dev`'ed (or their
dependents). I'm not sure I'll have the time to get around to
finishing this, but I'm hoping somebody else would be willing
to jump in for that part. The underlying mechanism seems to work
fine at this point, so this work should be mostly confined to
loading.jl.

The multiversioning pass currently does two things:
- Clone all functions and create a set of tables to tell the sysimage
  loader where to find the various cloned functions.
- Compress the table of pointers by going from 64 bit pointers to 32 bit
  offsets from the first function of the .text section.

The second optimization is useful because it cuts down on space
and speeds up dynamic loading. Unfortunately, relocations of this kind
are not expressible in all object formats, and as a result this
scheme does not work if the table needs to describe function pointers
in multiple compilation units.
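
For intuition, here is a small sketch with made-up addresses showing the two encodings and why the compressed table is tied to a single .text section:

```julia
# Made-up address of the first function in the .text section:
text_base = UInt64(0x7f3a20000000)

# Uncompressed table: 64-bit absolute pointers. Each entry needs a
# load-time relocation, but it can point into any compilation unit.
ptr_table = UInt64[text_base + 0x0040, text_base + 0x1a80, text_base + 0x2c10]

# Compressed table: 32-bit offsets from the first function of .text.
# Half the size and cheaper to load, but an offset is only meaningful
# within that one section, which is what breaks once the sysimage is
# linked from multiple pieces.
off_table = UInt32[0x0040, 0x1a80, 0x2c10]

# The loader reconstructs an absolute pointer as base + offset:
fptr(i) = text_base + off_table[i]
@assert fptr(2) == ptr_table[2]
```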

I'm working on improving the performance of incremental system image
rebuilds, which relies on being able to re-link such system images
and is thus incompatible with this compression.

There are possible ways to make it compatible, namely:
- Add a relocation to all the relevant file formats that expresses
  offsets from the start of the section, or,
- Change the multiversion table to be pcrel rather than relative to
  the first function in the table.

The first would require some significant coordination with standards
bodies, and neither is currently supported in LLVM.

To make progress on this issue, simply make the multiversion pass
optional and keep the table uncompressed in that case. This wastes
some space and adds a fraction of a second to the system image
load time, but it should let us proceed on the incremental sysimage
project. If it works well, we can go back and consider the future
of the multiversioning tables.

@timholy (Member) commented Aug 22, 2021

Just noticed #40414 (comment). Sure, that would be pretty easy to do in principle. What exactly would it look like?

File an issue at Revise when you want this; my impression is that we're not yet at a place where this will make a difference.

@timholy (Member) commented Aug 22, 2021

I have to say, this triggers my love/hate relationship with https://github.com/JuliaLang/PackageCompiler.jl. I totally get why it's necessary to have it, but at the same time its existence is probably what's let us get away for so long without just implementing native-code caching in package .ji files. I'd rather just fix that. Are we really so far from that goal? It just doesn't seem like it should be all that insurmountable. I'm on a bit of a close-the-precompile-issues rampage right now. There really aren't that many issues per se, but there are still some things we'll need (the method.roots problem #32705, and the issue of whether to store non-internal CodeInstances) that are pretty big.

@jpsamaroo (Member) commented

Alternatively, could we put the native code into shared libraries, and load them when we load .ji files? It could improve the situation of calling Julia from C, since we'd have an obvious place to emit ccall-able methods, and they could potentially remain available without the runtime being started (if they don't depend on the runtime).
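
For reference, Julia already has a way to mark a function as callable from C; here is a sketch of the kind of entry point such a shared library could export (the function itself is a made-up example):

```julia
# Made-up example of a C-callable entry point that a package's shared
# library could export under the scheme suggested above.
Base.@ccallable function my_package_sum(a::Cdouble, b::Cdouble)::Cdouble
    return a + b
end
```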

@timholy (Member) commented Aug 23, 2021

I'm not really sure of the right implementation, mostly because I've never actually looked at the format of a shared library file. But that seems pretty sensible. Once we can cache external MethodInstances & CodeInstances (in our current no-native-code format), AFAICT the main remaining job is doing the work of the linker. If we can rely on external tools, that seems likely to be a win.

@ViralBShah (Member) commented

@Keno is this the PR you said can be rebased and brought back?

@giordano deleted the kf/fastsysimg branch August 8, 2024 11:09