Tracking Issue for Incremental Compilation #47660
Comments
Given that the cutoff for Rust 2018 is imminent, is there a summary of the progress on incremental compilation since shipping in 1.24?
@bstrie Basically all the boxes you see checked in the post above. Note though that the "2018" in the title of the issue does not mean "2018 Edition". It's just a list of things that I thought would be good next steps for incr. comp. to work on this year.
I would like to start working on that in order to slowly work towards enabling end-to-end queries for the RLS. Since the last All-Hands, a lot of work has gone into parallelizing the queries, so with the improved query infrastructure I imagine now would be a good time to tackle this. I realize this needs more design and discussion to make it actually incremental and more fine-grained, but I imagine moving it into a whole-crate query and refactoring the session code could be useful. I'd be more than happy to work on this (most notably starting with name resolution), but I'd also appreciate any mentoring I can get, since I haven't worked in the compiler much except for the save-analysis generation. @nikomatsakis @Zoxc @petrochenkov do you have any immediate plan or vision for how this could be tackled now?
No plans, I know very little about queries or incremental compilation.
I am strongly in favor of this, and I agree with you that a "coarse-grained" approach is a logical first step. I know that @Zoxc (whom you cc'd) was doing some work here too, but I'm not sure exactly what, and I'm sure @eddyb has thoughts. Overall I feel like it would be useful to sketch out an overall plan -- maybe it makes sense to try and have an ad-hoc sync meeting to talk over the complications? Pushing backwards from HIR is going to be the biggest challenge, no doubt. In principle, we could do something like having a
Sounds like a good idea!
FWIW there's already a PR that exposes the precomputed HIR map via
Visiting for backlog bonanza. We debated about whether this issue is providing much value as is, versus just closing it (and saying that this is tracked via all the issues with corresponding labels). But for now we won't do that. We'll leave it open, and just update the label to reflect its status. @rustbot label: S-tracking-impl-incomplete.
Incremental compilation will soon be available in stable Rust (with version 1.24) but there's still lots of room for improvement. This issue will try to give an overview of specific areas for potential optimization.
Compile Time
The main goal for incremental compilation is to decrease compile times in the common case. Here are some of the things that affect it:
Caching Efficiency
The biggest topic for incremental compilation is how efficiently we are using our caches. It can be further subdivided into the following areas:
Data Structure Stability
Caching efficiency depends on how the data in the cache is represented. If data is likely to change frequently, then the cache is likely to be invalidated frequently. One example is source location information, which changes easily: add a comment somewhere and the source location of everything below the comment has changed. As a consequence, everything in the cache that contains source location information is likely to need frequent invalidation. It would be preferable to factor data structures in a way that confines this volatility to only a few kinds of cache entries. The following issues track concrete plans to improve the situation here:
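To make the volatility concrete, here is a minimal sketch (not the compiler's actual data structures) that compares two ways of fingerprinting a cached item. The `Item` type, its fields, and the helper functions are hypothetical names made up for this illustration. One fingerprint hashes absolute line numbers, the other hashes line numbers relative to the start of the item.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical cached item: a name plus the source lines of its tokens.
struct Item {
    name: &'static str,
    start_line: u32,
    token_lines: Vec<u32>, // absolute line numbers within the file
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Fingerprint that bakes absolute line numbers into the hash.
fn absolute_fingerprint(item: &Item) -> u64 {
    hash_of(&(item.name, &item.token_lines))
}

/// Fingerprint that only hashes lines relative to the start of the item.
fn relative_fingerprint(item: &Item) -> u64 {
    let relative: Vec<u32> = item
        .token_lines
        .iter()
        .map(|line| line - item.start_line)
        .collect();
    hash_of(&(item.name, relative))
}

/// Simulates inserting `n` comment lines somewhere above the item.
fn shift(item: &Item, n: u32) -> Item {
    Item {
        name: item.name,
        start_line: item.start_line + n,
        token_lines: item.token_lines.iter().map(|line| line + n).collect(),
    }
}

fn main() {
    let before = Item { name: "foo", start_line: 10, token_lines: vec![10, 11, 12] };
    let after = shift(&before, 1);

    let absolute_changed = absolute_fingerprint(&before) != absolute_fingerprint(&after);
    let relative_changed = relative_fingerprint(&before) != relative_fingerprint(&after);
    println!("absolute fingerprint changed: {}", absolute_changed);
    println!("relative fingerprint changed: {}", relative_changed);
}
```

In this sketch only the absolute start line of the item would need to live in a volatile cache entry, while everything hashed relative to it stays stable when unrelated lines are added above.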
Object File Granularity and Partitioning
The problem of object file granularity can be viewed as a variation of "Data Structure Stability" but it is so important for compile times that I want to list it separately. At the moment the compiler will put machine code that comes from items in the same source-level module into the same object file. This means that changing one function in a module will cause all functions in that module to be re-compiled. Obviously this can lead to a large amount of redundant work. For full accuracy we could in theory have one object file per function - which indeed improves re-compile times a lot in many cases - but that would increase disk space requirements to an unacceptable level and make compilation sessions with few cache hits much more expensive (TODO: insert numbers). And as long as we don't do incremental linking it might also make linking a bottleneck.
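The trade-off can be sketched with a toy model. This is only an illustration under assumed names (`Partition`, `partition_per_module`, `partition_per_function`, `recompiled_functions` are all hypothetical and do not correspond to compiler internals): it counts how many functions are re-compiled after one function changes, and how many object files each scheme produces.

```rust
use std::collections::BTreeMap;

/// Maps an object file name to the full paths of the functions it contains.
type Partition = BTreeMap<String, Vec<String>>;

/// One object file per source-level module.
fn partition_per_module(functions: &[(&str, &str)]) -> Partition {
    let mut partition = Partition::new();
    for (module, func) in functions {
        partition
            .entry(module.to_string())
            .or_default()
            .push(format!("{}::{}", module, func));
    }
    partition
}

/// One object file per function.
fn partition_per_function(functions: &[(&str, &str)]) -> Partition {
    let mut partition = Partition::new();
    for (module, func) in functions {
        let path = format!("{}::{}", module, func);
        partition.entry(path.clone()).or_default().push(path);
    }
    partition
}

/// Number of functions re-compiled when `changed` is modified: every function
/// that shares an object file with it has to be translated again.
fn recompiled_functions(partition: &Partition, changed: &str) -> usize {
    partition
        .values()
        .filter(|fns| fns.iter().any(|f| f == changed))
        .map(|fns| fns.len())
        .sum()
}

fn main() {
    // (module, function) pairs of a made-up crate.
    let functions = [("a", "f1"), ("a", "f2"), ("a", "f3"), ("b", "g1")];

    let per_module = partition_per_module(&functions);
    let per_function = partition_per_function(&functions);

    println!(
        "per module:   {} object files, {} functions re-compiled",
        per_module.len(),
        recompiled_functions(&per_module, "a::f2")
    );
    println!(
        "per function: {} object files, {} functions re-compiled",
        per_function.len(),
        recompiled_functions(&per_function, "a::f2")
    );
}
```

The per-function scheme re-compiles less but produces more object files, which is where the disk space and linking costs mentioned above come from.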
The main goal here is to re-compile as little unchanged code as possible while keeping overhead small. This is a hard problem and some approaches are:
Adaptive schemes would require us to re-think part of our testing and benchmarking infrastructure and writing a simulator is a big project of its own, so short term we should look into improved static partitioning schemes:
Another avenue for improvement of how we handle object files:
Whole-Cache Invalidation
Currently commandline arguments are tracked at a very coarse level: Any change to a commandline argument will completely invalidate the cache. The red-green tracking system could take care of making this finer grained but quite a lot of refactoring would need to happen in order to make sure that commandline arguments are only accessible via queries.
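As a rough sketch of the difference between coarse and fine-grained tracking (the option names, the `Options` type, and the set of options a query reads are assumptions made for this example, not the compiler's real behavior):

```rust
use std::collections::BTreeMap;

/// Hypothetical set of command-line options, keyed by option name.
type Options = BTreeMap<String, String>;

fn options(pairs: &[(&str, &str)]) -> Options {
    pairs
        .iter()
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Coarse tracking: any difference in any option invalidates everything.
fn coarse_cache_valid(prev: &Options, cur: &Options) -> bool {
    prev == cur
}

/// Fine-grained tracking: a cached result stays valid as long as the options
/// it actually read (here passed explicitly as `reads`) are unchanged.
fn fine_cache_valid(prev: &Options, cur: &Options, reads: &[&str]) -> bool {
    reads.iter().all(|key| prev.get(*key) == cur.get(*key))
}

fn main() {
    // Two sessions that differ only in a made-up "emit" option.
    let prev = options(&[("emit", "metadata"), ("edition", "2018")]);
    let cur = options(&[("emit", "link"), ("edition", "2018")]);

    println!("coarse: cache valid = {}", coarse_cache_valid(&prev, &cur));
    // Assumption for illustration: this cached result only read "edition".
    println!(
        "fine, reads edition: cache valid = {}",
        fine_cache_valid(&prev, &cur, &["edition"])
    );
    println!(
        "fine, reads emit: cache valid = {}",
        fine_cache_valid(&prev, &cur, &["emit"])
    );
}
```

With coarse tracking the changed option throws away every cached result, while the fine-grained check keeps results that never looked at it.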
Note that this currently completely prevents sharing a cache between cargo check and cargo build.
Avoid Redundant Work
The compiler is doing redundant work at the moment. Reducing it will also positively affect incremental compilation:
Querify More Logic, Cache More Queries
We can only cache things that are represented as queries, and we can only profit from caching for things that are (transitively) cached. There are some obvious candidates for querification:
A more ambitious goal would be to querify name resolution and macro expansion.
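The point that only querified computations can be cached can be sketched with a tiny memoizing query context. This is a simplified illustration, not the compiler's query system: `QueryCtxt`, `item_size`, and `layout` are hypothetical names, and real queries also track dependencies, which is omitted here.

```rust
use std::collections::HashMap;

/// Minimal query context: results are memoized per (query name, key).
struct QueryCtxt {
    cache: HashMap<(&'static str, u32), u64>,
    executions: u32, // how many times a provider actually ran
}

impl QueryCtxt {
    fn new() -> Self {
        QueryCtxt { cache: HashMap::new(), executions: 0 }
    }

    /// Runs `provider` only if there is no cached result for this query.
    fn query(
        &mut self,
        name: &'static str,
        key: u32,
        provider: fn(&mut QueryCtxt, u32) -> u64,
    ) -> u64 {
        if let Some(&cached) = self.cache.get(&(name, key)) {
            return cached;
        }
        self.executions += 1;
        let value = provider(self, key);
        self.cache.insert((name, key), value);
        value
    }
}

/// Hypothetical leaf query.
fn item_size(_cx: &mut QueryCtxt, key: u32) -> u64 {
    u64::from(key) * 2
}

/// Hypothetical query that depends on `item_size`, so the nested result
/// is cached as well.
fn layout(cx: &mut QueryCtxt, key: u32) -> u64 {
    cx.query("item_size", key, item_size) + 8
}

fn main() {
    let mut cx = QueryCtxt::new();

    let first = cx.query("layout", 7, layout);
    println!("layout(7) = {}, providers run: {}", first, cx.executions);

    // Both the outer and the nested query are now served from the cache.
    let second = cx.query("layout", 7, layout);
    let size = cx.query("item_size", 7, item_size);
    println!(
        "layout(7) = {}, item_size(7) = {}, providers run: {}",
        second, size, cx.executions
    );
}
```

Any work done outside of `query` in this sketch would be repeated on every call, which is the motivation for moving more logic into queries.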
Framework Overhead
Doing dependency tracking will invariably introduce some overhead. We should strive to keep this overhead low. The main areas of overhead are:
Another thing that falls into this category is:
The following issues track individual improvements:
We should continuously profile common use cases to find out where we can further reduce framework overhead.
Disk Space Requirements
Build directories of large C/C++ and Rust code bases can be massive, oftentimes many gigabytes. Since incremental compilation has to keep around the previous version of each object file for re-use, plus LLVM IR, plus the dependency graph and query result cache, build directories can up to triple in size when incremental compilation is turned on (depending on which crates are compiled incrementally). The best way to reduce cache size is to reduce the amount of translated code that we need to cache. Solving #47317 and #46477 would help here. MIR-only RLIBs (#38913), which are one way to solve #47317, might also obviate the need to cache LLVM IR at all.
Runtime Performance of Incrementally Compiled Code
Currently, delivering good runtime performance for incrementally compiled code is only a side goal. If incrementally compiled code is "fast enough", we would rather try to improve compile times. However, since ThinLTO also supports an incremental mode, we could provide a middle ground between "re-compile everything for good performance" and "re-compile incrementally and only get 50% performance".
If you have any further ideas or spot anything that I've missed, let me know in the comments below!