Description
Incremental compilation will soon be available in stable Rust (with version 1.24), but there is still a lot of room for improvement. This issue tries to give an overview of specific areas with potential for optimization.
Compile Time
The main goal for incremental compilation is to decrease compile times in the common case. Here are some of the things that affect it:
Caching Efficiency
The biggest topic for incremental compilation is how efficiently we are using our caches. It can be further subdivided into the following areas:
Data Structure Stability
Caching efficiency depends on how the data in the cache is represented. If data is likely to change frequently, then the cache entries containing it are likely to be invalidated frequently. One example is source location information: add a comment somewhere and the source location of everything below that comment has changed. As a consequence, everything in the cache that contains source location information needs frequent invalidation. It would be preferable to factor data structures in a way that confines this volatility to only a few kinds of cache entries. The following issues track concrete plans to improve the situation here:
- Improve caching efficiency by handling spans in a more robust way (#47389)
- Turn translation-related attributes into a query (#47320)
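To make the span example concrete, here is a small, self-contained sketch. It does not use rustc's actual Span or fingerprinting types; the two structs below are hypothetical stand-ins that only illustrate why absolute source locations are volatile and how anchoring a location to its enclosing item confines the churn:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the compiler's result hashing.
fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// A span stored as an absolute byte offset: it shifts whenever anything above it moves.
#[derive(Hash)]
struct AbsoluteSpan { byte_offset: u32, len: u32 }

// A span expressed relative to its enclosing item: it survives edits elsewhere in the file.
#[derive(Hash)]
struct RelativeSpan { item: &'static str, offset_in_item: u32, len: u32 }

fn main() {
    // Adding a comment near the top of the file shifts every absolute offset below it...
    let before = AbsoluteSpan { byte_offset: 1200, len: 10 };
    let after = AbsoluteSpan { byte_offset: 1240, len: 10 };
    assert_ne!(fingerprint(&before), fingerprint(&after)); // cached result looks "changed"

    // ...while an item-relative span is untouched, so the cached result can be reused.
    let rel_before = RelativeSpan { item: "foo", offset_in_item: 16, len: 10 };
    let rel_after = RelativeSpan { item: "foo", offset_in_item: 16, len: 10 };
    assert_eq!(fingerprint(&rel_before), fingerprint(&rel_after));
}
```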
Object File Granularity and Partitioning
The problem of object file granularity can be viewed as a variation of "Data Structure Stability", but it is so important for compile times that I want to list it separately. At the moment the compiler puts machine code that comes from items in the same source-level module into the same object file. This means that changing one function in a module leads to all functions in that module being re-compiled. Obviously this can cause a large amount of redundant work. For full accuracy we could in theory have one object file per function - which indeed improves re-compile times a lot in many cases - but that would increase disk space requirements to an unacceptable level and make compilation sessions with few cache hits much more expensive (TODO: insert numbers). And as long as we don't do incremental linking, it might also make linking a bottleneck.
The main goal here is to re-compile as little unchanged code as possible while keeping overhead small. This is a hard problem; some approaches are:
- Keep using a fixed partitioning scheme but improve the partitioning algorithm
- Implement an adaptive scheme that reacts to the set of changes a user makes
- It might even be a good idea to write a simulator that lets us test different schemes and then feed it with actual data generated by an instrumented compiler.
Adaptive schemes would require us to re-think part of our testing and benchmarking infrastructure, and writing a simulator is a big project of its own, so in the short term we should look into improved static partitioning schemes:
- Take type parameters into account when assigning generic instances to CGUs. (TODO)
- Do per-MonoItem dependency tracking in order to collect data about granularity fallout (#48211)
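To illustrate the trade-off that any fixed partitioning scheme has to navigate, here is a toy sketch. The hash-by-path assignment below is purely hypothetical and is not the compiler's real partitioning logic:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy fixed partitioning: assign an item to one of `num_cgus` codegen units
/// by hashing its path.
fn codegen_unit_for(item_path: &str, num_cgus: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    item_path.hash(&mut hasher);
    hasher.finish() % num_cgus
}

fn main() {
    for item in &["parser::parse_header", "parser::parse_body", "lexer::next_token"] {
        println!("{} -> CGU {}", item, codegen_unit_for(item, 16));
    }
}
```

With few CGUs, one edit drags many unrelated functions into re-codegen; with many CGUs, each re-compile touches less code, but object-file count, disk usage, and (without incremental linking) link times go up.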
Another avenue for improvement of how we handle object files:
- Allow for re-using object files that contain unused code (#48212)
Whole-Cache Invalidation
Currently, command-line arguments are tracked at a very coarse level: any change to a command-line argument completely invalidates the cache. The red-green tracking system could take care of making this finer-grained, but quite a lot of refactoring would need to happen in order to make sure that command-line arguments are only accessible via queries.
Note that this currently completely prevents sharing a cache between `cargo check` and `cargo build`.
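As a toy model of the difference (the data structures below are made up for illustration and do not reflect the real option-tracking code), coarse tracking fingerprints the entire option set, whereas query-style tracking would allow a cached result to be reused whenever the specific options it actually read are unchanged:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let old_opts = vec![("opt-level", "2"), ("debuginfo", "0")];
    let new_opts = vec![("opt-level", "2"), ("debuginfo", "2")];

    // Coarse tracking: one fingerprint over the entire option set, so any change
    // invalidates the whole cache.
    assert_ne!(hash_of(&old_opts), hash_of(&new_opts));

    // Fine-grained tracking: a cached result that only read `opt-level` can still
    // be reused, because the option it depends on is unchanged.
    let reads = ["opt-level"];
    let unchanged = reads.iter().all(|&name| {
        let old = old_opts.iter().find(|&&(n, _)| n == name).map(|&(_, v)| v);
        let new = new_opts.iter().find(|&&(n, _)| n == name).map(|&(_, v)| v);
        old == new
    });
    assert!(unchanged); // this result could be marked green and reused
}
```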
Avoid Redundant Work
The compiler is doing redundant work at the moment. Reducing it will also positively affect incremental compilation:
- Linking is not done incrementally although at least Gold and MSVC would support it. (TODO)
- Instances of generic functions are duplicated for every crate that uses them; experiment with sharing monomorphized code between crates (#47317)
- Closures are unnecessarily duplicated for generic instances; instantiate fewer copies of a closure inside a generic function (#46477)
- Add support for split-debuginfo on platforms that allow it (#34651)
Querify More Logic, Cache More Queries
We can only cache things that are represented as queries, and we can only profit from caching for things that are (transitively) cached. There are some obvious candidates for querification:
- Cache the specialization_graph query (#48987)
- Cache type_of and some other queries (#47455)
- Turn translation-related attributes into a query (#47320)
- Querify WF-checking so it can be cached (#46753)
- Cache check_match and use ensure() for coherence-related queries (#46881)
- Enable query result caching for many more queries (#46556)
A more ambitious goal would be to querify name resolution and macro expansion.
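As a rough, self-contained sketch of why querification matters (the function names below are invented and do not reflect rustc's query API), a monolithic pass can only be reused all-or-nothing, whereas the same work expressed as a per-item query produces results that can be cached and re-verified independently:

```rust
use std::collections::HashMap;

// Monolithic pass: one result for the whole crate; any change recomputes everything.
fn check_crate_monolithic(items: &[&str]) -> Vec<String> {
    items.iter().map(|item| format!("checked {}", item)).collect()
}

// Querified version: one cacheable result per item, keyed by a stable identifier.
fn check_item_query<'a>(cache: &'a mut HashMap<String, String>, item: &str) -> &'a String {
    cache
        .entry(item.to_string())
        .or_insert_with(|| format!("checked {}", item))
}

fn main() {
    let items = ["foo", "bar"];
    let _whole_crate = check_crate_monolithic(&items);

    let mut cache = HashMap::new();
    check_item_query(&mut cache, "foo");
    check_item_query(&mut cache, "foo"); // second call is a cache hit
    assert_eq!(cache.len(), 1);
}
```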
Framework Overhead
Doing dependency tracking will invariably introduce some overhead. We should strive to keep this overhead low. The main areas of overhead are:
- Building the dependency graph during compiler execution
- Computing query result hashes and dependency node identifiers
- Loading and storing the dependency graph from/to disk
Another thing that falls into this category is:
- Efficiency of loading and storing cache entries
The following issues track individual improvements:
- Load query result cache in background or use memory mapped file. (TODO)
- Investigate sharing more data inside result cache (e.g. ty::Slice). (TODO)
- Reduce amount of cache sanity checking for non-debug_assertions compilers. (TODO)
- Investigate making the result cache updateable in-place (#48231)
- Use a struct-of-arrays instead of an array-of-structs for SerializedDepGraph (#47326; see the layout sketch after this list)
- Speed up result hashing and DepNode computation by caching certain stable hashes (#47294)
- Optimize DepGraph::try_mark_green() (#47293)
- Speed up leb128 encoding and decoding for unsigned values (#46919)
- Use an array instead of a hashmap for storing result hashes (#46842)
- Precompute small hash for filenames to save some work (#46839)
- Speed up span hashing by caching expansion context hashes (#46562)
- Load dep-graph in the background (#46555)
- Don't encode Fingerprint values with leb128 (#45875)
- Explore delayed read-edge deduplication or getting rid of it entirely (#45873)
- Maybe optimize case of dependency-less anonymous queries (#45408)
- Implement "eval-always" queries (#45238)
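Regarding the SerializedDepGraph item above, here is a schematic comparison; the field names are illustrative rather than the actual ones. In the struct-of-arrays layout every column is a homogeneous Vec that can be read and written as one contiguous block:

```rust
#![allow(dead_code)] // type-level sketch only; the fields are never read here

// Simplified stand-ins for the real index and fingerprint types.
struct SerializedDepNodeIndex(u32);
struct Fingerprint(u64, u64);

// Array-of-structs: one heterogeneous record per dep-graph node.
struct NodeData {
    fingerprint: Fingerprint,
    edges_start: u32,
    edges_len: u32,
}
struct DepGraphAos {
    nodes: Vec<NodeData>,
    edge_list: Vec<SerializedDepNodeIndex>,
}

// Struct-of-arrays: one homogeneous column per field, all indexed by the same
// node index. Each column can be (de)serialized as a single contiguous block
// and there is no per-record padding.
struct DepGraphSoa {
    fingerprints: Vec<Fingerprint>,
    edges_start: Vec<u32>,
    edges_len: Vec<u32>,
    edge_list: Vec<SerializedDepNodeIndex>,
}

fn main() {}
```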
We should continuously profile common use cases to find out where we can further reduce framework overhead.
Disk Space Requirements
Build directories of large C/C++ and Rust code bases can be massive, oftentimes many gigabytes. Since incremental compilation has to keep around the previous version of each object file for re-use, plus LLVM IR, plus the dependency graph and query result cache, build directories can as much as triple in size when incremental compilation is turned on (depending on which crates are compiled incrementally). The best way to reduce cache size is to reduce the amount of translated code that we need to cache. Solving #47317 and #46477 would help here. MIR-only RLIBs (#38913), which are one way to solve #47317, might also obviate the need to cache LLVM IR at all.
Runtime Performance of Incrementally Compiled Code
Currently, delivering good runtime performance for incrementally compiled code is only a secondary goal: as long as incrementally compiled code is "fast enough", we would rather spend the effort on improving compile times. However, since ThinLTO also supports an incremental mode, we could provide a middle ground between "re-compile everything for good performance" and "re-compile incrementally and only get 50% of the performance".
- Make ThinLTO compatible with incremental compilation (#53673)
If you have any further ideas or spot anything that I've missed, let me know in the comments below!