Description
At the moment the compiler will always eagerly deduplicate read-edges in `CurrentDepGraph::read_index`. In order to be able to do this, the compiler has to allocate a `HashSet` for each task. My suspicion is that there is not much duplication to begin with and that there's potential for optimization here:
- If there is not much duplication to begin with, we could skip deduplication entirely for non-anonymous nodes. There should be no semantic difference, and only nodes whose reads actually contain duplicates would pay a slight cost (see the sketch after this list).
- If deduplication seems like a good idea, doing so during serialization instead of immediately has a few potential benefits:
  - Dep-graph serialization happens while LLVM is running in parallel, so work done there often does not add to the overall compile time at all.
  - When compilation exits early with an error, we don't save the dep-graph at all, so any work spent on deduplication is wasted. If compilation has already reached dep-graph serialization, chances are high that it will succeed.
  - Once we have the whole dep-graph, we can implement deduplication more efficiently: we know when we don't have to allocate a hash set at all and, when we do, what capacity will suffice.
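
For illustration, here is a rough sketch of the current eager scheme next to the skip-it-entirely alternative. The type and field names are made up for this example and do not match the actual structures in `librustc/dep_graph/graph.rs`:

```rust
use std::collections::HashSet;

/// Simplified stand-in for rustc's dep-node index type (hypothetical).
type DepNodeIndex = u32;

/// Current, eager scheme: every task carries a HashSet so that
/// duplicate read-edges are filtered out the moment they are recorded.
struct EagerTaskReads {
    reads: Vec<DepNodeIndex>,
    read_set: HashSet<DepNodeIndex>,
}

impl EagerTaskReads {
    fn record_read(&mut self, source: DepNodeIndex) {
        // The hash-set insert is paid on *every* read, duplicate or not.
        if self.read_set.insert(source) {
            self.reads.push(source);
        }
    }
}

/// Alternative for non-anonymous nodes: just record everything and
/// accept that duplicated edges take up a little extra space.
struct UndeduplicatedTaskReads {
    reads: Vec<DepNodeIndex>,
}

impl UndeduplicatedTaskReads {
    fn record_read(&mut self, source: DepNodeIndex) {
        self.reads.push(source);
    }
}
```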
So the first step would be to collect some data on how much read-edge duplication there actually is. This is most easily done by modifying the compiler to count duplicates (in `librustc/dep_graph/graph.rs`) and to print the number/percentage under `-Zincremental-info`. This modified compiler can then be used to compile a number of crates to get an idea of what's going on.
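
A minimal sketch of such instrumentation, assuming hypothetical counter fields and a reporting hook (the real code in `librustc/dep_graph/graph.rs` will look different), could be:

```rust
use std::collections::HashSet;

type DepNodeIndex = u32;

/// Hypothetical counters for measuring how many read-edges are duplicates.
#[derive(Default)]
struct ReadDuplicationStats {
    total_reads: u64,
    duplicated_reads: u64,
}

impl ReadDuplicationStats {
    /// Called from the read-recording path; `seen` is the per-task set
    /// that the eager deduplication scheme already maintains.
    fn record(&mut self, seen: &mut HashSet<DepNodeIndex>, source: DepNodeIndex) {
        self.total_reads += 1;
        if !seen.insert(source) {
            self.duplicated_reads += 1;
        }
    }

    /// Printed at the end of compilation when -Zincremental-info is set.
    fn report(&self) {
        let percent = if self.total_reads == 0 {
            0.0
        } else {
            100.0 * self.duplicated_reads as f64 / self.total_reads as f64
        };
        println!(
            "incremental: {} of {} read-edges were duplicates ({:.2}%)",
            self.duplicated_reads, self.total_reads, percent
        );
    }
}
```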
If duplication is low (e.g. less than 2% of reads are filtered out), then we could just remove deduplication and test the effect with a try-build. Otherwise, we can move deduplication to `DepGraph::serialize()` and measure the performance impact of that.
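
If deduplication does turn out to be worthwhile, a sketch of doing it per node at serialization time might look like the following (again with made-up types; the real `DepGraph::serialize()` operates on its own edge representation):

```rust
use std::collections::HashSet;

type DepNodeIndex = u32;

/// Deduplicate one node's edge list just before it is written out.
/// Because the complete list is known up front, we can skip hashing
/// for short lists and size the hash set exactly otherwise.
fn dedup_edges_for_serialization(edges: &mut Vec<DepNodeIndex>) {
    // For short edge lists a linear scan beats allocating a HashSet.
    if edges.len() <= 8 {
        let mut deduped = Vec::with_capacity(edges.len());
        for &edge in edges.iter() {
            if !deduped.contains(&edge) {
                deduped.push(edge);
            }
        }
        *edges = deduped;
        return;
    }

    // Otherwise allocate a set whose capacity we know will suffice.
    let mut seen = HashSet::with_capacity(edges.len());
    edges.retain(|&edge| seen.insert(edge));
}
```

The small-list cutoff is only an example; the point is that having the full edge list available lets us avoid the per-task allocations entirely.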