Description
At the moment the compiler will always eagerly deduplicate read-edges in `CurrentDepGraph::read_index`. In order to be able to do this, the compiler has to allocate a `HashSet` for each task. My suspicion is that there is not much duplication to begin with and that there's potential for optimization here:
- If there is not much duplication to begin with, we could skip deduplication entirely for non-anonymous nodes. There should be no semantic difference, and only nodes whose reads actually contain duplicates would pay a slight cost (see the sketch after this list).
- If deduplication seems like a good idea, doing so during serialization instead of immediately has a few potential benefits:
  - Dep-graph serialization happens while LLVM is running in parallel, so work done there often does not add to the overall compile time at all.
  - When compilation exits early with an error, we don't save the dep-graph at all, so any work spent on deduplication is wasted. If compilation has already reached dep-graph serialization, chances are high that it will succeed.
  - Once we have the whole dep-graph, we can implement deduplication more efficiently: we know when we don't have to allocate a hash set at all and, when we do, what capacity will suffice.
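
For illustration, here is a rough sketch of the current eager scheme next to the skip-it-entirely alternative. The type and field names are made up for this example and do not match the actual structures in `librustc/dep_graph/graph.rs`:

```rust
use std::collections::HashSet;

/// Simplified stand-in for rustc's dep-node index type (hypothetical).
type DepNodeIndex = u32;

/// Current, eager scheme: every task carries a HashSet so that
/// duplicate read-edges are filtered out the moment they are recorded.
struct EagerTaskReads {
    reads: Vec<DepNodeIndex>,
    read_set: HashSet<DepNodeIndex>,
}

impl EagerTaskReads {
    fn record_read(&mut self, source: DepNodeIndex) {
        // The hash-set insert is paid on *every* read, duplicate or not.
        if self.read_set.insert(source) {
            self.reads.push(source);
        }
    }
}

/// Alternative for non-anonymous nodes: just record everything and
/// accept that duplicated edges take up a little extra space.
struct UndeduplicatedTaskReads {
    reads: Vec<DepNodeIndex>,
}

impl UndeduplicatedTaskReads {
    fn record_read(&mut self, source: DepNodeIndex) {
        self.reads.push(source);
    }
}
```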
So the first step would be to collect some data on how much read-edge duplication there actually is. This is most easily done by modifying the compiler to count duplicates (in `librustc/dep_graph/graph.rs`) and to print the number/percentage under `-Zincremental-info`. This modified compiler can then be used to compile a number of crates to get an idea of what's going on.
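
A minimal sketch of such instrumentation, assuming hypothetical counter fields and a reporting hook (the real code in `librustc/dep_graph/graph.rs` will look different), could be:

```rust
use std::collections::HashSet;

type DepNodeIndex = u32;

/// Hypothetical counters for measuring how many read-edges are duplicates.
#[derive(Default)]
struct ReadDuplicationStats {
    total_reads: u64,
    duplicated_reads: u64,
}

impl ReadDuplicationStats {
    /// Called from the read-recording path; `seen` is the per-task set
    /// that the eager deduplication scheme already maintains.
    fn record(&mut self, seen: &mut HashSet<DepNodeIndex>, source: DepNodeIndex) {
        self.total_reads += 1;
        if !seen.insert(source) {
            self.duplicated_reads += 1;
        }
    }

    /// Printed at the end of compilation when -Zincremental-info is set.
    fn report(&self) {
        let percent = if self.total_reads == 0 {
            0.0
        } else {
            100.0 * self.duplicated_reads as f64 / self.total_reads as f64
        };
        println!(
            "incremental: {} of {} read-edges were duplicates ({:.2}%)",
            self.duplicated_reads, self.total_reads, percent
        );
    }
}
```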
If duplication is low (e.g. less than 2% of reads are filtered out), then we could just remove deduplication and test the effect with a try-build. Otherwise, we can move deduplication to `DepGraph::serialize()` and measure the performance impact of that.
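
If deduplication does turn out to be worthwhile, a sketch of doing it per node at serialization time might look like the following (again with made-up types; the real `DepGraph::serialize()` operates on its own edge representation):

```rust
use std::collections::HashSet;

type DepNodeIndex = u32;

/// Deduplicate one node's edge list just before it is written out.
/// Because the complete list is known up front, we can skip hashing
/// for short lists and size the hash set exactly otherwise.
fn dedup_edges_for_serialization(edges: &mut Vec<DepNodeIndex>) {
    // For short edge lists a linear scan beats allocating a HashSet.
    if edges.len() <= 8 {
        let mut deduped = Vec::with_capacity(edges.len());
        for &edge in edges.iter() {
            if !deduped.contains(&edge) {
                deduped.push(edge);
            }
        }
        *edges = deduped;
        return;
    }

    // Otherwise allocate a set whose capacity we know will suffice.
    let mut seen = HashSet::with_capacity(edges.len());
    edges.retain(|&edge| seen.insert(edge));
}
```

The small-list cutoff is only an example; the point is that having the full edge list available lets us avoid the per-task allocations entirely.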