
Experiment with using HashMap::with_capacity throughout the compiler #137005

Open
@jyn514

Description

We have found several times that rustc's performance is basically a giant benchmark of our hashmaps. @orlp notes:

Resizing the hash table during building basically means all the work you did up until that point was to calculate a single boolean value: your capacity wasn't enough, and nothing else of value was done.

If we can avoid resizing our hashmaps, that will avoid quite a lot of needless hashing. Given how much of our benchmark time is spent hashing, this could yield a significant overall performance improvement. @orlp wrote this helpful benchmark, which gives an idea of the order-of-magnitude improvement we're talking about:

// `HashSet` is from std; `FixedState` (a seedable `BuildHasher`, e.g. as provided
// by the `foldhash` crate) and `CardinalitySketch` (a HyperLogLog-style
// cardinality estimator) come from external crates.
use std::collections::HashSet;
use std::hash::BuildHasher; // brings `hash_one` into scope

fn main() {
    // Baseline: no pre-allocation, so the table repeatedly resizes (and rehashes
    // every existing element) as it grows.
    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::new();
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("no capacity: {:?}", start.elapsed());

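    // Pre-allocating the exact number of elements up front avoids every resize.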
    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::with_capacity(1_000_000);
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("perfect capacity: {:?}", start.elapsed());

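    // Worst case: insert one more element than the allocated capacity, forcing a
    // single full resize (and rehash of everything) right at the end.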
    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::with_capacity(1_000_000);
        for x in 0..hs.capacity() + 1 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("worst-case underestimation: {:?}", start.elapsed());

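    // Estimate the number of distinct elements with a cardinality sketch first,
    // then pre-allocate with 25% headroom over the estimate.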
    let start = std::time::Instant::now();
    for _ in 0..100 {
        let hasher = FixedState::with_seed(42);
        let mut sketch = CardinalitySketch::new();
        for x in 0..1_000_000 {
            sketch.insert(hasher.hash_one(x));
        }

        let mut hs = HashSet::with_capacity(sketch.estimate() * 5 / 4);
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("cardinality estimate: {:?}", start.elapsed());
}

Output:

no capacity: 1.582721584s
perfect capacity: 337.567208ms
worst-case underestimation: 2.038280209s
cardinality estimate: 412.6665ms

There are two main parts to this issue.

  1. Determine whether making changes here will be an improvement (I feel fairly confident about this, but it's nice to have data). This could be done by either:
    • Looking at a performance sample to see which functions are hot, and seeing how much time resizing functions are taking up (I think the relevant function is hashbrown::raw::RawTableInner::resize_inner). This will also let us know which maps are most in need of pre-allocating (which will be helpful in part 2).
    • Using a very liberal estimate of how much capacity we need (say, 10x the number of items in the crate for each hashmap) and doing a perf run on that to see if we get any improvements (a rough sketch of this is shown after the list).
  2. Determine an upper bound for each hashmap so we don't allocate more memory than we need. I expect this to be the hard part.

(and then, of course, actually switching to using with_capacity)
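
For part 1's "liberal estimate" experiment, a minimal sketch of the kind of change involved might look like the following. This is not existing compiler code: `OVERESTIMATE_FACTOR`, `item_count`, and `presized_map` are made-up names, and `with_capacity_and_hasher` is simply the generic form of `with_capacity` that works with non-default hashers (such as FxHasher, which the compiler's maps use).

use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

// Deliberately liberal multiplier, per the "10x the number of items" suggestion above.
const OVERESTIMATE_FACTOR: usize = 10;

// Hypothetical helper: build a map pre-sized for roughly `item_count` entries,
// overestimating so that no resize happens while it is being populated.
fn presized_map<K: Hash + Eq, V, S: BuildHasher + Default>(item_count: usize) -> HashMap<K, V, S> {
    HashMap::with_capacity_and_hasher(item_count.saturating_mul(OVERESTIMATE_FACTOR), S::default())
}

Comparing a perf run of something like this against the status quo would show whether resizing (likely hashbrown::raw::RawTableInner::resize_inner in profiles, per the note above) is a meaningful share of compile time before investing in the per-map upper bounds of part 2.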

@rustbot label T-compiler I-slow E-medium E-mentor

Metadata

Labels

C-optimization - Category: An issue highlighting optimization opportunities or PRs implementing such
E-medium - Call for participation: Medium difficulty. Experience needed to fix: Intermediate.
E-mentor - Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
I-compiletime - Issue: Problems and improvements with respect to compile times.
T-compiler - Relevant to the compiler team, which will review and decide on the PR/issue.
