Description
We have found several times that rustc's performance is basically a giant benchmark of our hashmaps. @orlp notes:

> Resizing the hash table during building basically means all the work you did up until that point was to calculate a single boolean value: your capacity wasn't enough, and nothing else of value was done.
If we can avoid resizing our hashmaps, that will avoid quite a lot of needless hashing. Given that so much of our benchmark time is spent hashing, that could yield a significant overall performance improvement. @orlp wrote this helpful benchmark giving an idea of the order of magnitude of improvement we're talking about:
```rust
use std::collections::HashSet;
use std::hash::BuildHasher;

// Note: `FixedState` is a seedable hasher state (as in the foldhash crate) and
// `CardinalitySketch` is a HyperLogLog-style estimator (as in
// rustc_data_structures); the exact import paths depend on the crates used.

fn main() {
    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::new();
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("no capacity: {:?}", start.elapsed());

    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::with_capacity(1_000_000);
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("perfect capacity: {:?}", start.elapsed());

    let start = std::time::Instant::now();
    for _ in 0..100 {
        let mut hs = HashSet::with_capacity(1_000_000);
        for x in 0..hs.capacity() + 1 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("worst-case underestimation: {:?}", start.elapsed());

    let start = std::time::Instant::now();
    for _ in 0..100 {
        let hasher = FixedState::with_seed(42);
        let mut sketch = CardinalitySketch::new();
        for x in 0..1_000_000 {
            sketch.insert(hasher.hash_one(x));
        }
        let mut hs = HashSet::with_capacity(sketch.estimate() * 5 / 4);
        for x in 0..1_000_000 {
            hs.insert(x);
        }
        std::hint::black_box(hs);
    }
    println!("cardinality estimate: {:?}", start.elapsed());
}
```
```
no capacity: 1.582721584s
perfect capacity: 337.567208ms
worst-case underestimation: 2.038280209s
cardinality estimate: 412.6665ms
```
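The "worst-case underestimation" row is slow because exceeding the pre-allocated capacity by even one element forces a full reallocation and rehash of everything inserted so far. A minimal sketch of that behavior, relying only on std's documented guarantee that a set can hold `capacity()` elements without reallocating:

```rust
use std::collections::HashSet;

fn main() {
    let mut hs: HashSet<u32> = HashSet::with_capacity(1_000);
    let cap = hs.capacity(); // may be >= 1_000 due to load-factor rounding

    // Filling up to `capacity()` never reallocates.
    for x in 0..cap as u32 {
        hs.insert(x);
    }
    assert_eq!(hs.capacity(), cap);

    // One more insert exceeds the bound: every element is rehashed into a
    // bigger table, so all the hashing done so far is effectively repeated.
    hs.insert(cap as u32);
    assert!(hs.capacity() > cap);
    println!("grew from {} to {}", cap, hs.capacity());
}
```

This is why a slight overestimate (like the `* 5 / 4` fudge factor above) is much cheaper than a slight underestimate.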
There are two main parts to this issue.
- Determine whether making changes here will be an improvement (I feel fairly confident about this, but it's nice to have data). This could be done by either:
  - Looking at a performance sample to see which functions are hot, and how much time resizing functions take up (I think the relevant function is `hashbrown::raw::RawTableInner::resize_inner`). This will also tell us which maps are most in need of pre-allocating (which will be helpful in part 2).
  - Using a very liberal estimate of how much capacity we need (say, 10x the number of items in the crate for each hashmap) and doing a perf run on that to see if we get any improvements.
- Determining an upper bound for each hashmap so we don't allocate more memory than we need. I expect this to be the hard part. (And then, of course, actually switching to using `with_capacity`.)
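As a rough sketch of what the eventual switch could look like (the map, the item count, and the helper name here are hypothetical illustrations, not rustc's actual code): pre-size from a known upper bound, and use `reserve` when a later pass adds a known number of extra entries.

```rust
use std::collections::HashMap;

// Hypothetical helper: `expected_items` stands in for an upper bound the
// compiler already knows before the map is built (e.g. the number of items
// in the crate), instead of letting the map grow and rehash repeatedly.
fn build_table(expected_items: usize) -> HashMap<u32, &'static str> {
    let mut table = HashMap::with_capacity(expected_items);
    for id in 0..expected_items as u32 {
        table.insert(id, "item");
    }
    table
}

fn main() {
    let mut table = build_table(10_000);
    // `with_capacity(n)` guarantees room for at least `n` elements.
    assert!(table.capacity() >= 10_000);

    // If a later pass will add a known number of entries, `reserve` extends
    // the bound up front (room for at least 1_000 *additional* elements).
    table.reserve(1_000);
    assert!(table.capacity() >= 11_000);
    println!("capacity: {}", table.capacity());
}
```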
@rustbot label T-compiler I-slow E-medium E-mentor