Skip to content

#30983 Ensure capacity returned of HashMap is max(capacity, length). #30991

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 2, 2016

Conversation

rthomas
Copy link
Contributor

@rthomas rthomas commented Jan 18, 2016

Using the test @bluss suggested in #30983

@rust-highfive
Copy link
Contributor

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@bluss
Copy link
Member

bluss commented Jan 18, 2016

@rthomas Welcome to Rust! It's great that you want to contribute.

I don't agree with the solution in this PR. Maybe the linked issue isn't really a bug at all, it's just a surprising fact. If there is a rounding error in capacity(), we should fix that instead, or we can just let this behavior slide.

@brson
Copy link
Contributor

brson commented Jan 20, 2016

I'm also hesitant about this solution, but also the current state of capacity not being enough for len - that seems like an invariant people would reasonably expect.

@ranma42
Copy link
Contributor

ranma42 commented Jan 20, 2016

The exact computation is:

    /// An inverse of `min_capacity`.                                                                                                                                                        
    #[inline]
    fn usable_capacity(&self, cap: usize) -> usize {
        // As the number of entries approaches usable capacity,                                                                                                                                             
        // min_capacity(size) must be smaller than the internal capacity,                                                                                                                                   
        // so that the map is not resized:                                                                                                                                                                  
        // `min_capacity(usable_capacity(x)) <= x`.                                                                                                                                                         
        // The left-hand side can only be smaller due to flooring by integer                                                                                                                                
        // division.                                                                                                                                                                                        
        //                                                                                                                                                                                                  
        // This doesn't have to be checked for overflow since allocation size                                                                                                                               
        // in bytes will overflow earlier than multiplication by 10.                                                                                                                                        
        (cap * 10 + 10 - 1) / 11
    }

In general, if min_capacity is usable_size * num / den, usable_capacity should be (cap * den + den - 1) / num.

@brson
Copy link
Contributor

brson commented Jan 21, 2016

On second thought this doesn't look so bad to me. It is kinda a hack to paper over a deficiency, but it makes the capacity behave how you expect.

cc @rust-lang/libs

@alexcrichton
Copy link
Member

I would (perhaps naively) expect capacity() >= len() to always hold as well, so I'd be fine with fixing this problem somehow, although I admit to not understanding the underlying intricacies. Maybe @gankro knows more?

@Gankra
Copy link
Contributor

Gankra commented Jan 21, 2016

TL;DR it does this right now because of integer division. In particular the actual check we do internally is something like allocated_space < new_size * x / y ("is the allocated space 110% of the requested space"), but this function yields allocated_space * y / x ("91% of the allocated space"). In my head, these two can't be synched up because of overflow or perf, but I might be mistaken? Relevant code is HashMap::reserve. @pczarn might recall the details better.

This was deemed "fine" because the capacity is "I definitely won't allocate if you don't exceed this", and we don't actually make any guarantees that we need to reallocate if it is exceeded. It just so happens that HashMap is the only collection where this comes up. Vec is precise, and VecDeque's compensation is cap - 1. Everything else is node-based.

I'm fine with the current situation, and I'd be fine with this fix.

@ranma42
Copy link
Contributor

ranma42 commented Jan 21, 2016

@gankro I believe that the two have not been synched up to keep the code simpler, but in my previous comment I showed how usable_capacity should be changed to have the exact rounding.

If the implementation is updated with that code, HashMap behaves just like the other data structures, i.e. its capacity becomes exact: HashMap will not allocate if you insert at most len() - capacity() items, it will allocate if you insert len() - capacity() + 1.

@pczarn
Copy link
Contributor

pczarn commented Jan 21, 2016

@ranma42 is right. Adding 1 to min_capacity works as well.

@rthomas
Copy link
Contributor Author

rthomas commented Feb 1, 2016

Sorry for the lag. I've made the changes that @ranma42 suggested and improved the test for it.


for _ in 0..116 {
a.insert(item, 0);
item = item + 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item += 1;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Gankra
Copy link
Contributor

Gankra commented Feb 2, 2016

Alright, sorry for the delay. r=me with commits squashed.

(I do wonder if this test is perhaps over-specifying... but if we opt to break it we can of course easily change the test)

@Gankra
Copy link
Contributor

Gankra commented Feb 2, 2016

@bors r+

Thanks a bunch! 😸

@bors
Copy link
Collaborator

bors commented Feb 2, 2016

📌 Commit 8aae7f7 has been approved by Gankro

bors added a commit that referenced this pull request Feb 2, 2016
@bors
Copy link
Collaborator

bors commented Feb 2, 2016

⌛ Testing commit 8aae7f7 with merge ddd1bf5...

@bors bors merged commit 8aae7f7 into rust-lang:master Feb 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants