-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Clarify HashMap's capacity handling. #36766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
r? @aturon (rust_highfive has picked a reviewer for you, use r? to override) |
I'm not in the compiler code much, but IIRC "span" is used during parsing to indicate a section of text (such as to report for an error). That particular usage of the word is fairly far away from this particular usage though, so there may not be a conflict in practice. |
r? @arthurprs |
The PR has merits, it's a bit confusing whatever capacity is referring to the hashmap capacity (the exposed one) or the rawtable capacity. But the proposed naming is a bit odd to me, maybe adding some |
It's true that the meaning of If you are still unhappy, I could change all uses of "capacity" within
As for adding prefixes, I think that's less effective. The use of an unprefixed "capacity" is baked into Finally, I assume this is the review for my patch? AFAICT it's an r-, but the suggestions on how to change things to move towards an r+ are quite vague. When I r- patches for Firefox I always do my best to make it clear that (a) it is an r-, and (b) what changes are required to move towards r+, or if the patch is irredeemably flawed, make that fact clear. |
That wasn't a formal review, just a comment trying to create some discussion around the problem at hand. I think we should preferably avoid introducing another concept (span), even if it's a simple one. I propose internal_capacity (or a variation) instead, as it has a better chance of being understood upfront. It's a trade-off of course, between disambiguating and not introducing another concept. If you don't agree we can always wait/weight more opinions. |
I can live with that. Do you want it to be used for |
I don't think we need any changes to RawTable. What do you think? |
I'm considering |
+1 for I don't consistency is in play there. |
e55bb04
to
f6f66aa
Compare
@arthurps: I have updated the commit to use "capacity" and "raw capacity". I left |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only one optional nit.
This also makes reserve(0) a noop on an empty HashMap, which I think is good.
@@ -34,13 +34,9 @@ use super::table::BucketState::{ | |||
Full, | |||
}; | |||
|
|||
const INITIAL_LOG2_CAP: usize = 5; | |||
const INITIAL_CAPACITY: usize = 1 << INITIAL_LOG2_CAP; // 2^5 | |||
const MIN_NONZERO_RAW_CAPACITY: usize = 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should leave something to remind that it must be a power of 2? Like the // 2^5
before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the commit to add a comment saying it must be a power of two.
This commit does the following. - Changes the terminology for capacities used within HashMap's code. "Internal capacity" is now consistently "raw capacity", and "usable capacity" is now consistently just "capacity". This makes the code easier to understand. - Reworks capacity and raw capacity computations. Raw capacity computations are now handled in a single place: `DefaultResizePolicy::raw_capacity()`. This function correctly returns zero when given zero, which means that the following cases now result in a capacity of zero when they previously did not. * `Hash{Map,Set}::with_capacity(0)` * `Hash{Map,Set}::with_capacity_and_hasher(0)` * `Hash{Map,Set}::shrink_to_fit()`, when used with a hash map/set whose elements have all been removed - Strengthens the language used in the comments describing the above functions, to make it clearer when they will result in a map/set with a capacity of zero. The new language is based on the language used for the corresponding functions in `Vec`. - Adds tests for the above zero-capacity cases. - Removes `test_resize_policy` because it is no longer useful.
f6f66aa
to
6a9b5e4
Compare
// 2. Ensure it is a power of two. | ||
// 3. Ensure it is at least the minimum size. | ||
let mut raw_cap = len * 11 / 10; | ||
assert!(raw_cap >= len, "raw_cap overflow"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is a once-off (on hashmap creation from capacity), isn't it? So it should have no major impact on performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to double check if this adds more work to the pub fn reserve
, it's called before every insert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? I was just looking at that. I don't think it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only called if capacity is insufficient, so in resizes from insert. That's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah the PR actually already changes reserve to do less work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is called from with_capacity_and_header
, reserve
, and shrink_to_fit
. Prior to this PR, those functions had an overflow assertion. (shrink_to_fits
's was a debug_assert!
, the others were an assert!
.)
In the new code, the assertion has moved out of those functions, into DefaultResizePolicy::raw_capacity
. So it shouldn't have any effect.
@@ -667,28 +667,23 @@ impl<K, V, S> HashMap<K, V, S> | |||
/// ``` | |||
#[stable(feature = "rust1", since = "1.0.0")] | |||
pub fn reserve(&mut self, additional: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, reserve should have a fast path that returns quickly if the additional capacity is already present. This fast path should not have branches to panic. I'm not sure how big a difference it would make, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
let remaining = self.capacity() - self.len(); # this can't overflow
if remaining < additional {
....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nnethercote would you like to do this in the PR or want to wrap it up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do want to wrap it up, but might as well do it properly. I'll take a look at this on Monday.
r? arthurps for the additional commit that avoids the overflow check on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you for the review. Do we need to tell bors? |
I think it's reserved for the organization members. |
@bors r+ Thank you @nnethercote and @arthurprs |
📌 Commit 607d297 has been approved by |
Clarify HashMap's capacity handling. HashMap has two notions of "capacity": - "Usable capacity": the number of elements a hash map can hold without resizing. This is the meaning of "capacity" used in HashMap's API, e.g. the `with_capacity()` function. - "Internal capacity": the number of allocated slots. Except for the zero case, it is always larger than the usable capacity (because some slots must be left empty) and is always a power of two. HashMap's code is confusing because it does a poor job of distinguishing these two meanings. I propose using two different terms for these two concepts. Because "capacity" is already used in HashMap's API to mean "usable capacity", I will use a different word for "internal capacity". I propose "span", though I'm happy to consider other names.
HashMap has two notions of "capacity":
resizing. This is the meaning of "capacity" used in HashMap's API,
e.g. the
with_capacity()
function.zero case, it is always larger than the usable capacity (because some
slots must be left empty) and is always a power of two.
HashMap's code is confusing because it does a poor job of
distinguishing these two meanings. I propose using two different terms
for these two concepts. Because "capacity" is already used in HashMap's
API to mean "usable capacity", I will use a different word for "internal
capacity". I propose "span", though I'm happy to consider other names.