-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Fix BitvSet union_with and symmetric_difference_with #17558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1176,6 +1176,18 @@ impl BitvSet { | |||
// Expand the vector if necessary | |||
self_bitv.reserve(other_bitv.capacity()); | |||
|
|||
// Increase the number of valid bits if necessary | |||
if self_bitv.nbits < other_bitv.nbits { | |||
// Zero out any unused bits in self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? reserve is called just above above, which zero's out the bits. I think you can just set self_bitv.nbits to other_bitv.nbits and call it a day. No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All reserve does is add new uints to the storage vector. This section is for zeroing out any bits in the current uint (e.g. if nbits is 12, we want to 0 the upper 20 or 52 bits set to 0 in case there is something in them). Something similar is done in the insert function, so I figured it would be best to zero it explicitly to be safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think insert might be wrong? This structure has been plagued with implementation errors for a while (as you have just discovered). When a new word is added to the Vec, it's definitely zero'd out by reserve:
pub fn reserve(&mut self, size: uint) {
let old_size = self.storage.len();
let size = (size + uint::BITS - 1) / uint::BITS;
if old_size < size {
self.storage.grow(size - old_size, 0);
}
}
Bits are explicitly zero'd in remove, clear, and pop. @apoelstra's commit message suggests that these bits might somehow become indeterminate, but I can't see how. Maybe this is futureproofing for future optimization?
@apoelstra, any insight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:/ The reason that Bitv::reserve
zeroes out bits was that you could otherwise pull the Bitv
out of a BitvSet
with get_mut_ref
(I guess this should be get_mut
now), call reserve
on it, and wind up having added things to the BitvSet
.
I think I had originally indended these bits to be indeterminate (hence my commit messages to that effect), but may have wound up zeroing them out everywhere while killing bugs like this. I remember realizing that these bits could be read in way more places than I'd initially thought, and that zeroing them only in codepaths where indeterminacy would break things was not feasible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@apoelstra So basically, the indeterminacy checks can be nixed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If they do anything at all (and I don't think they do) they are simply hiding bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So should I go ahead and remove the 0'ing section?
Any reason not to just set nbits in bitvset's reserve? This would reduce room for error like this. |
Not that I can think of. And it would be better to handle it in one place. |
What is the final verdict on masking? Should reserve zero the unused upper bits if it increases nbits? |
Yes, Bitv.reserve should zero the contents, and BitvSet.reserve should set nbits to match the total capacity. @apoelstra Sound good to you? Any thoughts? |
What is the reason for having BitvSet.reserve set nbits rather than having Bitv.reserve do it? |
Equality in Bitv depends on nbits, but not in BitvSet. Thus, only BitvSet can do this without changing the actual semantics of the public methods. |
Sounds good to me. Agree with |
@gankro, @apoelstra Can you check out the updated PR? Thanks! |
self.storage.grow(new_size - old_size, 0); | ||
} | ||
|
||
// Zero out any unused bits in the current word |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grow is zeroing out the bits already. It literally pushes 0's into the Vec until it's the right size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that was a bit vague. The relevant part is that this is specifically a precondition to this method call. All bits are allocated by reserve, so no words are ever "partially" initialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that you actually do need to mask out the current word, as there are some cases where there can be partially initialized words. For example, the with_capacity can return a Bitv with uninitialized bits set to one.
Removing the masking of the current word causes this test case to fail, as the insert will increase the nbits, and all the elements less than nbits (say if we inserted 160 instead) would appear to be inserted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. Alright then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with_capacity
needs to be fixed not to do this, as per the discussion I had with @gankro on the last PR...we should target "unitialized bits are always zero" since it is easier to check the places that might create unitialized bits than the ones that might use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
@gankro @apoelstra Sorry to keep bothering you. I changed the with_capacity function to fit in with @apoelstra's vision and cleaned up some of the ugly edge case handling. |
Thanks for persisting! It looks great now! Two nits: you make a useless temp variable in bitvset.from_bitv, and bitvset.with_capacity should dispatch to from_bitv. (On mobile can t comment on diffs) |
Thanks for all the feedback! It's been really helpful. The temp is there to make the param mutable. Is there a better way to mutate a function argument? Edit: Whoops, figured out how to mark parameters as mutable. That does change the signature of the function though. |
It doesn't actually change the signature. There's no difference between passing by mutable or immutable value. |
@gankro should be fixed now |
@kaseyc Great work! We have a policy about squashing commits once a PR is ready, to keep the history clean. Can you squash all four into one? |
@gankro Done. |
@alexcrichton this looks good to go. r? |
Thanks for this @kaseyc! It is much clearer what is going on with these structures now that we never have uninitialized bits :). The code looks good to me. |
@gankro @apoelstra The failed tests came from cases where the code is grabbing the underlying bitv and doing things with it, though the changes to how BitvSet handles the nbits means that doing so no longer works the same way. What should the behavior in that case be? Should code accessing the underlying Bitv just deal with the changed nbits behavior, or should the unwrap method hide that behavior by changing the nbits value to be just enough to hold the stored values or something like that? |
My vote is remove |
I'll get on that. Thanks.
|
This would need to be marked as a [breaking-change], if you're not familiar with that process. I'm not sure how deprecating that method would work? I guess we can just deprecate it as "doesn't work right, use CC @aturon thoughts? |
Mark it deprecated with "violates invariants; please use |
@apoelstra @gankro Updated the PR with the changes. get_mut_ref is deprecated, unwrap is renamed, converting to and from a Bitv sanitizes the nbits, and I found another place that needed to zero the nbits. |
} | ||
|
||
/// Consumes this set to return the underlying bit vector. | ||
/// The returned Bitv is sized to hold all elements up to the largest element in the BitvSet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a lot of extra work for little concrete value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just to bring it closer to the current behavior, so people aren't getting the bitvsets filled with false values (as the nbits in a bitv is the capacity). It just seemed a little more ergonomic, especially looking at how many of the documentation examples use into_bitv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means the into_bitv->from_bitv dance is expensive though, right? Otherwise it's literally a noop, which makes deprecating get_mut reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does make it more expensive, yes. I'll change it back to just unwrapping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small note: there should be an empty line between this sentence and the first; the first paragraph is used as a summary by rustdoc, and it seems like this doesn't need to be in it, e.g.
/// Consumes this set to return the underlying bit vector.
///
/// The returned Bitv ...
I took out the nbits adjustment from into_bitv |
@kaseyc Sure, sorry for the delay! Overall, I'm still happy with the direction this PR is going. However, I think that (Generally, at this stage we deprecate when it's easy/safe to do so, but breakage is OK.) As @gankro pointed out, you'll need to add a |
Sounds good. There should already be a [breaking-change] tag on the commit message, does the PR need one too? |
@kaseyc Ah, I hadn't realized there was one already there. Just on the commit is fine. |
Functions that add bits now ensure that any unused bits are set to 0. `into_bitv` sanitizes the nbits of the Bitv/BitvSet it returns by setting the nbits to the current capacity. Fix a bug with `union_with` and `symmetric_difference` with due to not updating nbits properly Add test cases to the _with functions Remove `get_mut_ref` This is a [breaking-change]. The things you will need to fix are: 1. BitvSet's `unwrap()` has been renamed to `into_bitv` 2. BitvSet's `get_mut_ref()` has been removed. Use `into_bitv()` and `from_bitv()` instead.
Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set. Example: ```rust let mut a = BitvSet::new(); let mut b = BitvSet::new(); a.insert(0); b.insert(5); a.union_with(&b); println!("{}", a); //Prints "{0}" instead of "{0, 5}" ```
Congratulations, and thanks a ton for persevering on this! I hope to see more Rust contributions from you in the future! 😄 |
Thanks! And thank you for all your help through the process :D
|
fix: Fix double rounding of `f32` literals Fixes rust-lang#17556 by delaying parsing until the type is known. Also adds a test to check the issue is fixed.
Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set.
Example: