Fix BitvSet union_with and symmetric_difference_with #17558

kaseyc · 2014-09-26T08:01:45Z

Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set.

Example:

    let mut a = BitvSet::new();
    let mut b = BitvSet::new();
    a.insert(0);
    b.insert(5);
    a.union_with(&b);
    println!("{}", a); //Prints "{0}" instead of "{0, 5}"

Gankra · 2014-09-26T18:41:38Z

src/libcollections/bitv.rs

@@ -1176,6 +1176,18 @@ impl BitvSet {
        // Expand the vector if necessary
        self_bitv.reserve(other_bitv.capacity());

+        // Increase the number of valid bits if necessary
+        if self_bitv.nbits < other_bitv.nbits {
+            // Zero out any unused bits in self


Is this necessary? reserve is called just above above, which zero's out the bits. I think you can just set self_bitv.nbits to other_bitv.nbits and call it a day. No?

All reserve does is add new uints to the storage vector. This section is for zeroing out any bits in the current uint (e.g. if nbits is 12, we want to 0 the upper 20 or 52 bits set to 0 in case there is something in them). Something similar is done in the insert function, so I figured it would be best to zero it explicitly to be safe.

Yeah, I think insert might be wrong? This structure has been plagued with implementation errors for a while (as you have just discovered). When a new word is added to the Vec, it's definitely zero'd out by reserve:

pub fn reserve(&mut self, size: uint) { let old_size = self.storage.len(); let size = (size + uint::BITS - 1) / uint::BITS; if old_size < size { self.storage.grow(size - old_size, 0); } }

Bits are explicitly zero'd in remove, clear, and pop. @apoelstra's commit message suggests that these bits might somehow become indeterminate, but I can't see how. Maybe this is futureproofing for future optimization?

@apoelstra, any insight?

:/ The reason that Bitv::reserve zeroes out bits was that you could otherwise pull the Bitv out of a BitvSet with get_mut_ref (I guess this should be get_mut now), call reserve on it, and wind up having added things to the BitvSet.

I think I had originally indended these bits to be indeterminate (hence my commit messages to that effect), but may have wound up zeroing them out everywhere while killing bugs like this. I remember realizing that these bits could be read in way more places than I'd initially thought, and that zeroing them only in codepaths where indeterminacy would break things was not feasible.

@apoelstra So basically, the indeterminacy checks can be nixed?

Yes. If they do anything at all (and I don't think they do) they are simply hiding bugs.

So should I go ahead and remove the 0'ing section?

Gankra · 2014-09-26T20:32:01Z

Any reason not to just set nbits in bitvset's reserve? This would reduce room for error like this.

kaseyc · 2014-09-26T20:57:18Z

Not that I can think of. And it would be better to handle it in one place.

kaseyc · 2014-09-26T21:06:33Z

What is the final verdict on masking? Should reserve zero the unused upper bits if it increases nbits?

Gankra · 2014-09-26T21:20:04Z

Yes, Bitv.reserve should zero the contents, and BitvSet.reserve should set nbits to match the total capacity. @apoelstra Sound good to you? Any thoughts?

kaseyc · 2014-09-26T21:44:05Z

What is the reason for having BitvSet.reserve set nbits rather than having Bitv.reserve do it?

Gankra · 2014-09-26T21:46:09Z

Equality in Bitv depends on nbits, but not in BitvSet. Thus, only BitvSet can do this without changing the actual semantics of the public methods.

apoelstra · 2014-09-26T22:26:23Z

Sounds good to me. Agree with BitvSet::reserve setting nbits, it shouldn't affect anything functionally but would make the code easier to reason about.

kaseyc · 2014-09-29T20:27:13Z

@gankro, @apoelstra Can you check out the updated PR? Thanks!

Gankra · 2014-09-29T20:33:14Z

src/libcollections/bitv.rs

+            self.storage.grow(new_size - old_size, 0);
+        }
+
+        // Zero out any unused bits in the current word


grow is zeroing out the bits already. It literally pushes 0's into the Vec until it's the right size.

Sorry that was a bit vague. The relevant part is that this is specifically a precondition to this method call. All bits are allocated by reserve, so no words are ever "partially" initialized.

I found that you actually do need to mask out the current word, as there are some cases where there can be partially initialized words. For example, the with_capacity can return a Bitv with uninitialized bits set to one.

Removing the masking of the current word causes this test case to fail, as the insert will increase the nbits, and all the elements less than nbits (say if we inserted 160 instead) would appear to be inserted.

Ah, good catch. Alright then.

I think with_capacity needs to be fixed not to do this, as per the discussion I had with @gankro on the last PR...we should target "unitialized bits are always zero" since it is easier to check the places that might create unitialized bits than the ones that might use them.

Sounds good.

kaseyc · 2014-09-30T22:06:11Z

@gankro @apoelstra Sorry to keep bothering you. I changed the with_capacity function to fit in with @apoelstra's vision and cleaned up some of the ugly edge case handling.

Gankra · 2014-09-30T23:37:41Z

Thanks for persisting! It looks great now! Two nits: you make a useless temp variable in bitvset.from_bitv, and bitvset.with_capacity should dispatch to from_bitv.

(On mobile can t comment on diffs)

kaseyc · 2014-09-30T23:52:33Z

Thanks for all the feedback! It's been really helpful. The temp is there to make the param mutable. Is there a better way to mutate a function argument?

Edit: Whoops, figured out how to mark parameters as mutable. That does change the signature of the function though.

Gankra · 2014-10-01T00:24:48Z

It doesn't actually change the signature. There's no difference between passing by mutable or immutable value.

kaseyc · 2014-10-01T04:08:19Z

@gankro should be fixed now

Gankra · 2014-10-01T13:19:26Z

@kaseyc Great work! We have a policy about squashing commits once a PR is ready, to keep the history clean. Can you squash all four into one?

kaseyc · 2014-10-01T14:44:43Z

@gankro Done.

Gankra · 2014-10-01T15:26:43Z

@alexcrichton this looks good to go. r?

apoelstra · 2014-10-01T15:27:17Z

Thanks for this @kaseyc! It is much clearer what is going on with these structures now that we never have uninitialized bits :). The code looks good to me.

kaseyc · 2014-10-02T05:00:06Z

@gankro @apoelstra The failed tests came from cases where the code is grabbing the underlying bitv and doing things with it, though the changes to how BitvSet handles the nbits means that doing so no longer works the same way. What should the behavior in that case be? Should code accessing the underlying Bitv just deal with the changed nbits behavior, or should the unwrap method hide that behavior by changing the nbits value to be just enough to hold the stored values or something like that?

apoelstra · 2014-10-02T13:01:06Z

My vote is remove get_mut_ref entirely. Users can use unwrap (which should be renamed to_bitv in accordance with the new rules) and from_bitv (which should sanitize nbits).

kaseyc · 2014-10-02T19:25:46Z

I'll get on that. Thanks.
On Oct 2, 2014 6:01 AM, "Andrew Poelstra" [email protected] wrote:

My vote is remove get_mut_ref entirely. Users can use unwrap (which
should be renamed to_bitv in accordance with the new rules) and from_bitv
(which should sanitize nbits).

—
Reply to this email directly or view it on GitHub
#17558 (comment).

Gankra · 2014-10-02T19:42:37Z

This would need to be marked as a [breaking-change], if you're not familiar with that process. I'm not sure how deprecating that method would work? I guess we can just deprecate it as "doesn't work right, use into_bitv (which I believe is the correct name, not to_bitv) and from_bitv".

CC @aturon thoughts?

apoelstra · 2014-10-02T20:40:35Z

Mark it deprecated with "violates invariants; please use into_bitv and from_bitv instead".

kaseyc · 2014-10-04T21:16:04Z

@apoelstra @gankro Updated the PR with the changes. get_mut_ref is deprecated, unwrap is renamed, converting to and from a Bitv sanitizes the nbits, and I found another place that needed to zero the nbits.

Gankra · 2014-10-04T21:26:41Z

src/libcollections/bitv.rs

    }

    /// Consumes this set to return the underlying bit vector.
+    /// The returned Bitv is sized to hold all elements up to the largest element in the BitvSet.


This seems like a lot of extra work for little concrete value?

That was just to bring it closer to the current behavior, so people aren't getting the bitvsets filled with false values (as the nbits in a bitv is the capacity). It just seemed a little more ergonomic, especially looking at how many of the documentation examples use into_bitv

It means the into_bitv->from_bitv dance is expensive though, right? Otherwise it's literally a noop, which makes deprecating get_mut reasonable.

It does make it more expensive, yes. I'll change it back to just unwrapping it.

Small note: there should be an empty line between this sentence and the first; the first paragraph is used as a summary by rustdoc, and it seems like this doesn't need to be in it, e.g.

/// Consumes this set to return the underlying bit vector. /// /// The returned Bitv ...

kaseyc · 2014-10-05T00:23:21Z

I took out the nbits adjustment from into_bitv

Gankra · 2014-10-05T02:49:46Z

Looks good again!

@aturon @huonw r?

kaseyc · 2014-10-08T17:36:28Z

@aturon @huonw Can you look over my PR? Thanks!

aturon · 2014-10-08T17:50:31Z

@kaseyc Sure, sorry for the delay!

Overall, I'm still happy with the direction this PR is going. However, I think that get_mut_ref should be removed, rather than deprecated in this case, since with the new design it actually violates invariants.

(Generally, at this stage we deprecate when it's easy/safe to do so, but breakage is OK.)

As @gankro pointed out, you'll need to add a [breaking-change] marker to the end of the commit message.

kaseyc · 2014-10-08T17:53:24Z

Sounds good. There should already be a [breaking-change] tag on the commit message, does the PR need one too?

aturon · 2014-10-08T17:56:28Z

@kaseyc Ah, I hadn't realized there was one already there. Just on the commit is fine.

Functions that add bits now ensure that any unused bits are set to 0. `into_bitv` sanitizes the nbits of the Bitv/BitvSet it returns by setting the nbits to the current capacity. Fix a bug with `union_with` and `symmetric_difference` with due to not updating nbits properly Add test cases to the _with functions Remove `get_mut_ref` This is a [breaking-change]. The things you will need to fix are: 1. BitvSet's `unwrap()` has been renamed to `into_bitv` 2. BitvSet's `get_mut_ref()` has been removed. Use `into_bitv()` and `from_bitv()` instead.

kaseyc · 2014-10-09T07:15:43Z

@gankro @aturon The commit is updated with get_mut_ref's removal.

Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set. Example: ```rust let mut a = BitvSet::new(); let mut b = BitvSet::new(); a.insert(0); b.insert(5); a.union_with(&b); println!("{}", a); //Prints "{0}" instead of "{0, 5}" ```

Gankra · 2014-10-09T20:57:54Z

Congratulations, and thanks a ton for persevering on this!

I hope to see more Rust contributions from you in the future! 😄

kaseyc · 2014-10-09T21:20:31Z

Thanks! And thank you for all your help through the process :D
On Oct 9, 2014 1:58 PM, "Alexis Beingessner" [email protected]
wrote:

Congratulations, and thanks a ton for persevering on this!

I hope to see more Rust contributions from you in the future! [image:
😄]

—
Reply to this email directly or view it on GitHub
#17558 (comment).

fix: Fix double rounding of `f32` literals Fixes rust-lang#17556 by delaying parsing until the type is known. Also adds a test to check the issue is fixed.

Gankra reviewed Sep 26, 2014
View reviewed changes

Gankra reviewed Sep 29, 2014
View reviewed changes

Gankra reviewed Oct 4, 2014
View reviewed changes

bors closed this Oct 9, 2014

bors merged commit dd4fa90 into rust-lang:master Oct 9, 2014

Fix BitvSet union_with and symmetric_difference_with #17558

Fix BitvSet union_with and symmetric_difference_with #17558

Conversation

kaseyc commented Sep 26, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gankra commented Sep 26, 2014

kaseyc commented Sep 26, 2014

kaseyc commented Sep 26, 2014

Gankra commented Sep 26, 2014

kaseyc commented Sep 26, 2014

Gankra commented Sep 26, 2014

apoelstra commented Sep 26, 2014

kaseyc commented Sep 29, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaseyc commented Sep 30, 2014

Gankra commented Sep 30, 2014

kaseyc commented Sep 30, 2014

Gankra commented Oct 1, 2014

kaseyc commented Oct 1, 2014

Gankra commented Oct 1, 2014

kaseyc commented Oct 1, 2014

Gankra commented Oct 1, 2014

apoelstra commented Oct 1, 2014

kaseyc commented Oct 2, 2014

apoelstra commented Oct 2, 2014

kaseyc commented Oct 2, 2014

Gankra commented Oct 2, 2014

apoelstra commented Oct 2, 2014

kaseyc commented Oct 4, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaseyc commented Oct 5, 2014

Gankra commented Oct 5, 2014

kaseyc commented Oct 8, 2014

aturon commented Oct 8, 2014

kaseyc commented Oct 8, 2014

aturon commented Oct 8, 2014

kaseyc commented Oct 9, 2014

Gankra commented Oct 9, 2014

kaseyc commented Oct 9, 2014