Skip to content

Fix BitvSet union_with and symmetric_difference_with #17558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 9, 2014
Merged

Fix BitvSet union_with and symmetric_difference_with #17558

merged 1 commit into from
Oct 9, 2014

Conversation

kaseyc
Copy link
Contributor

@kaseyc kaseyc commented Sep 26, 2014

Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set.

Example:

    let mut a = BitvSet::new();
    let mut b = BitvSet::new();
    a.insert(0);
    b.insert(5);
    a.union_with(&b);
    println!("{}", a); //Prints "{0}" instead of "{0, 5}"

@@ -1176,6 +1176,18 @@ impl BitvSet {
// Expand the vector if necessary
self_bitv.reserve(other_bitv.capacity());

// Increase the number of valid bits if necessary
if self_bitv.nbits < other_bitv.nbits {
// Zero out any unused bits in self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? reserve is called just above above, which zero's out the bits. I think you can just set self_bitv.nbits to other_bitv.nbits and call it a day. No?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All reserve does is add new uints to the storage vector. This section is for zeroing out any bits in the current uint (e.g. if nbits is 12, we want to 0 the upper 20 or 52 bits set to 0 in case there is something in them). Something similar is done in the insert function, so I figured it would be best to zero it explicitly to be safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think insert might be wrong? This structure has been plagued with implementation errors for a while (as you have just discovered). When a new word is added to the Vec, it's definitely zero'd out by reserve:

pub fn reserve(&mut self, size: uint) {
   let old_size = self.storage.len();
   let size = (size + uint::BITS - 1) / uint::BITS;
   if old_size < size {
      self.storage.grow(size - old_size, 0);
   }
}

Bits are explicitly zero'd in remove, clear, and pop. @apoelstra's commit message suggests that these bits might somehow become indeterminate, but I can't see how. Maybe this is futureproofing for future optimization?

@apoelstra, any insight?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:/ The reason that Bitv::reserve zeroes out bits was that you could otherwise pull the Bitv out of a BitvSet with get_mut_ref (I guess this should be get_mut now), call reserve on it, and wind up having added things to the BitvSet.

I think I had originally indended these bits to be indeterminate (hence my commit messages to that effect), but may have wound up zeroing them out everywhere while killing bugs like this. I remember realizing that these bits could be read in way more places than I'd initially thought, and that zeroing them only in codepaths where indeterminacy would break things was not feasible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apoelstra So basically, the indeterminacy checks can be nixed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If they do anything at all (and I don't think they do) they are simply hiding bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should I go ahead and remove the 0'ing section?

@Gankra
Copy link
Contributor

Gankra commented Sep 26, 2014

Any reason not to just set nbits in bitvset's reserve? This would reduce room for error like this.

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 26, 2014

Not that I can think of. And it would be better to handle it in one place.

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 26, 2014

What is the final verdict on masking? Should reserve zero the unused upper bits if it increases nbits?

@Gankra
Copy link
Contributor

Gankra commented Sep 26, 2014

Yes, Bitv.reserve should zero the contents, and BitvSet.reserve should set nbits to match the total capacity. @apoelstra Sound good to you? Any thoughts?

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 26, 2014

What is the reason for having BitvSet.reserve set nbits rather than having Bitv.reserve do it?

@Gankra
Copy link
Contributor

Gankra commented Sep 26, 2014

Equality in Bitv depends on nbits, but not in BitvSet. Thus, only BitvSet can do this without changing the actual semantics of the public methods.

@apoelstra
Copy link
Contributor

Sounds good to me. Agree with BitvSet::reserve setting nbits, it shouldn't affect anything functionally but would make the code easier to reason about.

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 29, 2014

@gankro, @apoelstra Can you check out the updated PR? Thanks!

self.storage.grow(new_size - old_size, 0);
}

// Zero out any unused bits in the current word
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grow is zeroing out the bits already. It literally pushes 0's into the Vec until it's the right size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that was a bit vague. The relevant part is that this is specifically a precondition to this method call. All bits are allocated by reserve, so no words are ever "partially" initialized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that you actually do need to mask out the current word, as there are some cases where there can be partially initialized words. For example, the with_capacity can return a Bitv with uninitialized bits set to one.

Removing the masking of the current word causes this test case to fail, as the insert will increase the nbits, and all the elements less than nbits (say if we inserted 160 instead) would appear to be inserted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Alright then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with_capacity needs to be fixed not to do this, as per the discussion I had with @gankro on the last PR...we should target "unitialized bits are always zero" since it is easier to check the places that might create unitialized bits than the ones that might use them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 30, 2014

@gankro @apoelstra Sorry to keep bothering you. I changed the with_capacity function to fit in with @apoelstra's vision and cleaned up some of the ugly edge case handling.

@Gankra
Copy link
Contributor

Gankra commented Sep 30, 2014

Thanks for persisting! It looks great now! Two nits: you make a useless temp variable in bitvset.from_bitv, and bitvset.with_capacity should dispatch to from_bitv.

(On mobile can t comment on diffs)

@kaseyc
Copy link
Contributor Author

kaseyc commented Sep 30, 2014

Thanks for all the feedback! It's been really helpful. The temp is there to make the param mutable. Is there a better way to mutate a function argument?

Edit: Whoops, figured out how to mark parameters as mutable. That does change the signature of the function though.

@Gankra
Copy link
Contributor

Gankra commented Oct 1, 2014

It doesn't actually change the signature. There's no difference between passing by mutable or immutable value.

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 1, 2014

@gankro should be fixed now

@Gankra
Copy link
Contributor

Gankra commented Oct 1, 2014

@kaseyc Great work! We have a policy about squashing commits once a PR is ready, to keep the history clean. Can you squash all four into one?

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 1, 2014

@gankro Done.

@Gankra
Copy link
Contributor

Gankra commented Oct 1, 2014

@alexcrichton this looks good to go. r?

@apoelstra
Copy link
Contributor

Thanks for this @kaseyc! It is much clearer what is going on with these structures now that we never have uninitialized bits :). The code looks good to me.

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 2, 2014

@gankro @apoelstra The failed tests came from cases where the code is grabbing the underlying bitv and doing things with it, though the changes to how BitvSet handles the nbits means that doing so no longer works the same way. What should the behavior in that case be? Should code accessing the underlying Bitv just deal with the changed nbits behavior, or should the unwrap method hide that behavior by changing the nbits value to be just enough to hold the stored values or something like that?

@apoelstra
Copy link
Contributor

My vote is remove get_mut_ref entirely. Users can use unwrap (which should be renamed to_bitv in accordance with the new rules) and from_bitv (which should sanitize nbits).

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 2, 2014

I'll get on that. Thanks.
On Oct 2, 2014 6:01 AM, "Andrew Poelstra" [email protected] wrote:

My vote is remove get_mut_ref entirely. Users can use unwrap (which
should be renamed to_bitv in accordance with the new rules) and from_bitv
(which should sanitize nbits).


Reply to this email directly or view it on GitHub
#17558 (comment).

@Gankra
Copy link
Contributor

Gankra commented Oct 2, 2014

This would need to be marked as a [breaking-change], if you're not familiar with that process. I'm not sure how deprecating that method would work? I guess we can just deprecate it as "doesn't work right, use into_bitv (which I believe is the correct name, not to_bitv) and from_bitv".

CC @aturon thoughts?

@apoelstra
Copy link
Contributor

Mark it deprecated with "violates invariants; please use into_bitv and from_bitv instead".

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 4, 2014

@apoelstra @gankro Updated the PR with the changes. get_mut_ref is deprecated, unwrap is renamed, converting to and from a Bitv sanitizes the nbits, and I found another place that needed to zero the nbits.

}

/// Consumes this set to return the underlying bit vector.
/// The returned Bitv is sized to hold all elements up to the largest element in the BitvSet.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of extra work for little concrete value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was just to bring it closer to the current behavior, so people aren't getting the bitvsets filled with false values (as the nbits in a bitv is the capacity). It just seemed a little more ergonomic, especially looking at how many of the documentation examples use into_bitv

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means the into_bitv->from_bitv dance is expensive though, right? Otherwise it's literally a noop, which makes deprecating get_mut reasonable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make it more expensive, yes. I'll change it back to just unwrapping it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small note: there should be an empty line between this sentence and the first; the first paragraph is used as a summary by rustdoc, and it seems like this doesn't need to be in it, e.g.

/// Consumes this set to return the underlying bit vector.
///
/// The returned Bitv ...

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 5, 2014

I took out the nbits adjustment from into_bitv

@Gankra
Copy link
Contributor

Gankra commented Oct 5, 2014

Looks good again!

@aturon @huonw r?

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 8, 2014

@aturon @huonw Can you look over my PR? Thanks!

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@kaseyc Sure, sorry for the delay!

Overall, I'm still happy with the direction this PR is going. However, I think that get_mut_ref should be removed, rather than deprecated in this case, since with the new design it actually violates invariants.

(Generally, at this stage we deprecate when it's easy/safe to do so, but breakage is OK.)

As @gankro pointed out, you'll need to add a [breaking-change] marker to the end of the commit message.

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 8, 2014

Sounds good. There should already be a [breaking-change] tag on the commit message, does the PR need one too?

@aturon
Copy link
Member

aturon commented Oct 8, 2014

@kaseyc Ah, I hadn't realized there was one already there. Just on the commit is fine.

Functions that add bits now ensure that any unused bits are set to 0.
`into_bitv` sanitizes the nbits of the Bitv/BitvSet it returns by setting the nbits to the current capacity.
Fix a bug with `union_with` and `symmetric_difference` with due to not updating nbits properly
Add test cases to the _with functions
Remove `get_mut_ref`

This is a [breaking-change]. The things you will need to fix are:

1. BitvSet's `unwrap()` has been renamed to `into_bitv`
2. BitvSet's `get_mut_ref()` has been removed. Use `into_bitv()` and `from_bitv()` instead.
@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 9, 2014

@gankro @aturon The commit is updated with get_mut_ref's removal.

bors added a commit that referenced this pull request Oct 9, 2014
Updates the other_op function shared by the union/intersect/difference/symmetric_difference -with functions to fix an issue where certain elements would not be present in the result. To fix this, when other op is called, we resize self's nbits to account for any new elements that may be added to the set.

Example:
```rust
	let mut a = BitvSet::new();
	let mut b = BitvSet::new();
	a.insert(0);
	b.insert(5);
	a.union_with(&b);
	println!("{}", a); //Prints "{0}" instead of "{0, 5}"
```
@bors bors closed this Oct 9, 2014
@bors bors merged commit dd4fa90 into rust-lang:master Oct 9, 2014
@Gankra
Copy link
Contributor

Gankra commented Oct 9, 2014

Congratulations, and thanks a ton for persevering on this!

I hope to see more Rust contributions from you in the future! 😄

@kaseyc
Copy link
Contributor Author

kaseyc commented Oct 9, 2014

Thanks! And thank you for all your help through the process :D
On Oct 9, 2014 1:58 PM, "Alexis Beingessner" [email protected]
wrote:

Congratulations, and thanks a ton for persevering on this!

I hope to see more Rust contributions from you in the future! [image:
😄]


Reply to this email directly or view it on GitHub
#17558 (comment).

lnicola pushed a commit to lnicola/rust that referenced this pull request Jul 11, 2024
fix: Fix double rounding of `f32` literals

Fixes rust-lang#17556 by delaying parsing until the type is known. Also adds a test to check the issue is fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants