-
Notifications
You must be signed in to change notification settings - Fork 13.3k
BTreeSet difference, symmetric_difference & union optimized #65375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Peeking { head, tail: iter } | ||
} | ||
fn next(&mut self) -> Option<I::Item> { | ||
match self.head { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this match necessary for correctness? I'd say it's not, since all set iterators should be fused already. So it could be removed if that's better for performance, and you'd have just this?
item = self.head;
self.head = self.tail.next();
item
Not sure if it's a win, at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For correctness, requiring FusedIterator (like I unintentionally left it in #65226) seems solid to me. But relying on that is bad for the performance I measure. That's also why the enum in #65226 is A(Option<I::Item>)
and not A(I::Item)
.
I don't quite understand why. A philosophical argument might me: a FusedIterator indeed keeps returning None, but nobody said it has to do so efficiently. But as far as I can tell, the map Keys iterator can't be any quicker: it just zero-compares the length that is red hot in the cache after updating it every iteration.
Maybe it's because I'm developing and benching this in a separate crate, which means the map Keys code cannot be inlined, and I'm mostly measuring (cross-DLL?) function calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The keys code can inlined (it is generic, instantiated into the crate where it is used). One might tweak down to codegen-units=1 to experiment if that make a difference.
If it doesn't help, we'll just not do that change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I remembered wrong. Your shortcut helps and it helped in earlier versions. Just don't use a mem::replace
oneliner instead, that's way slower.
.field(&self.0.a.head) | ||
.field(&self.0.a.tail) | ||
.field(&self.0.b.head) | ||
.field(&self.0.b.tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will look messier, but I'm not sure if we have any reason to dress up the internals in any way. It should serve us best to just debug print what we have. That would be an argument for printing this as .field(&self.0)
or .field(&self.0.a).field(&self.0.b)
though, and giving Peeking a debug impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The remaining fmt::Debug
for Iter uses the older alignment style here (aligning methods to the .
). Best to keep the style consistent in the file (keep the style as it was).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never understood that part, just stole the debug implementation from some other liballoc struct. I guess part of the reason is to allow the "#[stable()" thingy.
I'm now finally able to produce examples of this output, like:
Intersection(Iter([1, 2, 3]), Iter([1, 2, 3]))
If I simply stick a #[derive(Debug)]
onto IntersectionInner, and simplify Intersection::fmt all the way down to f.debug_tuple("Intersection").field(&self.inner).finish()
, the output is much more clearly:
Intersection(Stitch { a: Iter([1, 2, 3]), b: Iter([1, 2, 3]) })
Nice benchmarks (posted in the other pr)! Looks like a neat improvement |
} | ||
} | ||
} | ||
|
||
fn size_hint(&self) -> (usize, Option<usize>) { | ||
(0, Some(self.a.len() + self.b.len())) | ||
(0, Some(self.0.a.len() + self.0.b.len())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the union upper bound - this sum potentially overflows (and then it should be None).
Both can be ok but it would be best with a comment in that case, that max length of any set is limited.
These are the readings for the updated PR:
Theoretically (and in practice) unchanged tests pruned from this list. You see that halfway through, the future gets bleak. While the future is bright when I build & run only the same benches (using --features merge), as I've always been doing lately. Each run is quite reproducible: the more benches built, the slower they all become, but those named *_future suffer much more. Running fewer benches by passing a name gives the same timing as running all, so it's the build that is different. Seems like the more code there is mixed in, the less the optimizer focuses, and the future code needs more attention. Building only MergeIter-based tests, the story is:
Anyways, looking into cargo-benchcmp now. |
Same story told in benchcmp's words:
gives (hurray!)
But instead:
gives (booh!)
|
Alternative to #65226: instead of peeking specialized to MergeIter, an additional
Peeking
intermediate. More lines of code but more readable, I think. Though it stirs a can of worms about Peekable.It obtains the same gains for symmetric_difference and union, and also boosts
difference
a few percent (if the left set is not much smaller than the right set), e.g.In addition, some cleanup of earlier changes suggested by clippy, which I can push to #65226 too.
r? @bluss