TypedArena: Eliminate intermediate copy if Iterator size is known #77945
Conversation
r? @varkor (rust_highfive has picked a reviewer for you, use r? to override)
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 455f4de686ab86152f096515da0669d68b32fbbb with merge 9d9c665b6419c577f598d8cc19a94c06e133f859...
☀️ Try build successful - checks-actions, checks-azure
Queued 9d9c665b6419c577f598d8cc19a94c06e133f859 with parent 5565241, future comparison URL.
Btw, you might be aware, but to point it out: the current state of this PR is unsound. Later on, when the arena is dropped, it could drop uninitialized memory, as the pointer is advanced past slots that the iterator never actually filled. I think a good resolution would be to not advance the pointer until the elements have actually been written. Performance-wise it will most likely behave like the current state of the PR, so it's mostly a concern for after the current perf run :).
Oh my, thanks for pointing that out, I was indeed completely unaware of this!
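For context, a minimal sketch (not the actual `rustc_arena` drop code, just an illustration) of why the eager bump is a problem: when the arena is dropped, it drops every slot up to wherever the pointer was bumped, and it has no way of knowing that some of those slots were never written.

```rust
// Roughly what an arena's Drop has to do for a chunk: `claimed_len` is how far the
// bump pointer was advanced. If the pointer was bumped by `len` before the iterator
// actually produced `len` values, the tail of this range is uninitialized memory,
// and dropping it is undefined behaviour.
unsafe fn destroy_slots<T>(start: *mut T, claimed_len: usize) {
    for i in 0..claimed_len {
        // The drop code cannot tell initialized slots from uninitialized ones here.
        unsafe { std::ptr::drop_in_place(start.add(i)) };
    }
}
```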
Finished benchmarking try commit (9d9c665b6419c577f598d8cc19a94c06e133f859): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying `rollup-` to bors. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
This doesn't seem to make much sense to me. The current code errors, fine, let's assume it's incorrect. But this also errors:

```rust
unsafe fn write_from_iter<I: Iterator<Item = T>>(&self, mut iter: I, len: usize) -> &mut [T] {
    let mut i = 0;
    let mem = self.ptr.get();
    // Use a manual loop since LLVM manages to optimize it better for
    // slice iterators
    loop {
        let value = iter.next();
        if i >= len || value.is_none() {
            // Only advance the pointer when we know for sure how many elements we have.
            self.ptr.set(mem.add(len)); // notice `len` here
            // We only return as many items as the iterator gave us, even
            // though it was supposed to give us `len`.
            return slice::from_raw_parts_mut(mem, i);
        }
        ptr::write(mem.add(i), value.unwrap());
        i += 1;
    }
}
```

But this doesn't:

```rust
unsafe fn write_from_iter<I: Iterator<Item = T>>(&self, mut iter: I, len: usize) -> &mut [T] {
    let mut i = 0;
    let mem = self.ptr.get();
    self.ptr.set(mem.add(len)); // pointer set here
    // Use a manual loop since LLVM manages to optimize it better for
    // slice iterators
    loop {
        let value = iter.next();
        if i >= len || value.is_none() {
            // We only return as many items as the iterator gave us, even
            // though it was supposed to give us `len`.
            return slice::from_raw_parts_mut(mem, i);
        }
        ptr::write(mem.add(i), value.unwrap());
        i += 1;
    }
}
```

While both are unsound, the last two versions should otherwise be equivalent, since `ptr` is not read in this function except to get `mem`. The error message …
Maybe this might be due to code inside the `next` function of the iterator accessing the arena?
Huh, maybe, that's something I didn't think of. Seems to happen fairly regularly during AST lowering, so yeah, probably that's what causes it. Thank you.
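That re-entrancy is easy to picture with a toy model. Everything below is invented for illustration (it is not rustc code): an iterator whose `next` allocates from the same arena it is being drained into, the way AST lowering can.

```rust
use std::cell::Cell;

// A toy bump "arena" that just hands out offsets.
struct ToyArena {
    next_free: Cell<usize>,
}

impl ToyArena {
    fn reserve(&self, n: usize) -> usize {
        let start = self.next_free.get();
        self.next_free.set(start + n);
        start
    }
}

// An iterator that allocates from the arena inside `next`.
struct Lowering<'a> {
    arena: &'a ToyArena,
    remaining: usize,
}

impl<'a> Iterator for Lowering<'a> {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        // Re-entrant allocation: this bumps the same counter the outer call uses.
        Some(self.arena.reserve(1))
    }
}

fn main() {
    let arena = ToyArena { next_free: Cell::new(0) };
    // The outer `alloc_from_iter` reserves its 3 slots up front (offsets 0..3)...
    let outer_start = arena.reserve(3);
    // ...so the allocations made inside `next` land at offsets 3, 4, 5, past the
    // reserved region. If the reservation only happened at the end of the loop (as
    // in the first of the two snippets above, the one that errors), the inner
    // allocations would start at 0 and overlap the outer region.
    let inner: Vec<usize> = Lowering { arena: &arena, remaining: 3 }.collect();
    assert_eq!(outer_start, 0);
    assert_eq!(inner, vec![3, 4, 5]);
}
```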
@bugadani for measuring the perf gains you could just disable arena dropping for now and make it leak, and otherwise revert to the original state of the PR. This would give an upper bound on the perf gains, even though it includes the gains of leaking as well, so maybe make a separate PR where you only leak. Comparing the two PRs would give a more precise number (but of course the base changed, so percentages won't be 100% accurate).
@est31 I'm not sure why I'd want to do that? The current perf run shows a tiny improvement already, and if that warrants it, the best I can do here is to specialize the non-copy version for types that implement …
@bugadani oh indeed, specialization would be the best approach here. Disregard what I said :).
Mostly OK, except for nits.
Co-authored-by: est31 <[email protected]>
Looks good!
```rust
let start_addr = arena.alloc_raw_slice(len);
let mut mem = start_addr;

while let Some(value) = iter.next() {
```
Isn't this unsound if `next` panics? I assume the arena would try to call drop on uninitialized memory in that case (*).
(*) When I did this same optimization, I added a drop guard which fills the slice with default values (**).
(**) `Default` can also panic, but in that case the process aborts, though I know in my case that `Default` won't panic for any of the types that use it.
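For reference, a minimal sketch of the drop-guard approach described above, under the assumption that `T: Default` (names invented):

```rust
use std::mem::MaybeUninit;

// If the iterator panics (or simply runs short), the guard's Drop fills every slot
// it never reached with `T::default()`, so the arena's own Drop only ever sees
// initialized values. A panic inside `default()` while already unwinding would
// abort the process, as noted above.
struct FillGuard<'a, T: Default> {
    slots: &'a mut [MaybeUninit<T>],
    initialized: usize,
}

impl<'a, T: Default> Drop for FillGuard<'a, T> {
    fn drop(&mut self) {
        for slot in &mut self.slots[self.initialized..] {
            slot.write(T::default());
        }
    }
}

fn fill_from_iter<T: Default, I: Iterator<Item = T>>(slots: &mut [MaybeUninit<T>], iter: I) {
    let len = slots.len();
    let mut guard = FillGuard { slots, initialized: 0 };
    for value in iter.take(len) {
        guard.slots[guard.initialized].write(value);
        guard.initialized += 1;
    }
    // The guard also runs on the happy path, so a too-short iterator still leaves
    // every slot initialized (with default values) rather than uninitialized.
}
```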
Huh, thanks for noticing that. I guess this sort of explains why this optimization wasn't done before 😅
I'm trying to make this as general as possible, so no `T: Default` here :) However, this means the only way I see to solve this issue is to catch the unwind and undo the failed partial allocation by hand. Which could get messy.
I wonder if it's worth doing, considering the otherwise minimal impact of this PR.
The problem with that is that there may be more things allocated after this allocation, which have already succeeded and handed out references. So there isn't a way to undo this allocation as far as I can see (without adding another `Vec` to keep track of failed allocations, I guess; I suppose that `Vec` wouldn't be used 99% of the time, so it wouldn't add any overhead other than increased arena size and complexity).
Isn't it the idea of `TrustedLen` that the iterator will always have that many elements? If `next` panics, it would violate that contract.
I'm not sure which benefit `TrustedLen` would give people at all if `next` is allowed to panic.
@bugadani oh right. That's super weird though, because then what's the benefit of `TrustedLen` over `ExactSizeIterator`?
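Some context for that question (this describes the standard library's traits, not anything specific to this PR): `ExactSizeIterator` is a safe trait, so unsafe code must not rely on its `len` being correct, whereas `TrustedLen` is an `unsafe trait` (nightly-only, behind `feature(trusted_len)`) whose implementors promise an exact `size_hint` that unsafe code may rely on. Neither trait promises that `next` cannot panic.

```rust
// A safe but lying ExactSizeIterator: a logic bug, yet by itself not UB, because
// no unsafe code is allowed to trust it.
struct Lying(usize);

impl Iterator for Lying {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(0)
        }
    }
}

impl ExactSizeIterator for Lying {
    fn len(&self) -> usize {
        1_000_000 // wrong, but only a logic error
    }
}

// By contrast, writing `unsafe impl TrustedLen for Lying {}` would be a promise the
// type cannot keep; unsafe code (like the arena's direct-write path) trusting it
// could then exhibit undefined behaviour.
```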
As a last attempt to salvage this PR, the dropping arena could get a `Vec<Vec<T>>` member that is used instead of the chunks when given an iterator. Then we can simply put the temporary Vec that the pre-PR iterator code creates into that list and don't have to copy stuff into the chunk. We already have to pay the cost of deallocating the Vec anyway; all we do is move it into the future. This bloats up the structures of the native allocator and destroys cache locality, but might be worth a try.
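A rough sketch of that idea, with all names hypothetical:

```rust
use std::cell::RefCell;

// The arena keeps fully built Vecs alive next to its chunks; their elements are
// dropped (and the Vecs deallocated) together with the arena, so the contents
// never need to be copied into a chunk.
struct IterArena<T> {
    // ...the existing chunk fields would live here...
    extra: RefCell<Vec<Vec<T>>>,
}

impl<T> IterArena<T> {
    fn alloc_from_iter<I: IntoIterator<Item = T>>(&self, iter: I) -> &mut [T] {
        let mut extra = self.extra.borrow_mut();
        extra.push(iter.into_iter().collect());
        let last = extra.last_mut().unwrap();
        let (ptr, len) = (last.as_mut_ptr(), last.len());
        // The Vec is never touched again for the arena's lifetime, so handing out a
        // slice into it mirrors what the chunk-based path does. This is a sketch of
        // the idea, not a reviewed, sound implementation.
        unsafe { std::slice::from_raw_parts_mut(ptr, len) }
    }
}
```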
@est31 I guess `ExactSizeIterator` isn't `unsafe`, and so it's technically allowed to misbehave? I'm not sure.
Regarding the memory question:
TypedArena works in pages. When a page can't hold the requested data, a new page is created. The allocator then uses pointers in the last page. This complicates things a bit, but I don't think by a huge deal.
If an allocation panics, we can drop the memory that we allocated, as well as the memory that the inner operations allocated. We know about these regions because we have them saved on our stack, or we can save them if something is currently missing.
If a nested tree of `alloc_from_iter` calls fails for whatever reason, the panic will bubble up and can be caught by each level of alloc call, basically unwinding their work.
The only question is, how do I bubble up the innermost panic? :)
Since `Iterator` isn't `UnwindSafe`, this looks a lot more complicated than it should be.
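As a sketch of what catching the unwind around the fill loop could look like (names invented; `AssertUnwindSafe` papers over the `UnwindSafe` issue rather than solving it):

```rust
use std::mem::MaybeUninit;
use std::panic::{catch_unwind, resume_unwind, AssertUnwindSafe};

// Writes elements into the reserved slots; if the iterator panics, we learn how
// many slots were actually initialized before re-raising the panic.
fn write_from_iter_guarded<T, I: Iterator<Item = T>>(
    slots: &mut [MaybeUninit<T>],
    mut iter: I,
) -> usize {
    let mut written = 0;
    let result = catch_unwind(AssertUnwindSafe(|| {
        for slot in slots.iter_mut() {
            match iter.next() {
                Some(value) => {
                    slot.write(value);
                    written += 1;
                }
                None => break,
            }
        }
    }));
    if let Err(payload) = result {
        // Only the first `written` slots are initialized. A real arena would have
        // to roll its bookkeeping back to this prefix *here*, before the panic
        // continues, so that its Drop never touches the uninitialized tail.
        resume_unwind(payload);
    }
    written
}
```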
One thing that could work, with only a little overhead, is to allocate a new chunk for any allocation calls made inside the iterator. I believe this would also work without `TrustedLen`. Pseudo-Rust:

```rust
// Reserve space for the slice in the current chunk, then step past that chunk so
// that any allocation made from inside the iterator goes to the next chunk (or
// forces a new one to be created) and cannot overlap the reserved region.
let slice = self.alloc_raw_slice(len);
let chunk_index = self.current_chunk;
self.current_chunk += 1;
for (i, elem) in iterator.enumerate() {
    slice[i] = elem;
}
// Move back to the original chunk so we can try filling the rest of it.
self.current_chunk = chunk_index;

// Finding the first chunk with enough free space:
fn get_current_chunk(&mut self, needed: usize) -> &mut Vec<T> {
    for (index, chunk) in self.chunks[self.current_chunk..].iter_mut().enumerate() {
        if chunk.capacity() - chunk.len() >= needed {
            self.current_chunk += index;
            return chunk;
        }
    }
    self.allocate_chunk()
}
```
Huh, that sounds like an interesting idea! There is a theoretical possibility of wasting a ton of memory, but let's hope it's not an issue in practice :)
Thank you everybody for your time.
TypedArena now tries to not copy iterator contents into a `SmallVec`. I don't know if this wasn't done on purpose, but I copied this code over from `DroplessArena`, just in case this was an oversight.