
TypedArena: Eliminate intermediate copy if Iterator size is known #77945


Closed
wants to merge 1 commit

Conversation

bugadani
Contributor

TypedArena now tries to avoid copying iterator contents into an intermediate SmallVec.

I don't know whether this was left out on purpose, but I copied this code over from DroplessArena in case it was an oversight.
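To make the change concrete, here is a minimal sketch (not the actual rustc arena code; names and the Vec-backed storage are illustrative) contrasting the old buffered path with the direct-write path that uses the iterator's size hint:

```rust
// Old path: collect into a temporary buffer first, then copy into storage.
fn buffered_alloc<T, I: Iterator<Item = T>>(storage: &mut Vec<T>, iter: I) -> usize {
    let tmp: Vec<T> = iter.collect();
    let start = storage.len();
    storage.extend(tmp); // second pass: move out of the temporary buffer
    storage.len() - start
}

// New path: reserve once using the size hint, then write elements directly.
fn direct_alloc<T, I: Iterator<Item = T>>(storage: &mut Vec<T>, iter: I) -> usize {
    let (min, _) = iter.size_hint();
    let start = storage.len();
    storage.reserve(min);
    storage.extend(iter); // single pass, no intermediate buffer
    storage.len() - start
}

fn main() {
    let mut a = Vec::new();
    let mut b = Vec::new();
    assert_eq!(buffered_alloc(&mut a, 0..5), 5);
    assert_eq!(direct_alloc(&mut b, 0..5), 5);
    assert_eq!(a, b);
}
```

Both produce the same result; the direct path just skips the intermediate allocation when the size is known up front.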

@rust-highfive
Contributor

r? @varkor

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 14, 2020
@jonas-schievink
Contributor

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Collaborator

bors commented Oct 14, 2020

⌛ Trying commit 455f4de686ab86152f096515da0669d68b32fbbb with merge 9d9c665b6419c577f598d8cc19a94c06e133f859...

@bors
Collaborator

bors commented Oct 14, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 9d9c665b6419c577f598d8cc19a94c06e133f859

@rust-timer
Collaborator

Queued 9d9c665b6419c577f598d8cc19a94c06e133f859 with parent 5565241, future comparison URL.

@est31
Member

est31 commented Oct 14, 2020

Btw, you might be aware, but to point it out: the current state of this PR is unsound. Later on, when the arena is dropped, it could drop uninitialized memory, as the TrustedLen trait bound is not used.

I think a good resolution would be to not call alloc_raw_slice but inline its contents manually. Most importantly, self.ptr.set should only be called right above the return line in write_from_iter.

Performance-wise it will most likely behave like the current state of the PR, so it's mostly a concern for after the current perf run :).
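To illustrate the soundness concern, here is a toy bump arena (all names are made up for the illustration; this is not the rustc TypedArena): the destructor only walks elements recorded as initialized, so the count must be committed after the iterator is drained, never from the promised `len` alone.

```rust
use std::mem::MaybeUninit;

struct ToyArena<T> {
    storage: Box<[MaybeUninit<T>]>,
    // Number of elements actually written; Drop must not read past this.
    initialized: usize,
}

impl<T> ToyArena<T> {
    fn with_capacity(cap: usize) -> Self {
        ToyArena {
            storage: (0..cap).map(|_| MaybeUninit::uninit()).collect(),
            initialized: 0,
        }
    }

    // Returns how many elements were actually written.
    fn alloc_from_iter<I: Iterator<Item = T>>(&mut self, iter: I, len: usize) -> usize {
        let start = self.initialized;
        assert!(start + len <= self.storage.len(), "toy arena out of space");
        let mut written = 0;
        for (slot, value) in self.storage[start..start + len].iter_mut().zip(iter) {
            slot.write(value);
            written += 1;
        }
        // Commit only what was written. Committing `start + len` here would
        // make Drop run destructors on uninitialized slots whenever the
        // iterator yields fewer elements than promised.
        self.initialized = start + written;
        written
    }
}

impl<T> Drop for ToyArena<T> {
    fn drop(&mut self) {
        for slot in &mut self.storage[..self.initialized] {
            // Safe: every slot below `initialized` was written exactly once.
            unsafe { slot.assume_init_drop() }
        }
    }
}

fn main() {
    let mut arena = ToyArena::with_capacity(8);
    // The caller promises 4 elements but the iterator yields only 2.
    let written = arena.alloc_from_iter(vec!["a".to_string(), "b".to_string()].into_iter(), 4);
    assert_eq!(written, 2);
    // Dropping the arena now only drops the 2 initialized strings.
}
```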

@bugadani
Contributor Author

Oh my, thanks for pointing that out, I was indeed completely unaware of this!

@rust-timer
Collaborator

Finished benchmarking try commit (9d9c665b6419c577f598d8cc19a94c06e133f859): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

@bugadani
Contributor Author

This doesn't seem to make much sense to me.

The current code errors, fine, let's assume it's incorrect.

But this also errors:

unsafe fn write_from_iter<I: Iterator<Item = T>>(&self, mut iter: I, len: usize) -> &mut [T] {
    let mut i = 0;
    let mem = self.ptr.get();
    // Use a manual loop since LLVM manages to optimize it better for
    // slice iterators
    loop {
        let value = iter.next();
        if i >= len || value.is_none() {
            // only advance the pointer when we know for sure how many elements we have
            self.ptr.set(mem.add(len)); // notice `len` here

            // We only return as many items as the iterator gave us, even
            // though it was supposed to give us `len`
            return slice::from_raw_parts_mut(mem, i);
        }
        ptr::write(mem.add(i), value.unwrap());
        i += 1;
    }
}

But this doesn't:

unsafe fn write_from_iter<I: Iterator<Item = T>>(&self, mut iter: I, len: usize) -> &mut [T] {
    let mut i = 0;
    let mem = self.ptr.get();
    self.ptr.set(mem.add(len)); // pointer set here
    // Use a manual loop since LLVM manages to optimize it better for
    // slice iterators
    loop {
        let value = iter.next();
        if i >= len || value.is_none() {
            // only advance the pointer when we know for sure how many elements we have

            // We only return as many items as the iterator gave us, even
            // though it was supposed to give us `len`
            return slice::from_raw_parts_mut(mem, i);
        }
        ptr::write(mem.add(i), value.unwrap());
        i += 1;
    }
}

While both are unsound, the two versions should otherwise be equivalent, since ptr is not read in this function except to get mem.

The error message "corrupted size vs. prev_size" seems to indicate there is a bad write somewhere, but why would that depend on where we store an address? I think I'm missing something obvious here.

@est31
Member

est31 commented Oct 15, 2020

Maybe this might be due to code inside the next function of the iterator accessing the arena?

@bugadani
Contributor Author

bugadani commented Oct 15, 2020

Huh, maybe, that's something I didn't think of. Seems to happen fairly regularly during ast lowering, so yeah, probably that's what causes it. Thank you.
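The reentrancy hazard est31 describes can be shown with a minimal model (an illustration, not the rustc code): the bump pointer is shared state, and `next()` may re-enter the arena and allocate from it.

```rust
use std::cell::Cell;

// The bump pointer modeled as an index into one big buffer.
struct Bump {
    next_free: Cell<usize>,
}

impl Bump {
    // What a reentrant allocation (e.g. AST lowering inside `next()`) does:
    // take whatever `next_free` currently is and bump it.
    fn alloc_one(&self) -> usize {
        let at = self.next_free.get();
        self.next_free.set(at + 1);
        at
    }
}

fn main() {
    let len = 3;

    // Variant A: commit the pointer *before* draining the iterator (the
    // version that did not crash). A reentrant allocation lands past the
    // reserved slice, so the two writes never overlap.
    let bump = Bump { next_free: Cell::new(0) };
    let start = bump.next_free.get();
    bump.next_free.set(start + len);
    let nested = bump.alloc_one();
    assert_eq!(nested, start + len);

    // Variant B: commit the pointer only at the end (the version that
    // crashed). A reentrant allocation is handed a slot *inside* the region
    // still being filled, so the two writes clobber each other.
    let bump = Bump { next_free: Cell::new(0) };
    let start = bump.next_free.get();
    let nested = bump.alloc_one();
    assert_eq!(nested, start); // the same slot element 0 was about to go to
}
```

In variant B the in-progress slice and the nested allocation share memory, which matches the heap corruption observed during AST lowering.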

@est31
Member

est31 commented Oct 15, 2020

@bugadani for measuring the perf gains you could just disable arena dropping for now and make it leak, and otherwise revert to the original state of the PR. This would give an upper bound on the perf gains, even though it includes the gains of leaking as well, so maybe make a separate PR where you only leak. Comparing the two PRs would give a more precise number (but of course the base changed, so percentages won't be 100% accurate).

@bugadani
Contributor Author

@est31 I'm not sure why I'd want to do that? The current perf run shows a tiny improvement already, and if that warrants it, the best I can do here is to specialize the non-copy version for types that implement TrustedLen. Otherwise I don't think another perf run would make sense.

@est31
Member

est31 commented Oct 15, 2020

@bugadani oh indeed, specialization would be the best approach here. Disregard what I said :).

Member

@est31 est31 left a comment

Mostly OK, except nits.

Member

@est31 est31 left a comment

Looks good!

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 16, 2020
let start_addr = arena.alloc_raw_slice(len);
let mut mem = start_addr;

while let Some(value) = iter.next() {
Contributor

@Marwes Marwes Oct 16, 2020

Isn't this unsound if next panics? I assume the arena would try to call drop on uninitialized memory in that case (*).

(*) When I did this same optimization I added a drop guard which fills the slice with default values (**).
(**) Default can also panic, but in that case the process aborts; I know that in my case Default won't panic for any of the types that use it.
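A sketch of the drop-guard idea (names like `FillGuard` are made up for this illustration, and `Option<T>` stands in for possibly-uninitialized slots): if `next()` panics partway through, the guard's Drop backfills the remaining reserved slots with `T::default()`, so nothing is left uninitialized when the arena is later dropped.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

struct FillGuard<'a, T: Default> {
    slots: &'a mut [Option<T>],
    written: usize,
}

impl<'a, T: Default> Drop for FillGuard<'a, T> {
    fn drop(&mut self) {
        // Runs on unwind as well as on normal exit: backfill every slot the
        // iterator did not get to write.
        for slot in &mut self.slots[self.written..] {
            *slot = Some(T::default());
        }
    }
}

fn fill_from_iter<T: Default, I: Iterator<Item = T>>(slots: &mut [Option<T>], mut iter: I) {
    let len = slots.len();
    let mut guard = FillGuard { slots, written: 0 };
    while guard.written < len {
        // `next()` may panic here; the guard then backfills on unwind.
        match iter.next() {
            Some(value) => {
                guard.slots[guard.written] = Some(value);
                guard.written += 1;
            }
            None => break,
        }
    }
}

fn main() {
    // The iterator panics after one element; the guard backfills the rest.
    let mut slots: Vec<Option<i32>> = vec![None; 3];
    let _ = catch_unwind(AssertUnwindSafe(|| {
        fill_from_iter(&mut slots, (0..).map(|i| if i == 1 { panic!("boom") } else { i }));
    }));
    assert_eq!(slots, vec![Some(0), Some(0), Some(0)]);
}
```

As noted in (**), `T::default()` itself could panic; a real implementation would have to abort in that case.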

Contributor Author

Huh, thanks for noticing that. I guess this sort of explains why this optimization wasn't done before 😅

I'm trying to make this as general as possible, so no T: Default here :) However, this means the only way I see to solve this issue is to catch the unwind and undo the failed partial allocation by hand. Which could get messy.

I wonder if it's worth doing, considering the otherwise minimal impact of this PR.

Contributor

The problem with that is that there may be more allocations after this one that have already succeeded and handed out references. So there isn't a way to undo this allocation as far as I can see (short of adding another Vec to keep track of failed allocations; that Vec wouldn't be used 99% of the time, so it wouldn't add overhead beyond increased arena size and complexity).

Member

Isn't it the idea of TrustedLen that the iterator will always have that many elements? If next panics, it would violate that contract.

Member

I'm not sure which benefit TrustedLen would give people at all if next is allowed to panic.

Member

@bugadani oh right. That's super weird though because then what's the benefit of TrustedLen over ExactSizeIterator?

Member

As a last attempt to salvage this PR, the dropping arena could get a Vec<Vec<T>> member that is used instead of the chunks when given an iterator. Then we can simply put the temporary vec that the pre-PR iterator code creates into that list and don't have to copy stuff into the chunk. We already have to pay the cost of deallocating the Vec anyway; all we do is defer it. This bloats up the structures of the native allocator and hurts cache locality, but might be worth a try.
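A sketch of that fallback (illustrative only; the real TypedArena hands out arena-lifetime references, which this simplified `&mut self` version sidesteps):

```rust
struct IterArena<T> {
    // Buffers that live as long as the arena; dropping the arena drops them.
    vecs: Vec<Vec<T>>,
}

impl<T> IterArena<T> {
    fn new() -> Self {
        IterArena { vecs: Vec::new() }
    }

    fn alloc_from_iter<I: Iterator<Item = T>>(&mut self, iter: I) -> &mut [T] {
        // Collect once, park the Vec in the side list, and hand out a slice
        // into it. No second copy into a chunk; the deallocation cost is
        // deferred to the arena's drop, as the comment above notes.
        self.vecs.push(iter.collect());
        self.vecs.last_mut().unwrap().as_mut_slice()
    }
}

fn main() {
    let mut arena = IterArena::new();
    let slice = arena.alloc_from_iter(0..4);
    assert_eq!(slice.to_vec(), vec![0, 1, 2, 3]);
}
```

The trade-off mentioned above is visible here: each call costs a separate heap allocation, and elements from different calls are no longer contiguous.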

Contributor Author

@bugadani bugadani Oct 16, 2020

@est31 I guess ExactSizeIterator isn't unsafe and so it's technically allowed to misbehave? I'm not sure.

Regarding the memory question:

TypedArena works in pages. When a page can't hold the requested data, a new page is created, and the allocator then uses pointers into the last page. This complicates things a bit, but I don't think by much.

If an allocation panics, we can drop the memory that we allocated, as well as the memory that the inner operations allocated. We know about these regions because we have them saved on our stack (or we can save them, if something is currently missing).

If a nested tree of alloc_from_iter calls fails for whatever reason, the panic will bubble up and can be caught by each level of alloc call, basically unwinding their work.

The only question is, how do I bubble up the innermost panic? :)

Since Iterator isn't UnwindSafe, this looks a lot more complicated than it should be.

Contributor

One thing that could work, with only a little overhead, is to allocate a new chunk for any allocation calls inside the iterator. I believe this would work without TrustedLen also.

Pseudo-Rust:

let slice = self.alloc_raw_slice(len);
let chunk_index = self.current_chunk;
// Move to the next chunk; if no next chunk exists, the next allocation will force a new chunk to be created
self.current_chunk += 1;
for elem in iterator {
    slice.fill(elem);
}
// Move back to the current chunk so we can try filling the rest of it
self.current_chunk = chunk_index;

// Search for the first chunk with enough free capacity
fn get_current_chunk(&self, needed: usize) -> &mut Vec {
    for (index, chunk) in self.chunks.iter_mut().enumerate().skip(self.current_chunk) {
        if chunk.capacity() - chunk.len() >= needed {
            self.current_chunk = index;
            return chunk;
        }
    }
    self.allocate_chunk()
}

Contributor Author

@bugadani bugadani Oct 16, 2020

Huh, that sounds like an interesting idea! There is a theoretical possibility of wasting a ton of memory, but let's hope it's not an issue in practice :)

@bugadani bugadani closed this Oct 16, 2020
@bugadani
Contributor Author

Thank you everybody for your time.

9 participants