Closed
Description
When calling a boxed closure, the captured data is copied from the heap to the stack before the call. This happens both with a trait object and with a concrete type inside the box.
#[inline(never)]
pub fn do_call<F>(b: Box<F>) where F: FnOnce() {
//pub fn do_call(b: Box<dyn FnOnce()>) {
    b();
}

pub unsafe fn foo(large: [u32; 1000]) {
    do_call(
        Box::new(move || {
            dummy(&large);
        })
    );
}

extern "C" {
    fn dummy(data: &[u32; 1000]);
}
example::do_call:
        push    r14
        push    rbx
        sub     rsp, 4008
        mov     rbx, rdi
        lea     r14, [rsp + 8]
        mov     edx, 4000
        mov     rdi, r14
        mov     rsi, rbx
        call    qword ptr [rip + memcpy@GOTPCREL]
        mov     rdi, r14
        call    qword ptr [rip + dummy@GOTPCREL]
        mov     esi, 4000
        mov     edx, 4
        mov     rdi, rbx
        call    qword ptr [rip + __rust_dealloc@GOTPCREL]
        add     rsp, 4008
        pop     rbx
        pop     r14
        ret
Godbolt link: https://godbolt.org/z/srCEBf
I think this extra memcpy is unnecessary; it should be possible to call a boxed closure without moving it out of the heap.
Similar C++ code doesn't make any extra copies in do_call, even when compiled without optimizations (-O0). Godbolt link: https://godbolt.org/z/VY8OLn
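To illustrate what "calling without moving out of the heap" could look like, here is a minimal hand-written sketch. It is not how the compiler lowers closures and not a proposed fix. `Payload`, `CallBoxed`, and `call_boxed` are hypothetical names standing in for the closure's captured state and its call method. The sketch relies only on the stable `self: Box<Self>` receiver, which lets the body borrow the captured data through the box instead of taking it by value.

```rust
/// Hypothetical stand-in for the closure's captured environment.
pub struct Payload {
    pub large: [u32; 1000],
}

/// Hypothetical trait: a "call once" that receives the box itself,
/// rather than `self` by value as `FnOnce::call_once` does.
pub trait CallBoxed {
    /// Returns the address at which the body observed the captured data.
    fn call_boxed(self: Box<Self>) -> usize;
}

impl CallBoxed for Payload {
    #[inline(never)]
    fn call_boxed(self: Box<Self>) -> usize {
        // Borrow the field through the box; the array stays on the heap.
        let data: &[u32; 1000] = &self.large;
        data.as_ptr() as usize
        // The box is dropped here, freeing the allocation after the "call".
    }
}
```

With this shape, the address seen inside `call_boxed` is the heap address of the boxed data, so no 4000-byte copy to the stack is needed in the source-level semantics. Whether the optimizer produces the same result for real `FnOnce` closures is exactly what this issue is about.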