Closed
#[feature(asm)];
#[inline(never)]
unsafe fn print_first_half(arr: [u8, ..16]) {
    let mut out: u64;
    asm!("movups $1, %xmm0
          pextrq $$0, %xmm0, $0"
         : "=r"(out) : "m"(arr) : "xmm0");
    println!("{:?}", out);
}
fn main() {
    let arr: [u8, ..16] = [0, ..16];
    unsafe { print_first_half(arr); }
}
$ rustc -v
rustc 0.10-pre (68a4f7d 2014-02-24 12:42:02 -0800)
host: x86_64-unknown-linux-gnu
$ rustc -O foo.rs && ./foo
140489369304528u64
This should be 0u64; try replacing the movups with xorps %xmm0, %xmm0. Here's the generated code:
$ objdump -d foo
…
4069f9: 48 8b 07 mov (%rdi),%rax
4069fc: 48 8b 4f 08 mov 0x8(%rdi),%rcx
406a00: 48 89 4c 24 48 mov %rcx,0x48(%rsp)
406a05: 48 89 44 24 40 mov %rax,0x40(%rsp)
406a0a: 48 8d 44 24 40 lea 0x40(%rsp),%rax
406a0f: 48 89 44 24 08 mov %rax,0x8(%rsp)
406a14: 0f 10 44 24 08 movups 0x8(%rsp),%xmm0
406a19: 66 48 0f 3a 16 c0 00 pextrq $0x0,%xmm0,%rax
…
So it copies the array to 0x40(%rsp) (in two 64-bit pieces), then stores the address of that copy at 0x8(%rsp), and movups loads 16 bytes from 0x8(%rsp), the slot holding the pointer, rather than from the array itself.
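To make the extra level of indirection concrete, here is a minimal sketch in present-day safe Rust (not the 0.10-pre syntax used in this report). It only models the two reads and does not use inline assembly. The function names are hypothetical, and the second function is an assumption about what the miscompiled sequence effectively returns, based on the disassembly above.

```rust
/// The value the asm was meant to produce: the low 8 bytes of the array,
/// read as a little-endian u64 (what `pextrq $0` extracts on x86-64).
fn intended_first_half(arr: &[u8; 16]) -> u64 {
    let mut low = [0u8; 8];
    low.copy_from_slice(&arr[..8]);
    u64::from_le_bytes(low)
}

/// A model of the miscompiled path: the array is copied to a stack slot,
/// the address of that copy is spilled, and the load reads the spilled
/// address instead of the bytes it points to.
fn miscompiled_first_half(arr: &[u8; 16]) -> u64 {
    let copy = *arr; // the copy made at 0x40(%rsp) in the listing
    let spilled: *const u8 = copy.as_ptr(); // the pointer stored at 0x8(%rsp)
    spilled as usize as u64 // low 8 bytes of the 16-byte load
}
```

For an all-zero array the first function returns 0, while the second returns some non-zero stack address, which would be consistent with the large value printed above.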
In GCC, I would do
void f(char *arr) {
    asm("movups %0, %%xmm0" :: "m"(*arr));
}
which gcc -O3 turns into the optimal
0: 0f 10 07 movups (%rdi),%xmm0
Attempting to do the same in Rust
asm!("movups $0, %xmm0" :: "m"(*(arr.as_ptr())) : "xmm0");
produces even worse code: it copies only the first byte of the array to the stack and then loads 16 bytes from there:
4069f9: 8a 07 mov (%rdi),%al
4069fb: 88 04 24 mov %al,(%rsp)
4069fe: 0f 10 04 24 movups (%rsp),%xmm0
Workarounds:
- When the array is a static with a name, just name it within the asm!. See "pub static disappears if only used from asm" (#13365).
- Pass the array's address in a register and dereference it in the template:
  asm!("movups ($0), %xmm0" : : "r"(arr.as_ptr()) : "xmm0");
  This generates optimal code in this case, because the array is already pointed to by %rdi, but in general it may clobber a register and emit a load when neither is necessary.
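For readers on a current toolchain, the register-pointer workaround might look roughly like the sketch below. It rests on several assumptions: it uses today's `std::arch::asm!` macro with its default Intel syntax, not the 2014 syntax in this report; it uses `movq` where the original used `pextrq $0`, so that only baseline SSE2 is needed; and the function name and the non-x86-64 fallback are hypothetical additions for illustration. As far as I know, the current macro offers no "m" constraint, so passing the address in a register is the usual form.

```rust
/// Returns the low 8 bytes of `arr`, loaded through an address held in a
/// general-purpose register (the "r" workaround from the list above).
#[cfg(target_arch = "x86_64")]
fn first_half_via_pointer(arr: &[u8; 16]) -> u64 {
    use std::arch::asm;
    let result: u64;
    unsafe {
        asm!(
            // Unaligned 16-byte load from the address in {ptr}.
            "movups {tmp}, [{ptr}]",
            // Move the low 64 bits of the vector register to {lo}.
            "movq {lo}, {tmp}",
            ptr = in(reg) arr.as_ptr(),
            tmp = out(xmm_reg) _,
            lo = out(reg) result,
            options(nostack),
        );
    }
    result
}

/// Portable fallback so the sketch still builds on other architectures.
#[cfg(not(target_arch = "x86_64"))]
fn first_half_via_pointer(arr: &[u8; 16]) -> u64 {
    let mut low = [0u8; 8];
    low.copy_from_slice(&arr[..8]);
    u64::from_le_bytes(low)
}
```

Calling `first_half_via_pointer(&[0u8; 16])` should give 0. The trade-off noted in the workaround still applies: a register is tied up holding the address even when a direct memory operand would have been enough.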