Array as asm input generates wrong code #13366

Closed
@kmcallister

#[feature(asm)];

#[inline(never)]
unsafe fn print_first_half(arr: [u8, ..16]) {
    let mut out: u64;
    asm!("movups $1, %xmm0
          pextrq $$0, %xmm0, $0"
          : "=r"(out) : "m"(arr) : "xmm0");
    println!("{:?}", out);
}

fn main() {
    let arr: [u8, ..16] = [0, ..16];
    unsafe { print_first_half(arr); }
}

$ rustc -v
rustc 0.10-pre (68a4f7d 2014-02-24 12:42:02 -0800)
host: x86_64-unknown-linux-gnu

$ rustc -O foo.rs && ./foo
140489369304528u64

This should be 0u64, since every byte of arr is zero. To check that the load is what goes wrong, try replacing the movups with xorps %xmm0, %xmm0. Here's the generated code:

$ objdump -d foo
…
  4069f9:       48 8b 07                mov    (%rdi),%rax
  4069fc:       48 8b 4f 08             mov    0x8(%rdi),%rcx
  406a00:       48 89 4c 24 48          mov    %rcx,0x48(%rsp)
  406a05:       48 89 44 24 40          mov    %rax,0x40(%rsp)
  406a0a:       48 8d 44 24 40          lea    0x40(%rsp),%rax
  406a0f:       48 89 44 24 08          mov    %rax,0x8(%rsp)
  406a14:       0f 10 44 24 08          movups 0x8(%rsp),%xmm0
  406a19:       66 48 0f 3a 16 c0 00    pextrq $0x0,%xmm0,%rax
…

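So it copies the array to 0x40(%rsp) (in two 64-bit pieces), then stores the address of that copy at 0x8(%rsp), and movups loads 16 bytes from 0x8(%rsp) rather than from the array itself. The low 8 bytes it loads are therefore the pointer to the copy, which is the value that gets printed.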
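The extra level of indirection can be modelled in plain Rust, without any inline assembly. This is only a sketch of the effect seen in the disassembly, assuming a 64-bit target; the function names `intended_low_half` and `miscompiled_low_half` are made up for illustration and do not come from the compiler.

```rust
/// What the asm was meant to see: the first 8 bytes of the array itself.
fn intended_low_half(arr: &[u8; 16]) -> u64 {
    let mut low = [0u8; 8];
    low.copy_from_slice(&arr[..8]);
    u64::from_ne_bytes(low)
}

/// What the miscompiled code reads instead: the first 8 bytes of the stack
/// slot that holds the array's address, which is the address itself
/// (assumption: pointers are 64 bits wide, as on x86_64).
fn miscompiled_low_half(arr: &[u8; 16]) -> u64 {
    let slot: *const u8 = arr.as_ptr(); // plays the role of 0x8(%rsp)
    slot as usize as u64
}
```

For an all-zero array, `intended_low_half` returns 0 while `miscompiled_low_half` returns a nonzero address, which matches the shape of the output above.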

With GCC, I would write

void f(char *arr) {
    asm("movups %0, %%xmm0" :: "m"(*arr));
}

which gcc -O3 turns into the optimal

   0:   0f 10 07                movups (%rdi),%xmm0

Attempting to do the same in Rust

asm!("movups $0, %xmm0" :: "m"(*(arr.as_ptr())) : "xmm0");

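produces even wronger code: only the first byte of the array is copied to the stack, and movups then loads 16 bytes starting at that one-byte copy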

  4069f9:       8a 07                   mov    (%rdi),%al
  4069fb:       88 04 24                mov    %al,(%rsp)
  4069fe:       0f 10 04 24             movups (%rsp),%xmm0
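One way to see why only a single byte is spilled: the operand expression `*(arr.as_ptr())` has type `u8`, so the operand covers 1 byte while `movups` reads 16. The sketch below just compares the two sizes; `operand_vs_load_size` is a hypothetical helper name used for illustration.

```rust
use std::mem::size_of_val;

/// Returns (bytes covered by the operand `*arr.as_ptr()`,
///          bytes that a 16-byte `movups` load expects to read).
fn operand_vs_load_size(arr: &[u8; 16]) -> (usize, usize) {
    // SAFETY: `arr` is a valid reference, so its first byte is readable.
    let first: u8 = unsafe { *arr.as_ptr() };
    (size_of_val(&first), size_of_val(arr))
}
```

Calling `operand_vs_load_size(&[0u8; 16])` yields a 1-byte operand against a 16-byte load, so the remaining 15 bytes come from whatever happens to sit next to the copy on the stack.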

Workarounds:

  1. When the array is a static with a name, just name it within the asm!. See "pub static disappears if only used from asm" (#13365).
  2. Pass the array's address in a register and dereference it in the assembly:

asm!("movups ($0), %xmm0" : : "r"(arr.as_ptr()) : "xmm0");

which generates optimal code in this case, because the array is already pointed to by %rdi, but in general may clobber a register and emit a load when neither is necessary.
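For readers on a current toolchain, the register-pointer workaround might look roughly like this with `std::arch::asm!` (stable since Rust 1.59, Intel syntax by default). This is a sketch rather than the syntax used in this issue: the function name `first_half` is made up, `movq` stands in for `pextrq` so that only baseline SSE2 is needed, and the non-x86_64 fallback exists only so the example compiles elsewhere.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Returns the first 8 bytes of `arr`, loaded through an SSE register.
#[cfg(target_arch = "x86_64")]
fn first_half(arr: &[u8; 16]) -> u64 {
    let out: u64;
    // SAFETY: `arr` points to 16 readable bytes, `movups` has no alignment
    // requirement, and SSE2 is part of the x86_64 baseline.
    unsafe {
        asm!(
            "movups xmm0, [{ptr}]",      // load 16 bytes through the pointer
            "movq {res}, xmm0",          // move the low 64 bits to a GPR
            ptr = in(reg) arr.as_ptr(),  // address passed in a register
            res = out(reg) out,
            out("xmm0") _,               // xmm0 is clobbered
            options(nostack, readonly)
        );
    }
    out
}

/// Fallback for other targets (assumption: added only for portability).
#[cfg(not(target_arch = "x86_64"))]
fn first_half(arr: &[u8; 16]) -> u64 {
    let mut low = [0u8; 8];
    low.copy_from_slice(&arr[..8]);
    u64::from_ne_bytes(low)
}
```

As with the original workaround, the pointer occupies a general-purpose register, so this trades a possibly unnecessary register for a load that reads from the array itself.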

Labels: A-inline-assembly (Area: Inline assembly, `asm!(…)`), C-bug (Category: This is a bug), T-compiler (Relevant to the compiler team), requires-nightly (This issue requires a nightly compiler in some way)
