`align_offset` seems to generate worse code than is desirable (unless `size_of::<T>() == 1`)

There are many examples where `align_offset` produces worse code than it should (worse than you'd write by hand, at the very least, and in the case of slices, worse then I think it needs to). You can often work around this by going through a `*const u8` pointer, although this is surprising, and not always an option (for example, it is not an option for users of `align_to`).

Here's a minimal example extracted from some real code but massaged somewhat: https://rust.godbolt.org/z/aq8Yo8oYj. For posterity, the generated code here currently is:

```
example::simd_align1:
        mov     eax, edi
        and     eax, 12
        mov     ecx, edi
        shr     ecx, 2
        neg     ecx
        and     ecx, 3
        test    rax, rax
        cmove   rcx, rax
        lea     rax, [rdi + 4*rcx]
        ret

example::simd_align2:
        lea     rax, [rdi + 15]
        and     rax, -16
        ret
```

(And `simd_align1` here is actually better than I was getting in my real code, where it ended up as a branch. That said, this is bad enough to demonstrate the problem).

I guess this is because `align_offset` gets passed a `*const T` which itself may not be aligned to `T` (for example, `(1 as *const f32).align_offset(16)` can't be satisfied). That problem is avoided by going through u8 (as is done with `simd_align2`), since that should always succeed (for non-const which isn't relevant here), and hits the `stride == 1` special case added to fix https://github.com/rust-lang/rust/issues/75579).

However, in *this* case the compiler should know that the pointer is aligned to 4, since it comes from a slice (and the pointer does end up with `align(4)` in the LLVM). I *think* this means there's no actual reason that `simd_align1` must be less efficient than `simd_align2` (unless I'm missing something), and the issue is just that our `align_offset` impl doesn't have a fast path for this situation. That said, maybe I'm wrong, and the generated code is bad for some other reason.

Either way, it would be very nice for this to do better for the slice use case -- as I mentioned, going through `*const u8` is not viable when `align_offset` is invoked from an API like `align_to` (I suspect if we can't fix it for `align_offset`, we probably could work around it in `align_to` more directly, but it would be better to do it in `align_offset` if possible. That said, if I'm wrong about why this is happening, all bets are certainly off).

See also https://github.com/rust-lang/rust/issues/72356, which seems related (and my hunch is that it will be fixed if we fix this issue, at least in this case).

CC @nagisa, since [you asked](https://github.com/rust-lang/rust/blob/aedf78e56b2279cc869962feac5153b6ba7001ed/library/core/src/ptr/mod.rs#L1591).

P.S. Sorry that this is a bit of a rambling issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`align_offset` seems to generate worse code than is desirable (unless `size_of::<T>() == 1`) #98809

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

align_offset seems to generate worse code than is desirable (unless size_of::<T>() == 1) #98809

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`align_offset` seems to generate worse code than is desirable (unless `size_of::<T>() == 1`) #98809