Skip to content

align_offset seems to generate worse code than is desirable (unless size_of::<T>() == 1) #98809

Closed
@thomcc

Description

@thomcc

There are many examples where align_offset produces worse code than it should (worse than you'd write by hand, at the very least, and in the case of slices, worse then I think it needs to). You can often work around this by going through a *const u8 pointer, although this is surprising, and not always an option (for example, it is not an option for users of align_to).

Here's a minimal example extracted from some real code but massaged somewhat: https://rust.godbolt.org/z/aq8Yo8oYj. For posterity, the generated code here currently is:

example::simd_align1:
        mov     eax, edi
        and     eax, 12
        mov     ecx, edi
        shr     ecx, 2
        neg     ecx
        and     ecx, 3
        test    rax, rax
        cmove   rcx, rax
        lea     rax, [rdi + 4*rcx]
        ret

example::simd_align2:
        lea     rax, [rdi + 15]
        and     rax, -16
        ret

(And simd_align1 here is actually better than I was getting in my real code, where it ended up as a branch. That said, this is bad enough to demonstrate the problem).

I guess this is because align_offset gets passed a *const T which itself may not be aligned to T (for example, (1 as *const f32).align_offset(16) can't be satisfied). That problem is avoided by going through u8 (as is done with simd_align2), since that should always succeed (for non-const which isn't relevant here), and hits the stride == 1 special case added to fix #75579).

However, in this case the compiler should know that the pointer is aligned to 4, since it comes from a slice (and the pointer does end up with align(4) in the LLVM). I think this means there's no actual reason that simd_align1 must be less efficient than simd_align2 (unless I'm missing something), and the issue is just that our align_offset impl doesn't have a fast path for this situation. That said, maybe I'm wrong, and the generated code is bad for some other reason.

Either way, it would be very nice for this to do better for the slice use case -- as I mentioned, going through *const u8 is not viable when align_offset is invoked from an API like align_to (I suspect if we can't fix it for align_offset, we probably could work around it in align_to more directly, but it would be better to do it in align_offset if possible. That said, if I'm wrong about why this is happening, all bets are certainly off).

See also #72356, which seems related (and my hunch is that it will be fixed if we fix this issue, at least in this case).

CC @nagisa, since you asked.

P.S. Sorry that this is a bit of a rambling issue.

Metadata

Metadata

Assignees

Labels

A-codegenArea: Code generationA-raw-pointersArea: raw pointers, MaybeUninit, NonNullI-slowIssue: Problems and improvements with respect to performance of generated code.T-libsRelevant to the library team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions