Description
There are many examples where align_offset
produces worse code than it should (worse than you'd write by hand, at the very least, and in the case of slices, worse then I think it needs to). You can often work around this by going through a *const u8
pointer, although this is surprising, and not always an option (for example, it is not an option for users of align_to
).
Here's a minimal example extracted from some real code but massaged somewhat: https://rust.godbolt.org/z/aq8Yo8oYj. For posterity, the generated code here currently is:
example::simd_align1:
mov eax, edi
and eax, 12
mov ecx, edi
shr ecx, 2
neg ecx
and ecx, 3
test rax, rax
cmove rcx, rax
lea rax, [rdi + 4*rcx]
ret
example::simd_align2:
lea rax, [rdi + 15]
and rax, -16
ret
(And simd_align1
here is actually better than I was getting in my real code, where it ended up as a branch. That said, this is bad enough to demonstrate the problem).
I guess this is because align_offset
gets passed a *const T
which itself may not be aligned to T
(for example, (1 as *const f32).align_offset(16)
can't be satisfied). That problem is avoided by going through u8 (as is done with simd_align2
), since that should always succeed (for non-const which isn't relevant here), and hits the stride == 1
special case added to fix #75579).
However, in this case the compiler should know that the pointer is aligned to 4, since it comes from a slice (and the pointer does end up with align(4)
in the LLVM). I think this means there's no actual reason that simd_align1
must be less efficient than simd_align2
(unless I'm missing something), and the issue is just that our align_offset
impl doesn't have a fast path for this situation. That said, maybe I'm wrong, and the generated code is bad for some other reason.
Either way, it would be very nice for this to do better for the slice use case -- as I mentioned, going through *const u8
is not viable when align_offset
is invoked from an API like align_to
(I suspect if we can't fix it for align_offset
, we probably could work around it in align_to
more directly, but it would be better to do it in align_offset
if possible. That said, if I'm wrong about why this is happening, all bets are certainly off).
See also #72356, which seems related (and my hunch is that it will be fixed if we fix this issue, at least in this case).
P.S. Sorry that this is a bit of a rambling issue.