Description
I came across this while examining a loop that ran slower than I expected. It involves explicit and implicit conversions between 8-bit and 32/64-bit values, and when I looked through the generated assembly in the Godbolt compiler explorer, I found lots of movzx instructions that don't seem to break any dependency or play a role in correctness. Many of them also use the same register for source and destination, like movzx eax, al, which cannot be mov-eliminated.
I then tried some simple examples on Godbolt and found that this behavior is persistent and easily reproducible, even when I specify -march=skylake. Here's an example:
#include <stdint.h>
int add2bytes(uint8_t* a, uint8_t* b) {
    return uint8_t(*a + *b);
}
Clang 14 -O3:
add2bytes(unsigned char*, unsigned char*): # @add2bytes(unsigned char*, unsigned char*)
mov al, byte ptr [rsi]
add al, byte ptr [rdi]
movzx eax, al
ret
A movzx would be better in place of the mov rather than at the end: it would break the dependency on the old RAX value from the start and clear the upper bits of RAX in the process.
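To check that moving the zero-extension to the load does not change the result, here is a rough C++ model of the two instruction sequences. Reg64, current_sequence and suggested_sequence are hypothetical names made up for this sketch; it only models the architectural register contents (byte writes merge into the old value, 32-bit writes zero the upper half), not timing or false dependencies.

```cpp
#include <stdint.h>

// Hypothetical model of a 64-bit register such as RAX.
struct Reg64 {
    uint64_t full;
    // Writing AL merges into the old value: bits 8..63 are preserved.
    void write_low8(uint8_t v) { full = (full & ~0xFFull) | v; }
    // Writing EAX zero-extends: bits 32..63 are cleared.
    void write32(uint32_t v) { full = v; }
    uint8_t low8() const { return uint8_t(full); }
};

// Models what Clang 14 emits today.
uint64_t current_sequence(uint64_t old_rax, uint8_t a, uint8_t b) {
    Reg64 rax{old_rax};
    rax.write_low8(b);                        // mov   al, byte ptr [rsi]
    rax.write_low8(uint8_t(rax.low8() + a));  // add   al, byte ptr [rdi]
    rax.write32(rax.low8());                  // movzx eax, al
    return rax.full;
}

// Models the suggested ordering: zero-extend at the load.
uint64_t suggested_sequence(uint64_t old_rax, uint8_t a, uint8_t b) {
    Reg64 rax{old_rax};
    rax.write32(b);                           // movzx eax, byte ptr [rsi]
    rax.write_low8(uint8_t(rax.low8() + a));  // add   al, byte ptr [rdi]
    return rax.full;
}
```

In this model both sequences leave the same value in RAX for any stale old_rax, and the suggested one is an instruction shorter because the 8-bit add cannot carry into the already-zeroed upper bits.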
I also asked this on Stack Overflow, and Peter Cordes wrote a detailed response (https://stackoverflow.com/a/72953035/14730360) explaining how this behavior is bad for pretty much all x86 processors.
Godbolt link with code for examples: https://godbolt.org/z/z45xr4hq1
Here's one that's closer to what I was originally examining:
int foo(uint8_t* a, uint8_t i, uint8_t j) {
    return a[a[i] | a[j]];
}
Clang 14 -O3:
foo(unsigned char*, unsigned char, unsigned char): # @foo(unsigned char*, unsigned char, unsigned char)
mov eax, esi
mov ecx, edx
mov cl, byte ptr [rdi + rcx]
or cl, byte ptr [rdi + rax]
movzx eax, cl
movzx eax, byte ptr [rdi + rax]
ret
The movzx eax, cl here just seems unnecessary. The upper bits of RCX should already be clean, since RCX is used as the index in mov cl, byte ptr [rdi + rcx]. The subsequent or does not affect its upper bits, and the dependency of RCX on this or is not something that movzx eax, cl can break. So I think it's better to just do movzx eax, byte ptr [rdi + rcx] after the or.
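The same kind of rough C++ model can be used to sanity-check this claim. The helpers write_low8 and write32 and the functions foo_current and foo_suggested are hypothetical names for this sketch. It assumes that ESI and EDX arrive with the uint8_t arguments already zero-extended to 32 bits, which the emitted code itself appears to rely on when it uses them as indices.

```cpp
#include <stdint.h>

// Hypothetical helpers modelling x86-64 partial-register writes.
// A byte write merges into the old 64-bit value.
static uint64_t write_low8(uint64_t reg, uint8_t v) { return (reg & ~0xFFull) | v; }
// A 32-bit write zero-extends into the full 64-bit register.
static uint64_t write32(uint32_t v) { return v; }

// Models the Clang 14 output, including the extra movzx eax, cl.
uint64_t foo_current(const uint8_t* a, uint32_t esi, uint32_t edx) {
    uint64_t rax = write32(esi);                             // mov   eax, esi
    uint64_t rcx = write32(edx);                             // mov   ecx, edx
    rcx = write_low8(rcx, a[rcx]);                           // mov   cl, byte ptr [rdi + rcx]
    rcx = write_low8(rcx, uint8_t(uint8_t(rcx) | a[rax]));   // or    cl, byte ptr [rdi + rax]
    rax = write32(uint8_t(rcx));                             // movzx eax, cl
    rax = write32(a[rax]);                                   // movzx eax, byte ptr [rdi + rax]
    return rax;
}

// Models the suggested output: index directly with RCX after the or.
uint64_t foo_suggested(const uint8_t* a, uint32_t esi, uint32_t edx) {
    uint64_t rax = write32(esi);                             // mov   eax, esi
    uint64_t rcx = write32(edx);                             // mov   ecx, edx
    rcx = write_low8(rcx, a[rcx]);                           // mov   cl, byte ptr [rdi + rcx]
    rcx = write_low8(rcx, uint8_t(uint8_t(rcx) | a[rax]));   // or    cl, byte ptr [rdi + rax]
    rax = write32(a[rcx]);                                   // movzx eax, byte ptr [rdi + rcx]
    return rax;
}
```

Under that assumption, RCX never has anything above bit 7 set after the first mov ecx, edx, so both functions index the same byte and return the same value.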