Description
I found something strange while looking at a Wasm binary compiled from C with the latest Clang. This function:
typedef struct { double x, y; } xy_t;
xy_t mul_xy(xy_t a, xy_t b)
{
a.x *= b.x;
a.y *= b.y;
return a;
}
compiles, using -msimd128 -Oz, to this:
[Image: annotated view of the binary. The squares on the left are the bytes from the .wasm, the grey rectangle is their interpretation, the purple rectangle is their C decompilation and the blue rectangle is the stack height change.]
or in WAT format:
mul_xy:
local.get 1
local.get 2
v128.load 0:p2align=3
local.get 1
v128.load 0:p2align=3
f64x2.mul
v128.store 0:p2align=3
local.get 0
local.get 1
v128.load 0:p2align=3
v128.store 0:p2align=3
end_function
In Wasm, mul_xy now takes three pointer arguments: local0 points to where the result is actually wanted, local1 points to vector a, and local2 points to vector b. The problem is that whereas you'd expect the result of the f64x2 multiplication to be stored directly at local0, it's instead first stored at local1, and then the vector at local1 is reloaded to be stored at local0. So we needlessly do some extra memory operations, plus we force the caller to deal with the value of a being modified when that's not what we want (although it seems that the values get copied to the stack regardless).
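To make the difference easier to see, here is a rough C rendering of what the emitted Wasm does next to what one would expect. This is only a sketch: the function names mul_xy_emitted and mul_xy_expected and the explicit pointer parameters are hypothetical, chosen to mirror local0, local1 and local2, and are not what Clang generates internally.

```c
typedef struct { double x, y; } xy_t;

/* Hypothetical C rendering of the emitted Wasm: the product is first
   written over the caller-provided copy of a, then copied to ret. */
void mul_xy_emitted(xy_t *ret, xy_t *a, const xy_t *b)
{
    a->x *= b->x;   /* f64x2.mul, result stored back at local1 */
    a->y *= b->y;
    *ret = *a;      /* extra v128.load from local1, v128.store to local0 */
}

/* What one would expect instead: store the product directly at ret. */
void mul_xy_expected(xy_t *ret, const xy_t *a, const xy_t *b)
{
    ret->x = a->x * b->x;
    ret->y = a->y * b->y;
}
```

Both functions leave the same value in ret; the first one additionally overwrites the memory that a points to and performs one extra load and store.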
If I modify the C function to this:
xy_t mul_xy(xy_t a, xy_t b)
{
xy_t c;
c.x = a.x * b.x;
c.y = a.y * b.y;
return c;
}
then it compiles to the expected Wasm:
mul_xy:
local.get 0
local.get 1
v128.load 0:p2align=3
local.get 2
v128.load 0:p2align=3
f64x2.mul
v128.store 0:p2align=3
end_function
So the compiler apparently doesn't realise that the two versions of the C function are functionally the same, and generates worse code for the first one.
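As a sanity check that the two versions really are equivalent at the C level, a minimal sketch along these lines could be used. The names mul_xy_inplace, mul_xy_copy and versions_agree are hypothetical, and the comparison assumes non-NaN inputs.

```c
#include <stdbool.h>

typedef struct { double x, y; } xy_t;

/* First version from the report: modifies its by-value parameter. */
xy_t mul_xy_inplace(xy_t a, xy_t b)
{
    a.x *= b.x;
    a.y *= b.y;
    return a;
}

/* Second version from the report: writes into a separate result. */
xy_t mul_xy_copy(xy_t a, xy_t b)
{
    xy_t c;
    c.x = a.x * b.x;
    c.y = a.y * b.y;
    return c;
}

/* Returns true when both versions give the same result and the
   caller's a is left unchanged (assumes no NaN values). */
bool versions_agree(xy_t a, xy_t b)
{
    xy_t before = a;
    xy_t r1 = mul_xy_inplace(a, b);
    xy_t r2 = mul_xy_copy(a, b);
    return r1.x == r2.x && r1.y == r2.y
        && a.x == before.x && a.y == before.y;
}
```

Since a is passed by value, the caller can never observe the in-place modification in the first version, which is why the two should be able to compile to the same code.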