Given this C code:
https://godbolt.org/z/WvfG8TTxf
#include <vecintrin.h>
#include <stdbool.h>

bool vectors_equal_builtin(vector int a, vector int b) {
    return vec_all_eq(a, b);
}

typedef int vec4i __attribute__((vector_size(16)));

bool vectors_equal_manual(vec4i a, vec4i b) {
    return __builtin_reduce_and(a == b);
}
The manual implementation does not optimize to the same code as the builtin one:
vectors_equal_builtin:
        vceqfs %v0, %v24, %v26
        lghi %r2, 0
        locghie %r2, 1
        br %r14

vectors_equal_manual:
        aghi %r15, -168
        vceqf %v0, %v24, %v26
        vno %v0, %v0, %v0
        vlgvf %r1, %v0, 0
        vlgvf %r0, %v0, 1
        sll %r1, 3
        rosbg %r1, %r0, 61, 61, 2
        vlgvf %r0, %v0, 2
        rosbg %r1, %r0, 62, 62, 1
        vlgvf %r0, %v0, 3
        rosbg %r1, %r0, 63, 63, 0
        tmll %r1, 15
        lghi %r2, 0
        locghie %r2, 1
        aghi %r15, 168
        br %r14
The LLVM IR for the manual version is:
define dso_local noundef zeroext i1 @vectors_equal_manual(<4 x i32> noundef %a, <4 x i32> noundef %b) local_unnamed_addr {
entry:
  %0 = icmp ne <4 x i32> %a, %b
  %1 = bitcast <4 x i1> %0 to i4
  %2 = icmp eq i4 %1, 0
  ret i1 %2
}
There are many variations on vec_all_eq (see https://www.ibm.com/docs/en/zos/2.4.0?topic=functions-any-predicates), and it would be neat if they all optimized this way. It might be possible to simplify clang's vecintrin.h too.
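For reference, here is a rough sketch of how a few of those variations might be written in the same manual style. The names reduce_and, reduce_or, all_eq, any_eq, all_ne and any_ne are made up for illustration and are not the vecintrin.h definitions. The scalar fallback is only there so the sketch also builds on compilers without the Clang reduction builtins. Nothing here says how any of these currently lower on s390x.

```c
#include <stdbool.h>

typedef int vec4i __attribute__((vector_size(16)));

/* Older compilers may lack __has_builtin; treat every builtin as absent. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* AND of all four lanes (hypothetical helper, not part of vecintrin.h). */
static inline int reduce_and(vec4i m) {
#if __has_builtin(__builtin_reduce_and)
    return __builtin_reduce_and(m);
#else
    return m[0] & m[1] & m[2] & m[3];
#endif
}

/* OR of all four lanes (hypothetical helper, not part of vecintrin.h). */
static inline int reduce_or(vec4i m) {
#if __has_builtin(__builtin_reduce_or)
    return __builtin_reduce_or(m);
#else
    return m[0] | m[1] | m[2] | m[3];
#endif
}

/* A lane-wise comparison yields -1 (all bits set) for true and 0 for false,
   so the reductions below are nonzero exactly when the predicate holds. */
bool all_eq(vec4i a, vec4i b) { return reduce_and(a == b) != 0; }
bool any_eq(vec4i a, vec4i b) { return reduce_or(a == b) != 0; }
bool all_ne(vec4i a, vec4i b) { return reduce_and(a != b) != 0; }
bool any_ne(vec4i a, vec4i b) { return reduce_or(a != b) != 0; }
```

Note that all_eq(a, b) is the same as !any_ne(a, b), which matches the shape of the IR above: compare with icmp ne, then check that no lane is set.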
This came up while implementing vec_all_eq in the Rust standard library, where fewer custom intrinsics are better in every way.
cc @uweigand (posted here so it can be linked to)