Closed
Description
This is an offshoot from #65205. Consider:
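#include <immintrin.h>

// Callee: only requires AVX512BW.
__attribute__((target("avx512bw")))
static inline __m512i MM512_MASK_ADD_EPI8(__m512i src,
                                          __mmask64 k,
                                          __m512i a,
                                          __m512i b) {
  // {AT&T|Intel} dialect alternatives; %{ and %} emit literal braces around the mask.
  __asm__("vpaddb\t{%3, %2, %0 %{%1%}|%0 %{%1%}, %2, %3}"
          : "+v"(src)
          : "Yk"(k), "v"(a), "v"(b));
  return src;
}

// Caller: AVX512BW plus AVX512DQ, a superset of the callee's features.
__attribute__((target("avx512bw,avx512dq")))
__m512i G(__m512i src, __mmask64 k, __m512i a, __m512i b) {
  return MM512_MASK_ADD_EPI8(src, k, a, b);
}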
The credit for the testcase goes to @kalcutter.
clang refuses to inline:
    $ clang -O2 -S target_feature.cc -o /dev/stdout | grep call
            callq   _ZL19MM512_MASK_ADD_EPI8Dv8_xyS_S_
We should be able to inline the callee into the caller because the caller's target features (avx512bw,avx512dq) are a superset of the callee's (avx512bw), so every instruction the callee may use is also legal in the caller.
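To make the superset argument concrete, here is a minimal sketch of the compatibility rule being asked for. The feature bits and the helper name `calleeFeaturesAreSubset` are hypothetical and exist only for illustration; they are not LLVM's real feature encodings or its actual inlining check.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only; these are not LLVM's
// real feature encodings.
enum Feature : std::uint32_t {
    kAVX512F  = 1u << 0,
    kAVX512BW = 1u << 1,
    kAVX512DQ = 1u << 2,
};

// Returns true when every feature enabled for the callee is also enabled
// for the caller, i.e. the callee's set is a subset of the caller's.
constexpr bool calleeFeaturesAreSubset(std::uint32_t caller,
                                       std::uint32_t callee) {
    return (caller & callee) == callee;
}

// G (avx512bw,avx512dq) calling MM512_MASK_ADD_EPI8 (avx512bw): compatible.
static_assert(calleeFeaturesAreSubset(kAVX512BW | kAVX512DQ, kAVX512BW),
              "caller superset should allow inlining");

// The reverse direction would not be compatible: the callee could use
// AVX512DQ instructions that the caller is not allowed to execute.
static_assert(!calleeFeaturesAreSubset(kAVX512BW, kAVX512BW | kAVX512DQ),
              "caller subset should block inlining");
```

Under this rule the testcase above would qualify for inlining, so the refusal presumably comes from something other than the feature sets themselves, such as the presence of the inline asm.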