[Clang][AArch64] Add fp8 variants for untyped NEON intrinsics #128019
Conversation
@llvm/pr-subscribers-clang-codegen Author: None (Lukacma) Changes
Patch is 7.06 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/128019.diff 53 Files Affected:
diff --git a/clang/include/clang/Basic/TargetBuiltins.h b/clang/include/clang/Basic/TargetBuiltins.h
index 4781054240b5b..c1ba65064f159 100644
--- a/clang/include/clang/Basic/TargetBuiltins.h
+++ b/clang/include/clang/Basic/TargetBuiltins.h
@@ -263,6 +263,10 @@ namespace clang {
EltType ET = getEltType();
return ET == Poly8 || ET == Poly16 || ET == Poly64;
}
+ bool isFloatingPoint() const {
+ EltType ET = getEltType();
+ return ET == Float16 || ET == Float32 || ET == Float64 || ET == BFloat16;
+ }
bool isUnsigned() const { return (Flags & UnsignedFlag) != 0; }
bool isQuad() const { return (Flags & QuadFlag) != 0; }
unsigned getEltSizeInBits() const {
diff --git a/clang/include/clang/Basic/arm_neon.td b/clang/include/clang/Basic/arm_neon.td
index 3e73dd054933f..90f0e90e4a7f8 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -31,8 +31,8 @@ def OP_MLAL : Op<(op "+", $p0, (call "vmull", $p1, $p2))>;
def OP_MULLHi : Op<(call "vmull", (call "vget_high", $p0),
(call "vget_high", $p1))>;
def OP_MULLHi_P64 : Op<(call "vmull",
- (cast "poly64_t", (call "vget_high", $p0)),
- (cast "poly64_t", (call "vget_high", $p1)))>;
+ (bitcast "poly64_t", (call "vget_high", $p0)),
+ (bitcast "poly64_t", (call "vget_high", $p1)))>;
def OP_MULLHi_N : Op<(call "vmull_n", (call "vget_high", $p0), $p1)>;
def OP_MLALHi : Op<(call "vmlal", $p0, (call "vget_high", $p1),
(call "vget_high", $p2))>;
@@ -95,11 +95,11 @@ def OP_TRN2 : Op<(shuffle $p0, $p1, (interleave
def OP_ZIP2 : Op<(shuffle $p0, $p1, (highhalf (interleave mask0, mask1)))>;
def OP_UZP2 : Op<(shuffle $p0, $p1, (add (decimate (rotl mask0, 1), 2),
(decimate (rotl mask1, 1), 2)))>;
-def OP_EQ : Op<(cast "R", (op "==", $p0, $p1))>;
-def OP_GE : Op<(cast "R", (op ">=", $p0, $p1))>;
-def OP_LE : Op<(cast "R", (op "<=", $p0, $p1))>;
-def OP_GT : Op<(cast "R", (op ">", $p0, $p1))>;
-def OP_LT : Op<(cast "R", (op "<", $p0, $p1))>;
+def OP_EQ : Op<(bitcast "R", (op "==", $p0, $p1))>;
+def OP_GE : Op<(bitcast "R", (op ">=", $p0, $p1))>;
+def OP_LE : Op<(bitcast "R", (op "<=", $p0, $p1))>;
+def OP_GT : Op<(bitcast "R", (op ">", $p0, $p1))>;
+def OP_LT : Op<(bitcast "R", (op "<", $p0, $p1))>;
def OP_NEG : Op<(op "-", $p0)>;
def OP_NOT : Op<(op "~", $p0)>;
def OP_AND : Op<(op "&", $p0, $p1)>;
@@ -108,20 +108,20 @@ def OP_XOR : Op<(op "^", $p0, $p1)>;
def OP_ANDN : Op<(op "&", $p0, (op "~", $p1))>;
def OP_ORN : Op<(op "|", $p0, (op "~", $p1))>;
def OP_CAST : LOp<[(save_temp $promote, $p0),
- (cast "R", $promote)]>;
+ (bitcast "R", $promote)]>;
def OP_HI : Op<(shuffle $p0, $p0, (highhalf mask0))>;
def OP_LO : Op<(shuffle $p0, $p0, (lowhalf mask0))>;
def OP_CONC : Op<(shuffle $p0, $p1, (add mask0, mask1))>;
def OP_DUP : Op<(dup $p0)>;
def OP_DUP_LN : Op<(call_mangled "splat_lane", $p0, $p1)>;
-def OP_SEL : Op<(cast "R", (op "|",
- (op "&", $p0, (cast $p0, $p1)),
- (op "&", (op "~", $p0), (cast $p0, $p2))))>;
+def OP_SEL : Op<(bitcast "R", (op "|",
+ (op "&", $p0, (bitcast $p0, $p1)),
+ (op "&", (op "~", $p0), (bitcast $p0, $p2))))>;
def OP_REV16 : Op<(shuffle $p0, $p0, (rev 16, mask0))>;
def OP_REV32 : Op<(shuffle $p0, $p0, (rev 32, mask0))>;
def OP_REV64 : Op<(shuffle $p0, $p0, (rev 64, mask0))>;
def OP_XTN : Op<(call "vcombine", $p0, (call "vmovn", $p1))>;
-def OP_SQXTUN : Op<(call "vcombine", (cast $p0, "U", $p0),
+def OP_SQXTUN : Op<(call "vcombine", (bitcast $p0, "U", $p0),
(call "vqmovun", $p1))>;
def OP_QXTN : Op<(call "vcombine", $p0, (call "vqmovn", $p1))>;
def OP_VCVT_NA_HI_F16 : Op<(call "vcombine", $p0, (call "vcvt_f16_f32", $p1))>;
@@ -129,12 +129,12 @@ def OP_VCVT_NA_HI_F32 : Op<(call "vcombine", $p0, (call "vcvt_f32_f64", $p1))>;
def OP_VCVT_EX_HI_F32 : Op<(call "vcvt_f32_f16", (call "vget_high", $p0))>;
def OP_VCVT_EX_HI_F64 : Op<(call "vcvt_f64_f32", (call "vget_high", $p0))>;
def OP_VCVTX_HI : Op<(call "vcombine", $p0, (call "vcvtx_f32", $p1))>;
-def OP_REINT : Op<(cast "R", $p0)>;
+def OP_REINT : Op<(bitcast "R", $p0)>;
def OP_ADDHNHi : Op<(call "vcombine", $p0, (call "vaddhn", $p1, $p2))>;
def OP_RADDHNHi : Op<(call "vcombine", $p0, (call "vraddhn", $p1, $p2))>;
def OP_SUBHNHi : Op<(call "vcombine", $p0, (call "vsubhn", $p1, $p2))>;
def OP_RSUBHNHi : Op<(call "vcombine", $p0, (call "vrsubhn", $p1, $p2))>;
-def OP_ABDL : Op<(cast "R", (call "vmovl", (cast $p0, "U",
+def OP_ABDL : Op<(bitcast "R", (call "vmovl", (bitcast $p0, "U",
(call "vabd", $p0, $p1))))>;
def OP_ABDLHi : Op<(call "vabdl", (call "vget_high", $p0),
(call "vget_high", $p1))>;
@@ -152,15 +152,15 @@ def OP_QDMLSLHi : Op<(call "vqdmlsl", $p0, (call "vget_high", $p1),
(call "vget_high", $p2))>;
def OP_QDMLSLHi_N : Op<(call "vqdmlsl_n", $p0, (call "vget_high", $p1), $p2)>;
def OP_DIV : Op<(op "/", $p0, $p1)>;
-def OP_LONG_HI : Op<(cast "R", (call (name_replace "_high_", "_"),
+def OP_LONG_HI : Op<(bitcast "R", (call (name_replace "_high_", "_"),
(call "vget_high", $p0), $p1))>;
-def OP_NARROW_HI : Op<(cast "R", (call "vcombine",
- (cast "R", "H", $p0),
- (cast "R", "H",
+def OP_NARROW_HI : Op<(bitcast "R", (call "vcombine",
+ (bitcast "R", "H", $p0),
+ (bitcast "R", "H",
(call (name_replace "_high_", "_"),
$p1, $p2))))>;
def OP_MOVL_HI : LOp<[(save_temp $a1, (call "vget_high", $p0)),
- (cast "R",
+ (bitcast "R",
(call "vshll_n", $a1, (literal "int32_t", "0")))]>;
def OP_COPY_LN : Op<(call "vset_lane", (call "vget_lane", $p2, $p3), $p0, $p1)>;
def OP_SCALAR_MUL_LN : Op<(op "*", $p0, (call "vget_lane", $p1, $p2))>;
@@ -221,18 +221,18 @@ def OP_FMLSL_LN_Hi : Op<(call "vfmlsl_high", $p0, $p1,
def OP_USDOT_LN
: Op<(call "vusdot", $p0, $p1,
- (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
+ (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)))>;
def OP_USDOT_LNQ
: Op<(call "vusdot", $p0, $p1,
- (cast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
+ (bitcast "8", "S", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)))>;
// sudot splats the second vector and then calls vusdot
def OP_SUDOT_LN
: Op<(call "vusdot", $p0,
- (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
+ (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x2_t", $p2), $p3)), $p1)>;
def OP_SUDOT_LNQ
: Op<(call "vusdot", $p0,
- (cast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
+ (bitcast "8", "U", (call_mangled "splat_lane", (bitcast "int32x4_t", $p2), $p3)), $p1)>;
def OP_BFDOT_LN
: Op<(call "vbfdot", $p0, $p1,
@@ -263,7 +263,7 @@ def OP_VCVT_BF16_F32_A32
: Op<(call "__a32_vcvt_bf16", $p0)>;
def OP_VCVT_BF16_F32_LO_A32
- : Op<(call "vcombine", (cast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
+ : Op<(call "vcombine", (bitcast "bfloat16x4_t", (literal "uint64_t", "0ULL")),
(call "__a32_vcvt_bf16", $p0))>;
def OP_VCVT_BF16_F32_HI_A32
: Op<(call "vcombine", (call "__a32_vcvt_bf16", $p1),
@@ -924,12 +924,12 @@ def CFMLE : SOpInst<"vcle", "U..", "lUldQdQlQUl", OP_LE>;
def CFMGT : SOpInst<"vcgt", "U..", "lUldQdQlQUl", OP_GT>;
def CFMLT : SOpInst<"vclt", "U..", "lUldQdQlQUl", OP_LT>;
-def CMEQ : SInst<"vceqz", "U.",
+def CMEQ : SInst<"vceqz", "U(.!)",
"csilfUcUsUiUlPcPlQcQsQiQlQfQUcQUsQUiQUlQPcdQdQPl">;
-def CMGE : SInst<"vcgez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLE : SInst<"vclez", "U.", "csilfdQcQsQiQlQfQd">;
-def CMGT : SInst<"vcgtz", "U.", "csilfdQcQsQiQlQfQd">;
-def CMLT : SInst<"vcltz", "U.", "csilfdQcQsQiQlQfQd">;
+def CMGE : SInst<"vcgez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLE : SInst<"vclez", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMGT : SInst<"vcgtz", "U(.!)", "csilfdQcQsQiQlQfQd">;
+def CMLT : SInst<"vcltz", "U(.!)", "csilfdQcQsQiQlQfQd">;
////////////////////////////////////////////////////////////////////////////////
// Max/Min Integer
@@ -1667,11 +1667,11 @@ let TargetGuard = "fullfp16,neon" in {
// ARMv8.2-A FP16 one-operand vector intrinsics.
// Comparison
- def CMEQH : SInst<"vceqz", "U.", "hQh">;
- def CMGEH : SInst<"vcgez", "U.", "hQh">;
- def CMGTH : SInst<"vcgtz", "U.", "hQh">;
- def CMLEH : SInst<"vclez", "U.", "hQh">;
- def CMLTH : SInst<"vcltz", "U.", "hQh">;
+ def CMEQH : SInst<"vceqz", "U(.!)", "hQh">;
+ def CMGEH : SInst<"vcgez", "U(.!)", "hQh">;
+ def CMGTH : SInst<"vcgtz", "U(.!)", "hQh">;
+ def CMLEH : SInst<"vclez", "U(.!)", "hQh">;
+ def CMLTH : SInst<"vcltz", "U(.!)", "hQh">;
// Vector conversion
def VCVT_F16 : SInst<"vcvt_f16", "F(.!)", "sUsQsQUs">;
@@ -2090,17 +2090,17 @@ let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = "r
// Lookup table read with 2-bit/4-bit indices
let ArchGuard = "defined(__aarch64__)", TargetGuard = "lut" in {
- def VLUTI2_B : SInst<"vluti2_lane", "Q.(qU)I", "cUcPcQcQUcQPc",
+ def VLUTI2_B : SInst<"vluti2_lane", "Q.(qU)I", "cUcPcmQcQUcQPcQm",
[ImmCheck<2, ImmCheck0_1>]>;
- def VLUTI2_B_Q : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcQcQUcQPc",
+ def VLUTI2_B_Q : SInst<"vluti2_laneq", "Q.(QU)I", "cUcPcmQcQUcQPcQm",
[ImmCheck<2, ImmCheck0_3>]>;
def VLUTI2_H : SInst<"vluti2_lane", "Q.(<qU)I", "sUsPshQsQUsQPsQh",
[ImmCheck<2, ImmCheck0_3>]>;
def VLUTI2_H_Q : SInst<"vluti2_laneq", "Q.(<QU)I", "sUsPshQsQUsQPsQh",
[ImmCheck<2, ImmCheck0_7>]>;
- def VLUTI4_B : SInst<"vluti4_lane", "..(qU)I", "QcQUcQPc",
+ def VLUTI4_B : SInst<"vluti4_lane", "..(qU)I", "QcQUcQPcQm",
[ImmCheck<2, ImmCheck0_0>]>;
- def VLUTI4_B_Q : SInst<"vluti4_laneq", "..UI", "QcQUcQPc",
+ def VLUTI4_B_Q : SInst<"vluti4_laneq", "..UI", "QcQUcQPcQm",
[ImmCheck<2, ImmCheck0_1>]>;
def VLUTI4_H_X2 : SInst<"vluti4_lane_x2", ".2(<qU)I", "QsQUsQPsQh",
[ImmCheck<3, ImmCheck0_1>]>;
@@ -2194,4 +2194,70 @@ let ArchGuard = "defined(__aarch64__)", TargetGuard = "fp8,neon" in {
// fscale
def FSCALE_V128 : WInst<"vscale", "..(.S)", "QdQfQh">;
def FSCALE_V64 : WInst<"vscale", "(.q)(.q)(.qS)", "fh">;
+}
+
+//FP8 versions of untyped intrinsics
+let ArchGuard = "defined(__aarch64__)" in {
+ def VGET_LANE_MF8 : IInst<"vget_lane", "1.I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+ def SPLAT_MF8 : WInst<"splat_lane", ".(!q)I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+ def SPLATQ_MF8 : WInst<"splat_laneq", ".(!Q)I", "mQm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+ def VSET_LANE_MF8 : IInst<"vset_lane", ".1.I", "mQm", [ImmCheck<2, ImmCheckLaneIndex, 1>]>;
+ def VCREATE_MF8 : NoTestOpInst<"vcreate", ".(IU>)", "m", OP_CAST> { let BigEndianSafe = 1; }
+ let InstName = "vmov" in {
+ def VDUP_N_MF8 : WOpInst<"vdup_n", ".1", "mQm", OP_DUP>;
+ def VMOV_N_MF8 : WOpInst<"vmov_n", ".1", "mQm", OP_DUP>;
+ }
+ let InstName = "" in
+ def VDUP_LANE_MF8: WOpInst<"vdup_lane", ".qI", "mQm", OP_DUP_LN>;
+ def VCOMBINE_MF8 : NoTestOpInst<"vcombine", "Q..", "m", OP_CONC>;
+ let InstName = "vmov" in {
+ def VGET_HIGH_MF8 : NoTestOpInst<"vget_high", ".Q", "m", OP_HI>;
+ def VGET_LOW_MF8 : NoTestOpInst<"vget_low", ".Q", "m", OP_LO>;
+ }
+ let InstName = "vtbl" in {
+ def VTBL1_MF8 : WInst<"vtbl1", "..p", "m">;
+ def VTBL2_MF8 : WInst<"vtbl2", ".2p", "m">;
+ def VTBL3_MF8 : WInst<"vtbl3", ".3p", "m">;
+ def VTBL4_MF8 : WInst<"vtbl4", ".4p", "m">;
+ }
+ let InstName = "vtbx" in {
+ def VTBX1_MF8 : WInst<"vtbx1", "...p", "m">;
+ def VTBX2_MF8 : WInst<"vtbx2", "..2p", "m">;
+ def VTBX3_MF8 : WInst<"vtbx3", "..3p", "m">;
+ def VTBX4_MF8 : WInst<"vtbx4", "..4p", "m">;
+ }
+ def VEXT_MF8 : WInst<"vext", "...I", "mQm", [ImmCheck<2, ImmCheckLaneIndex, 0>]>;
+ def VREV64_MF8 : WOpInst<"vrev64", "..", "mQm", OP_REV64>;
+ def VREV32_MF8 : WOpInst<"vrev32", "..", "mQm", OP_REV32>;
+ def VREV16_MF8 : WOpInst<"vrev16", "..", "mQm", OP_REV16>;
+ let isHiddenLInst = 1 in
+ def VBSL_MF8 : SInst<"vbsl", ".U..", "mQm">;
+ def VTRN_MF8 : WInst<"vtrn", "2..", "mQm">;
+ def VZIP_MF8 : WInst<"vzip", "2..", "mQm">;
+ def VUZP_MF8 : WInst<"vuzp", "2..", "mQm">;
+ def COPY_LANE_MF8 : IOpInst<"vcopy_lane", "..I.I", "m", OP_COPY_LN>;
+ def COPYQ_LANE_MF8 : IOpInst<"vcopy_lane", "..IqI", "Qm", OP_COPY_LN>;
+ def COPY_LANEQ_MF8 : IOpInst<"vcopy_laneq", "..IQI", "m", OP_COPY_LN>;
+ def COPYQ_LANEQ_MF8 : IOpInst<"vcopy_laneq", "..I.I", "Qm", OP_COPY_LN>;
+ def VDUP_LANE2_MF8 : WOpInst<"vdup_laneq", ".QI", "mQm", OP_DUP_LN>;
+ def VTRN1_MF8 : SOpInst<"vtrn1", "...", "mQm", OP_TRN1>;
+ def VZIP1_MF8 : SOpInst<"vzip1", "...", "mQm", OP_ZIP1>;
+ def VUZP1_MF8 : SOpInst<"vuzp1", "...", "mQm", OP_UZP1>;
+ def VTRN2_MF8 : SOpInst<"vtrn2", "...", "mQm", OP_TRN2>;
+ def VZIP2_MF8 : SOpInst<"vzip2", "...", "mQm", OP_ZIP2>;
+ def VUZP2_MF8 : SOpInst<"vuzp2", "...", "mQm", OP_UZP2>;
+ let InstName = "vtbl" in {
+ def VQTBL1_A64_MF8 : WInst<"vqtbl1", ".QU", "mQm">;
+ def VQTBL2_A64_MF8 : WInst<"vqtbl2", ".(2Q)U", "mQm">;
+ def VQTBL3_A64_MF8 : WInst<"vqtbl3", ".(3Q)U", "mQm">;
+ def VQTBL4_A64_MF8 : WInst<"vqtbl4", ".(4Q)U", "mQm">;
+ }
+ let InstName = "vtbx" in {
+ def VQTBX1_A64_MF8 : WInst<"vqtbx1", "..QU", "mQm">;
+ def VQTBX2_A64_MF8 : WInst<"vqtbx2", "..(2Q)U", "mQm">;
+ def VQTBX3_A64_MF8 : WInst<"vqtbx3", "..(3Q)U", "mQm">;
+ def VQTBX4_A64_MF8 : WInst<"vqtbx4", "..(4Q)U", "mQm">;
+ }
+ def SCALAR_VDUP_LANE_MF8 : IInst<"vdup_lane", "1.I", "Sm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
+ def SCALAR_VDUP_LANEQ_MF8 : IInst<"vdup_laneq", "1QI", "Sm", [ImmCheck<1, ImmCheckLaneIndex, 0>]>;
}
\ No newline at end of file
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 5c6ca4c9ee4de..655abd0ecdb50 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -11172,6 +11172,11 @@ VectorExprEvaluator::VisitInitListExpr(const InitListExpr *E) {
QualType EltTy = VT->getElementType();
SmallVector<APValue, 4> Elements;
+ // MFloat8 type doesn't have constants and thus constant folding
+ // is impossible.
+ if (EltTy->isMFloat8Type())
+ return false;
+
// The number of initializers can be less than the number of
// vector elements. For OpenCL, this can be due to nested vector
// initialization. For GCC compatibility, missing trailing elements
diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 8c11ec2e1fe24..2f7a3a5688973 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -2777,6 +2777,11 @@ static bool isTriviallyCopyableTypeImpl(const QualType &type,
if (CanonicalType->isScalarType() || CanonicalType->isVectorType())
return true;
+ // Mfloat8 type is a special case as it not scalar, but is still trivially
+ // copyable.
+ if (CanonicalType->isMFloat8Type())
+ return true;
+
if (const auto *RT = CanonicalType->getAs<RecordType>()) {
if (const auto *ClassDecl = dyn_cast<CXXRecordDecl>(RT->getDecl())) {
if (IsCopyConstructible) {
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 361e4c4bf2e2e..03062f01907d1 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -8189,8 +8189,9 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
// Determine the type of this overloaded NEON intrinsic.
NeonTypeFlags Type(NeonTypeConst->getZExtValue());
- bool Usgn = Type.isUnsigned();
- bool Quad = Type.isQuad();
+ const bool Usgn = Type.isUnsigned();
+ const bool Quad = Type.isQuad();
+ const bool Floating = Type.isFloatingPoint();
const bool HasLegalHalfType = getTarget().hasLegalHalfType();
const bool AllowBFloatArgsAndRet =
getTargetHooks().getABIInfo().allowBFloatArgsAndRet();
@@ -8291,24 +8292,28 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
}
case NEON::BI__builtin_neon_vceqz_v:
case NEON::BI__builtin_neon_vceqzq_v:
- return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OEQ,
- ICmpInst::ICMP_EQ, "vceqz");
+ return EmitAArch64CompareBuiltinExpr(
+ Ops[0], Ty, Floating ? ICmpInst::FCMP_OEQ : ICmpInst::ICMP_EQ, "vceqz");
case NEON::BI__builtin_neon_vcgez_v:
case NEON::BI__builtin_neon_vcgezq_v:
- return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGE,
- ICmpInst::ICMP_SGE, "vcgez");
+ return EmitAArch64CompareBuiltinExpr(
+ Ops[0], Ty, Floating ? ICmpInst::FCMP_OGE : ICmpInst::ICMP_SGE,
+ "vcgez");
case NEON::BI__builtin_neon_vclez_v:
case NEON::BI__builtin_neon_vclezq_v:
- return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLE,
- ICmpInst::ICMP_SLE, "vclez");
+ return EmitAArch64CompareBuiltinExpr(
+ Ops[0], Ty, Floating ? ICmpInst::FCMP_OLE : ICmpInst::ICMP_SLE,
+ "vclez");
case NEON::BI__builtin_neon_vcgtz_v:
case NEON::BI__builtin_neon_vcgtzq_v:
- return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OGT,
- ICmpInst::ICMP_SGT, "vcgtz");
+ return EmitAArch64CompareBuiltinExpr(
+ Ops[0], Ty, Floating ? ICmpInst::FCMP_OGT : ICmpInst::ICMP_SGT,
+ "vcgtz");
case NEON::BI__builtin_neon_vcltz_v:
case NEON::BI__builtin_neon_vcltzq_v:
- return EmitAArch64CompareBuiltinExpr(Ops[0], Ty, ICmpInst::FCMP_OLT,
- ICmpInst::ICMP_SLT, "vcltz");
+ return EmitAArch64CompareBuiltinExpr(
+ Ops[0], Ty, Floating ? ICmpInst::FCMP_OLT : ICmpInst::ICMP_SLT,
+ "vcltz");
case NEON::BI__builtin_neon_vclz_v:
case NEON::BI__builtin_neon_vclzq_v:
// We generate target-independent intrinsic, which needs a second argument
@@ -8871,28 +8876,32 @@ Value *CodeGenFunction::EmitCommonNeonBuiltinExpr(
return Builder.CreateBitCast(Result, ResultType, NameHint);
}
-Value *CodeGenFunction::EmitAArch64CompareBuiltinExpr(
- Value *Op, llvm::Type *Ty, const CmpInst::Predicate Fp,
- const CmpInst::Predicate Ip, const Twine &Name) {
- llvm::Type *OTy = Op->getType();
-
- // FIXME: this is utterly horrific. We should not be looking at previous
- // codegen context to find out what needs doing. Unfortunately TableGen
- // currently gives us exactly the same calls for vceqz_f32 and vceqz_s32
- // (etc).
- if (BitCastInst *BI = dyn_cast<BitCastInst>(Op))
- OTy = BI->getOperand(0)->getType();
-
- Op = Builder.CreateBitCast(Op, OTy);
- if (OTy->getScalarType()->isFloatingPointTy()) {
- if (Fp == CmpInst::FCMP_OEQ)
- Op = Builder.CreateFCmp(Fp, Op, Constant::getNullValue(OTy));
+Value *
+CodeGenFunction::EmitAArch64CompareBuiltinExpr(Value *Op, llvm::Type *Ty,
+ const CmpInst::Predicate Pred,
+ const Twine &Name) {
+
+ if (isa<FixedVectorType>(Ty)) {
+ // Vector types are cast to i8 vectors. Recover original type.
+ Op = Builder.CreateBitCast(Op, Ty);
+ }
+
+ if (CmpInst::isFPPredicate(Pred)) {
+...
[truncated]
✅ With the latest revision this PR passed the undef deprecator.
✅ With the latest revision this PR passed the C/C++ code formatter.
Hi Marian,
There are some tests failing with the codegen. Can you rebase this patch and fix the failing tests?
This patch adds fp8 variants to existing intrinsics, whose operation doesn't depend on arguments being a specific type.
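As a purely illustrative sketch (not part of the patch description or its tests), user code enabled by these variants could look like the snippet below. The intrinsic names are assumed from the usual ACLE naming scheme applied to the new mfloat8 definitions (vdup_n_mf8, vcombine_mf8, vextq_mf8), so treat them as hypothetical until checked against the arm_neon.h generated by this change.

#include <arm_neon.h>

// Hedged example: intrinsic spellings assumed from ACLE conventions for the
// new mfloat8 variants; verify against the arm_neon.h produced by this patch.
mfloat8x16_t rotate_mf8(mfloat8_t x, mfloat8x8_t lo) {
  mfloat8x8_t hi = vdup_n_mf8(x);         // splat one fp8 scalar into a 64-bit vector
  mfloat8x16_t v = vcombine_mf8(lo, hi);  // concatenate the two 64-bit halves
  return vextq_mf8(v, v, 1);              // byte-wise extract/rotate by one lane
}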
1dae495 to c331c4c
Failures should be fixed now.
clang/lib/CodeGen/CGCall.cpp (Outdated)
// Mfloat8 type is loaded as scalar type, but is treated as single
// vector type for other operations. We need to bitcast it to the vector
// type here.
if (auto *EltTy =
I am not sure if this is the best way to solve this issue, so I would appreciate your feedback on this.
I don't see an issue here. That is exactly what should happen regardless of the target architecture any time the ABI for that architecture says values of type T
are passed as <1 x T>
.
Does the ABI say this? My understanding is that values of type __mfp8 are floating-point 8-bit values that are passed as __mfp8. Pretending it's an i8 in some cases and <1 x i8> in others is purely an implementation detail within clang.
This is not to say the code is invalid, but we should be cautious with how far down the rabbit hole we go.
FYI: As part of @MacDue's work to improve streaming-mode code generation I asked him to add the MVT aarch64mfp8 along with support to load and store it. I expect over time we'll migrate away from using i8 as our scalar type.
Not sure what the fallout will be from this, but I think the problem here is that we should not have loaded a scalar in the first place. Looking at CodeGenTypes::ConvertTypeForMem() I can see that we're using a different type for the memory representation than the normal one, which I think is a mistake.
Changing this so the types are consistent will remove the need for this code, but I suspect it'll prompt further work elsewhere. My hope is that the work sits in target-specific areas relating to modelling the builtin, so it seems reasonable. Please shout though if it starts to get out of control.
Done
Does the ABI say this?
It doesn't. Unfortunately this discussion was split and I didn't replicate all my comments here.
Momchil Velikov 15 Apr at 16:11
The ABI spec (naturally) does not say anything about <1 x i8>. It says (in a somewhat obscure way) that the value is passed in an FPR.
And then clang/llvm decide to implement the ABI by mapping to <1 x T>.
I consider the "natural" mapping of __mfp8 to LLVM types to be i8, and <1 x i8> to be merely a hack coming from the peculiar way of implementing ABIs in clang/llvm (by implicit contracts and "mutual understanding"). As such, <1 x i8> ought to be applicable only for values that are arguments passed in registers.
I'm not yet confident in my understanding of the trade-offs between the two approaches, besides that one impacts target-specific code while the other affects target-independent code. As such, I don't feel well-positioned to contribute meaningfully to this discussion. That said, I'd appreciate it if we could reach alignment here, as I'd like to merge this patch soon.
The underlying storage for __mfp8 is an FPR, and until we decide whether to use a dedicated target type, or LLVM gains an opaque 8-bit floating point type, our only option is to represent it as an i8 vector type.
The reason for using i8 was for some specific code reuse, but as this PR showed, that reuse is not total, and so I'd rather we just be honest and insert the relevant bitcasts when necessary. This will put us in good stead if we decide to go the target type route.
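To make the bitcast point concrete, below is a hand-written sketch of the kind of fix-up being discussed. It is an assumption about the shape of such a helper, not code from this PR, and the name mf8VectorToScalar is invented for illustration.

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/DerivedTypes.h"

// Hypothetical helper (not from the patch): narrow an mfloat8 value carried in
// its <1 x i8> in-register form to the plain i8 expected by some operations.
static llvm::Value *mf8VectorToScalar(llvm::IRBuilder<> &B, llvm::Value *V) {
  auto *VT = llvm::dyn_cast<llvm::FixedVectorType>(V->getType());
  if (VT && VT->getNumElements() == 1 && VT->getElementType()->isIntegerTy(8))
    return B.CreateBitCast(V, B.getInt8Ty(), "mf8.scalar"); // <1 x i8> -> i8
  return V; // already scalar; nothing to do
}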
For my education, can you explain why the fp8 variants are broken out into their own definitions? Taking …
That's a good question. It's been a while since I implemented this patch, so I have forgotten my reasoning behind this, but I think this might be because I was originally not sure if we wanted to target-guard these behind the fp8 feature flag. Since it looks like we are not doing that, I can merge them back into their original intrinsics.
@paulwalker-arm the reasoning behind creating separate records is that the mfloat type is not available for aarch32 architectures, and therefore all intrinsics using it need to be gated behind …
I see. How practical would it be for NEONEmitter to infer the ArchGuard based on the type? I'm assuming ArchGuard is either unset or set to what we need for all the cases we care about. This is not a firm ask, but it would be nice to reuse the existing definitions if possible.
…hitecture checks on AArch64
I have adjusted NeonEmitter to automatically emit the correct attribute for mfloat8 intrinsics and merged them into the original records.
Change-Id: I6c4d9d98fbe46fb3ee115532a9432709c6a86e10
I've not verified every line of the test files but what I've seen looks good, as do the code changes. Other than a few stylistic suggestions this looks good to me.
Change-Id: I9ee2f41ec8879bd631c6ef64e9dc721ef22cf2a1
Change-Id: Ic460c0e6afdccdc37ef31f78cde9933cdcb3c544
…28019) This patch adds fp8 variants to existing intrinsics, whose operation doesn't depend on arguments being a specific type. It also changes mfloat8 type representation in memory from `i8` to `<1xi8>`
This patch adds fp8 variants to existing intrinsics, whose operation doesn't depend on arguments being a specific type.
It also changes mfloat8 type representation in memory from i8 to <1xi8>.