-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem #82488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem #82488
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-aarch64 Author: Paschalis Mpeis (paschalis-mpeis) ChangesWhen vector library calls are available for frem, given its type and This allows 'superword-level' vectorization, which can be converted to Add tests that vectorize code that contains 2x double and 4x float frem Stacked PR:
Full diff: https://github.com/llvm/llvm-project/pull/82488.diff 5 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 6655931181c2d5..ce8cd629bf501d 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2972,6 +2972,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
Op2Info);
+ case ISD::FREM:
+ // Pass nullptr as fmod/fmodf calls are emitted by the backend even when
+ // those functions are not delcared in the module.
+ if (!Ty->isVectorTy())
+ return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
+ return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
+ Op2Info);
}
}
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4e334748c95934..effe52fe2c4e31 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8362,9 +8362,20 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
unsigned OpIdx = isa<UnaryOperator>(VL0) ? 0 : 1;
TTI::OperandValueInfo Op1Info = getOperandInfo(E->getOperand(0));
TTI::OperandValueInfo Op2Info = getOperandInfo(E->getOperand(OpIdx));
- return TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info,
- Op2Info) +
- CommonCost;
+ auto VecCost = TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind,
+ Op1Info, Op2Info);
+ // Some targets can replace frem with vector library calls.
+ if (ShuffleOrOp == Instruction::FRem) {
+ LibFunc Func;
+ if (TLI->getLibFunc(ShuffleOrOp, ScalarTy, Func) &&
+ TLI->isFunctionVectorizable(TLI->getName(Func),
+ VecTy->getElementCount())) {
+ auto VecCallCost = TTI->getCallInstrCost(
+ nullptr, VecTy, {ScalarTy, ScalarTy}, CostKind);
+ VecCost = std::min(VecCost, VecCallCost);
+ }
+ }
+ return VecCost + CommonCost;
};
return GetCostDiff(GetScalarCost, GetVectorCost);
}
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
index 20e0ef7ea34281..63149adfa21587 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
@@ -22,44 +22,44 @@ target triple = "aarch64-unknown-linux-gnu"
define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
; NEON-NO-VECLIB-LABEL: 'frem_f64'
-; NEON-NO-VECLIB: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-NO-VECLIB: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-NO-VECLIB: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-NO-VECLIB: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem double %in, %in
;
; SVE-NO-VECLIB-LABEL: 'frem_f64'
-; SVE-NO-VECLIB: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-NO-VECLIB: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-NO-VECLIB: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-NO-VECLIB: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem double %in, %in
; SVE-NO-VECLIB: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
; SVE-NO-VECLIB: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem double %in, %in
;
; NEON-ARMPL-LABEL: 'frem_f64'
-; NEON-ARMPL: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-ARMPL: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-ARMPL: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-ARMPL: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
;
; NEON-SLEEF-LABEL: 'frem_f64'
-; NEON-SLEEF: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-SLEEF: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-SLEEF: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-SLEEF: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
;
; SVE-ARMPL-LABEL: 'frem_f64'
-; SVE-ARMPL: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-ARMPL: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-ARMPL: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-ARMPL: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
; SVE-ARMPL: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
; SVE-ARMPL: LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
;
; SVE-SLEEF-LABEL: 'frem_f64'
-; SVE-SLEEF: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
; SVE-SLEEF: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
; SVE-SLEEF: LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
;
; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f64'
-; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
;
; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f64'
-; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
;
@@ -83,55 +83,55 @@ define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
; NEON-NO-VECLIB-LABEL: 'frem_f32'
-; NEON-NO-VECLIB: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-NO-VECLIB: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; NEON-NO-VECLIB: LV: Found an estimated cost of 20 for VF 4 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB: LV: Found an estimated cost of 52 for VF 4 For instruction: %res = frem float %in, %in
;
; SVE-NO-VECLIB-LABEL: 'frem_f32'
-; SVE-NO-VECLIB: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-NO-VECLIB: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; SVE-NO-VECLIB: LV: Found an estimated cost of 20 for VF 4 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB: LV: Found an estimated cost of 52 for VF 4 For instruction: %res = frem float %in, %in
; SVE-NO-VECLIB: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
; SVE-NO-VECLIB: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
; SVE-NO-VECLIB: LV: Found an estimated cost of Invalid for VF vscale x 4 For instruction: %res = frem float %in, %in
;
; NEON-ARMPL-LABEL: 'frem_f32'
-; NEON-ARMPL: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-ARMPL: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-ARMPL: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-ARMPL: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; NEON-ARMPL: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
;
; NEON-SLEEF-LABEL: 'frem_f32'
-; NEON-SLEEF: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-SLEEF: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-SLEEF: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-SLEEF: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; NEON-SLEEF: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
;
; SVE-ARMPL-LABEL: 'frem_f32'
-; SVE-ARMPL: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-ARMPL: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-ARMPL: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-ARMPL: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; SVE-ARMPL: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
; SVE-ARMPL: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
; SVE-ARMPL: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
; SVE-ARMPL: LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
;
; SVE-SLEEF-LABEL: 'frem_f32'
-; SVE-SLEEF: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; SVE-SLEEF: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
; SVE-SLEEF: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
; SVE-SLEEF: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
; SVE-SLEEF: LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
;
; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f32'
-; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
; SVE-ARMPL-TAILFOLD: LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
;
; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f32'
-; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
; SVE-SLEEF-TAILFOLD: LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index c352892354fc24..497ade4f2f613c 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -197,17 +197,17 @@ define i32 @fdiv(i32 %arg) {
define i32 @frem(i32 %arg) {
; CHECK-LABEL: 'frem'
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F16 = frem half undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4F16 = frem <4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 44 for instruction: %V8F16 = frem <8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 88 for instruction: %V16F16 = frem <16 x half> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2F32 = frem <2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %V4F32 = frem <4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %V8F32 = frem <8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %V2F64 = frem <2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Found an estimated cost of 16 for instruction: %V4F64 = frem <4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %F16 = frem half undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4F16 = frem <4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 108 for instruction: %V8F16 = frem <8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 216 for instruction: %V16F16 = frem <16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %F32 = frem float undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2F32 = frem <2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 52 for instruction: %V4F32 = frem <4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 104 for instruction: %V8F32 = frem <8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %F64 = frem double undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %V2F64 = frem <2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %V4F64 = frem <4 x double> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
%F16 = frem half undef, undef
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll
new file mode 100644
index 00000000000000..a38f4bdc4640e9
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll
@@ -0,0 +1,55 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -S -mtriple=aarch64 -vector-library=ArmPL -passes=slp-vectorizer | FileCheck %s
+
+@a = common global ptr null, align 8
+
+define void @frem_v2double() {
+; CHECK-LABEL: define void @frem_v2double() {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, ptr @a, align 8
+; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, ptr @a, align 8
+; CHECK-NEXT: [[TMP2:%.*]] = frem <2 x double> [[TMP0]], [[TMP1]]
+; CHECK-NEXT: store <2 x double> [[TMP2]], ptr @a, align 8
+; CHECK-NEXT: ret void
+;
+entry:
+ %a0 = load double, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+ %a1 = load double, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+ %b0 = load double, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+ %b1 = load double, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+ %r0 = frem double %a0, %b0
+ %r1 = frem double %a1, %b1
+ store double %r0, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+ store double %r1, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+ ret void
+}
+
+define void @frem_v4float() {
+; CHECK-LABEL: define void @frem_v4float() {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = load <4 x float>, ptr @a, align 8
+; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, ptr @a, align 8
+; CHECK-NEXT: [[TMP2:%.*]] = frem <4 x float> [[TMP0]], [[TMP1]]
+; CHECK-NEXT: store <4 x float> [[TMP2]], ptr @a, align 8
+; CHECK-NEXT: ret void
+;
+entry:
+ %a0 = load float, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+ %a1 = load float, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+ %a2 = load float, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+ %a3 = load float, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+ %b0 = load float, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+ %b1 = load float, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+ %b2 = load float, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+ %b3 = load float, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+ %r0 = frem float %a0, %b0
+ %r1 = frem float %a1, %b1
+ %r2 = frem float %a2, %b2
+ %r3 = frem float %a3, %b3
+ store float %r0, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+ store float %r1, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+ store float %r2, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+ store float %r3, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+ ret void
+}
+
|
3b12ec6
to
b4a7eed
Compare
Addressed reviewers and rebased to parent pr: Github is now rendering only the changes of this patch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've review the patch from both side so most of the comment will be void if you opt for the new TTI hook. That advantage of the TTI hook is that because it is specific to FREM you can hardware things like numbers of operands, which should streamline the implementation.
Hm, not sure adding getFRemInstrCost is the best solution here. I would more support adding TLI to getArithmeticInstrCost instead. Some other users may benefit from this too. Though getFRemInstrCost is still better than the current solution |
Changing |
That should be fine, what's the dangerous in it? |
The benefits of having
Edit: I won't use a dedicate switch as I don't want to duplicate those lambdas or introduce any weird fall-throughs. |
0a77f3a
to
71d6952
Compare
forced-push to rebase to main (parent PR #80423 merged, so this one is no longer stacked). This change, matches functionality on both LoopVectorizer and SLPVectorizer. What do you think about the latest change? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments but others this looks fine to me.
What's so special about it? It is absolutely fine if the target does not support it.
This is not a problem at all. Instead you're adding a special API for a single instruction, which bloats the users of this API.
|
Currently,
If we encapsulate this functionality in So, until CodeGen gains support for emitting vectorized frem, we want to be very explicit about it. Examples:
In both examples, the return costs will favor replacing to a vectorized frem, however, it could be at a point in the pipeline where What's next:Ideally, we would also prefer to have this merged into So until then, we insist on being very explicit on those cases that knowingly call the special @alexey-bataev while not ideal, does the above make things clearer? Also, given we proceed with this approach for this patch, what changes would you require? |
Same applies to the newly introduced function
This is not the problem of this patch, it is the issue of future patches and the authors of these patches should handle it correctly. |
True, but the newly introduced function makes it quite explicit to it's users.
Exactly. Until codegen can fully handle frem, we'd prefer not to set any 'traps' for future patches and authors, and instead be very explicit about it. |
Same with adding the new parameter to the existing function, no difference at all.
As I said, this is not a problem of this patch |
If adding this new function that behaves differently to |
Perhaps a better option is for the code generater to emit the function calls as part of legalisation. If we did that then I'd be more comfortable with modifying |
@paschalis-mpeis, I'll investigate potential code generator changes and will report back. |
It needs updated costs when there are available vector library functions given the VF and type.
When vector library calls are available for frem, given its type and vector length, the SLP vectorizer uses updated costs that amount to a call, matching LoopVectorizer's functionality. This allows 'superword-level' vectorization, which can be converted to a vector lib call by later passes. Add tests that vectorize code that contains 2x double and 4x float frem instructions.
SLP vectorization for frem now happens when vector library calls are available, given its type and vector length. This is due to using the updated cost that amounts to a call. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions. LoopVectorizer now also uses getFRemInstrCost.
getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer to compute the cost of frem, which becomes a call cost on AArch64 when TLI has a vector library function. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions.
5f1335a
to
6d508fb
Compare
@@ -8852,7 +8852,8 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals, | |||
TTI::OperandValueInfo Op1Info = getOperandInfo(E->getOperand(0)); | |||
TTI::OperandValueInfo Op2Info = getOperandInfo(E->getOperand(OpIdx)); | |||
return TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info, | |||
Op2Info) + | |||
Op2Info, ArrayRef<const Value *>(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Op2Info, ArrayRef<const Value *>(), | |
Op2Info, std::nullopt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied in the latest commit: (97a2ad9).
Thanks for the suggestion and the comments throughout the reviewing process.
I've landed #83859 so my previous concerns/objections are no longer relevant and have reset my approval accordingly. |
The below patch has landed by @paulwalker-arm allows us to safety encapsulate all functionality in This force-update rebases to main and restructure the code in SLP and Loop Vectorizers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've a couple of suggestions to consider but otherwise looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
SLP vectorization for frem now happens when vector library calls are
available, given its type and vector length. This is due to using the
updated cost that amounts to a call.
Add tests that do SLP vectorization for code that contains 2x double and
4x float frem instructions.
LoopVectorizer now also uses getFRemInstrCost.