[AArch64][LV][SLP] Vectorizers use call cost for vectorized frem #82488

paschalis-mpeis · 2024-02-21T12:48:00Z

SLP vectorization for frem now happens when vector library calls are
available, given its type and vector length. This is due to using the
updated cost that amounts to a call.

Add tests that do SLP vectorization for code that contains 2x double and
4x float frem instructions.

LoopVectorizer now also uses getFRemInstrCost.

llvmbot · 2024-02-21T12:48:33Z

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-aarch64

Author: Paschalis Mpeis (paschalis-mpeis)

Changes

When vector library calls are available for frem, given its type and
vector length, the SLP vectorizer uses updated costs that amount to a
call, matching LoopVectorizer's functionality.

This allows 'superword-level' vectorization, which can be converted to
a vector lib call by later passes.

Add tests that vectorize code that contains 2x double and 4x float frem
instructions.
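
As a rough illustration of the costing rule described above, here is a minimal, self-contained sketch. It is not the LLVM implementation: `ToyVecLib`, `addMapping`, `vectorFRemCost`, and the library function name are hypothetical stand-ins for the TLI query and the TTI cost calls, and the costs 24 and 10 are taken from the `<2 x double>` expectations in arith-fp-frem.ll.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for TargetLibraryInfo: records which
// (scalar function, vector width) pairs have a vector library mapping.
struct ToyVecLib {
  std::map<std::pair<std::string, unsigned>, std::string> Mappings;

  void addMapping(const std::string &ScalarName, unsigned NumElts,
                  const std::string &VectorName) {
    Mappings[std::make_pair(ScalarName, NumElts)] = VectorName;
  }

  bool isFunctionVectorizable(const std::string &ScalarName,
                              unsigned NumElts) const {
    return Mappings.count(std::make_pair(ScalarName, NumElts)) != 0;
  }
};

// Cost assigned to a vector frem: the plain arithmetic cost, lowered to the
// call cost only when a vector library mapping exists for this width.
int vectorFRemCost(const ToyVecLib &Lib, const std::string &ScalarName,
                   unsigned NumElts, int ArithCost, int CallCost) {
  int VecCost = ArithCost;
  if (Lib.isFunctionVectorizable(ScalarName, NumElts))
    VecCost = std::min(VecCost, CallCost);
  return VecCost;
}
```

With a mapping registered for `fmod` at width 2, the function returns the call cost; for any width without a mapping it falls back to the arithmetic cost, so vectorization is only encouraged where a library call can actually replace the instruction.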

Stacked PR:

Parent PR: #80423
Review commits >= TBA 982d28b

Full diff: https://github.com/llvm/llvm-project/pull/82488.diff

5 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+7)
(modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+14-3)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll (+34-34)
(modified) llvm/test/Analysis/CostModel/AArch64/arith-fp.ll (+11-11)
(added) llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll (+55)

diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 6655931181c2d5..ce8cd629bf501d 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2972,6 +2972,13 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
 
     return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
                                          Op2Info);
+  case ISD::FREM:
+    // Pass nullptr as fmod/fmodf calls are emitted by the backend even when
+    // those functions are not declared in the module.
+    if (!Ty->isVectorTy())
+      return getCallInstrCost(/*Function*/ nullptr, Ty, {Ty, Ty}, CostKind);
+    return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
+                                         Op2Info);
   }
 }
 
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4e334748c95934..effe52fe2c4e31 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8362,9 +8362,20 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
       unsigned OpIdx = isa<UnaryOperator>(VL0) ? 0 : 1;
       TTI::OperandValueInfo Op1Info = getOperandInfo(E->getOperand(0));
       TTI::OperandValueInfo Op2Info = getOperandInfo(E->getOperand(OpIdx));
-      return TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info,
-                                         Op2Info) +
-             CommonCost;
+      auto VecCost = TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind,
+                                                 Op1Info, Op2Info);
+      // Some targets can replace frem with vector library calls.
+      if (ShuffleOrOp == Instruction::FRem) {
+        LibFunc Func;
+        if (TLI->getLibFunc(ShuffleOrOp, ScalarTy, Func) &&
+            TLI->isFunctionVectorizable(TLI->getName(Func),
+                                        VecTy->getElementCount())) {
+          auto VecCallCost = TTI->getCallInstrCost(
+              nullptr, VecTy, {ScalarTy, ScalarTy}, CostKind);
+          VecCost = std::min(VecCost, VecCallCost);
+        }
+      }
+      return VecCost + CommonCost;
     };
     return GetCostDiff(GetScalarCost, GetVectorCost);
   }
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
index 20e0ef7ea34281..63149adfa21587 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-frem.ll
@@ -22,44 +22,44 @@ target triple = "aarch64-unknown-linux-gnu"
 
 define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; NEON-NO-VECLIB-LABEL: 'frem_f64'
-; NEON-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-NO-VECLIB:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-NO-VECLIB:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-NO-VECLIB-LABEL: 'frem_f64'
-; SVE-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-NO-VECLIB:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-NO-VECLIB:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
 ; NEON-ARMPL-LABEL: 'frem_f64'
-; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ;
 ; NEON-SLEEF-LABEL: 'frem_f64'
-; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-ARMPL-LABEL: 'frem_f64'
-; SVE-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-SLEEF-LABEL: 'frem_f64'
-; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f64'
-; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
 ; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f64'
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem double %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem double %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 2 For instruction: %res = frem double %in, %in
 ; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem double %in, %in
 ; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 2 For instruction: %res = frem double %in, %in
 ;
@@ -83,55 +83,55 @@ define void @frem_f64(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 
 define void @frem_f32(ptr noalias %in.ptr, ptr noalias %out.ptr) {
 ; NEON-NO-VECLIB-LABEL: 'frem_f32'
-; NEON-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; NEON-NO-VECLIB:  LV: Found an estimated cost of 20 for VF 4 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-NO-VECLIB:  LV: Found an estimated cost of 52 for VF 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-NO-VECLIB-LABEL: 'frem_f32'
-; SVE-NO-VECLIB:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-NO-VECLIB:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
-; SVE-NO-VECLIB:  LV: Found an estimated cost of 20 for VF 4 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-NO-VECLIB:  LV: Found an estimated cost of 52 for VF 4 For instruction: %res = frem float %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-NO-VECLIB:  LV: Found an estimated cost of Invalid for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
 ; NEON-ARMPL-LABEL: 'frem_f32'
-; NEON-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-ARMPL:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; NEON-ARMPL:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ;
 ; NEON-SLEEF-LABEL: 'frem_f32'
-; NEON-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; NEON-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; NEON-SLEEF:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; NEON-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-ARMPL-LABEL: 'frem_f32'
-; SVE-ARMPL:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-ARMPL:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-ARMPL:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-SLEEF-LABEL: 'frem_f32'
-; SVE-SLEEF:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-ARMPL-TAILFOLD-LABEL: 'frem_f32'
-; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
 ; SVE-ARMPL-TAILFOLD:  LV: Found an estimated cost of 10 for VF vscale x 4 For instruction: %res = frem float %in, %in
 ;
 ; SVE-SLEEF-TAILFOLD-LABEL: 'frem_f32'
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 2 for VF 1 For instruction: %res = frem float %in, %in
-; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 8 for VF 2 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 1 For instruction: %res = frem float %in, %in
+; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 24 for VF 2 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of 10 for VF 4 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 1 For instruction: %res = frem float %in, %in
 ; SVE-SLEEF-TAILFOLD:  LV: Found an estimated cost of Invalid for VF vscale x 2 For instruction: %res = frem float %in, %in
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index c352892354fc24..497ade4f2f613c 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -197,17 +197,17 @@ define i32 @fdiv(i32 %arg) {
 
 define i32 @frem(i32 %arg) {
 ; CHECK-LABEL: 'frem'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F16 = frem half undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %V4F16 = frem <4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 44 for instruction: %V8F16 = frem <8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 88 for instruction: %V16F16 = frem <16 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F32 = frem float undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2F32 = frem <2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %V4F32 = frem <4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %V8F32 = frem <8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %F64 = frem double undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V2F64 = frem <2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V4F64 = frem <4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %F16 = frem half undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 52 for instruction: %V4F16 = frem <4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 108 for instruction: %V8F16 = frem <8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 216 for instruction: %V16F16 = frem <16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %F32 = frem float undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V2F32 = frem <2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 52 for instruction: %V4F32 = frem <4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 104 for instruction: %V8F32 = frem <8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %F64 = frem double undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V2F64 = frem <2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 48 for instruction: %V4F64 = frem <4 x double> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
 ;
   %F16 = frem half undef, undef
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll
new file mode 100644
index 00000000000000..a38f4bdc4640e9
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/slp-frem.ll
@@ -0,0 +1,55 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt < %s -S -mtriple=aarch64 -vector-library=ArmPL -passes=slp-vectorizer | FileCheck %s
+
+@a = common global ptr null, align 8
+
+define void @frem_v2double() {
+; CHECK-LABEL: define void @frem_v2double() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <2 x double>, ptr @a, align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x double>, ptr @a, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = frem <2 x double> [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    store <2 x double> [[TMP2]], ptr @a, align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  %a0 = load double, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+  %a1 = load double, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+  %b0 = load double, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+  %b1 = load double, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+  %r0 = frem double %a0, %b0
+  %r1 = frem double %a1, %b1
+  store double %r0, ptr getelementptr inbounds (double, ptr @a, i64 0), align 8
+  store double %r1, ptr getelementptr inbounds (double, ptr @a, i64 1), align 8
+  ret void
+}
+
+define void @frem_v4float() {
+; CHECK-LABEL: define void @frem_v4float() {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x float>, ptr @a, align 8
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x float>, ptr @a, align 8
+; CHECK-NEXT:    [[TMP2:%.*]] = frem <4 x float> [[TMP0]], [[TMP1]]
+; CHECK-NEXT:    store <4 x float> [[TMP2]], ptr @a, align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  %a0 = load float, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+  %a1 = load float, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+  %a2 = load float, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+  %a3 = load float, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+  %b0 = load float, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+  %b1 = load float, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+  %b2 = load float, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+  %b3 = load float, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+  %r0 = frem float %a0, %b0
+  %r1 = frem float %a1, %b1
+  %r2 = frem float %a2, %b2
+  %r3 = frem float %a3, %b3
+  store float %r0, ptr getelementptr inbounds (float, ptr @a, i64 0), align 8
+  store float %r1, ptr getelementptr inbounds (float, ptr @a, i64 1), align 8
+  store float %r2, ptr getelementptr inbounds (float, ptr @a, i64 2), align 8
+  store float %r3, ptr getelementptr inbounds (float, ptr @a, i64 3), align 8
+  ret void
+}
+
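
The updated expectations in arith-fp.ll can be cross-checked with a little arithmetic. The sketch below assumes that a vector frem without a vector library is costed as one scalar call per element plus a fixed scalarization overhead, and that vectors wider than 128 bits cost a multiple of the 128-bit case. The overhead constants (4, 12, 28) are inferred from the numbers in the test diff, not read from the AArch64 cost tables, and the function names are hypothetical.

```cpp
#include <cassert>

// Insert/extract overhead per vector width, inferred from the test
// expectations (an assumption, not taken from the cost-model source).
constexpr int scalarizationOverhead(unsigned NumElts) {
  return NumElts == 2 ? 4 : NumElts == 4 ? 12 : NumElts == 8 ? 28 : 0;
}

// Cost of a scalarized vector frem that fits in a single vector register:
// one scalar frem per element plus the overhead above.
constexpr int scalarizedFRemCost(unsigned NumElts, int ScalarCost) {
  return static_cast<int>(NumElts) * ScalarCost +
         scalarizationOverhead(NumElts);
}

// Old scalar cost was 2, new scalar (call) cost is 10.
static_assert(scalarizedFRemCost(2, 2) == 8, "old <2 x double>");
static_assert(scalarizedFRemCost(2, 10) == 24, "new <2 x double>");
static_assert(scalarizedFRemCost(4, 10) == 52, "new <4 x float>");
static_assert(scalarizedFRemCost(8, 10) == 108, "new <8 x half>");
```

Under the same reading, the wider types in the test are two copies of the 128-bit case: 2 x 108 = 216 for `<16 x half>`, 2 x 52 = 104 for `<8 x float>`, and 2 x 24 = 48 for `<4 x double>`.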

paschalis-mpeis · 2024-02-21T18:17:17Z

Addressed reviewers and rebased to parent pr:

[AArch64][CostModel] Improve scalar frem cost #80423

Github is now rendering only the changes of this patch.

paulwalker-arm

I've reviewed the patch from both sides, so most of the comments will be void if you opt for the new TTI hook. The advantage of the TTI hook is that, because it is specific to FREM, you can hardwire things like the number of operands, which should streamline the implementation.

alexey-bataev · 2024-02-22T15:30:04Z

Personally I'm happy with keeping this nuance outside of TTI, but if we really want this captured within TTI then I think it's time to break FREM into its own cost function (i.e. implement getFRemInstrCost). That way getArithmeticInstrCost can work as it does today, and the new function can be documented to highlight its assumption that, if a TLI is passed in and a vector mapping is present, the return value is only valid on the assumption that vector FREM instructions will be transformed by a following transformation pass. I prefer this to, say, adding TLI to getArithmeticInstrCost, because I'd rather users of getFRemInstrCost explicitly enter into this contract.

Hm, not sure adding getFRemInstrCost is the best solution here. I would rather support adding TLI to getArithmeticInstrCost instead. Some other users may benefit from this too. Though getFRemInstrCost is still better than the current solution.

paulwalker-arm · 2024-02-22T15:34:56Z

Changing getArithmeticInstrCost is just too dangerous. What if one opcode needs TLI for a different reason? All of a sudden, all existing callers are entered into the contract (FREM is guaranteed to be transformed into a math call) without ensuring that's actually the case.

alexey-bataev · 2024-02-22T15:40:33Z

Changing getArithmeticInstrCost is just too dangerous. What if one opcode needs TLI for a different reason?

That should be fine, what's the danger in it?

paschalis-mpeis · 2024-02-22T15:49:07Z

The benefits of having getFRemInstrCost, in my view, are the following:

1. frem is a special case anyway:
It's an IR instruction that is not supported by all hardware, and targets have to specialize.
Handling it ~~in a dedicated switch case~~ with a dedicated TTI function call clearly exposes that information to anyone who reads the code in both vectorizers (rather than obscuring it).
Plus, it won't add any if (TLI hasVecLib) doThis else doThat logic to the vectorizers.
2. This won't be a significant API change.
It won't force any other user of getArithmeticInstrCost to go through that change.

Edit: I won't use a dedicated switch, as I don't want to duplicate those lambdas or introduce any weird fall-throughs.
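
To make the proposal concrete, here is a minimal sketch of a dedicated hook sitting next to an unchanged generic one. Only the names `getArithmeticInstrCost` and `getFRemInstrCost` come from this thread; `ToyTTI`, `ToyTLI`, the signatures, and the cost values are assumptions for illustration and do not mirror the real TTI interface.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of what a TargetLibraryInfo query would answer.
struct ToyTLI {
  bool HasVectorFRem; // a vector library mapping exists for this type and VF
};

struct ToyTTI {
  int ScalarizedFRemCost = 24; // e.g. <2 x double> without a library call
  int CallCost = 10;

  // Generic entry point: signature and behaviour stay as they are today,
  // so existing callers are unaffected.
  int getArithmeticInstrCost() const { return ScalarizedFRemCost; }

  // Dedicated hook: only callers that know a later pass rewrites vector
  // frem into a library call should opt in by passing a TLI.
  int getFRemInstrCost(const ToyTLI *TLI) const {
    int Cost = getArithmeticInstrCost();
    if (TLI && TLI->HasVectorFRem)
      Cost = std::min(Cost, CallCost);
    return Cost;
  }
};
```

In this sketch the vectorizers would call `getFRemInstrCost` with a TLI, while every other client keeps calling `getArithmeticInstrCost` and never sees the lower cost.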

paschalis-mpeis · 2024-02-23T14:51:38Z

Force-pushed to rebase to main (parent PR #80423 merged, so this one is no longer stacked).

This change matches functionality in both LoopVectorizer and SLPVectorizer.

What do you think about the latest change?
Thank you all for the valuable feedback so far.

paulwalker-arm

A few minor comments, but otherwise this looks fine to me.

alexey-bataev · 2024-02-27T14:18:39Z

The benefits of having getFRemInstrCost, in my view, are the following:

1. frem is a special case anyway:
It's an IR instruction that is not supported by all hardware, and targets have to specialize.
Handling it ~~in a dedicated switch case~~ with a dedicated TTI function call clearly exposes that information to anyone who reads the code in both vectorizers (rather than obscuring it).

What's so special about it? It is absolutely fine if the target does not support it.

Plus, it won't add any if (TLI hasVecLib) doThis else doThat logic to the vectorizers.
2. This won't be a significant API change.
It won't force any other user of getArithmeticInstrCost to go through that change.

This is not a problem at all. Instead you're adding a special API for a single instruction, which bloats the users of this API.

Edit: I won't use a dedicated switch, as I don't want to duplicate those lambdas or introduce any weird fall-throughs.

paschalis-mpeis · 2024-02-28T17:11:56Z

Alexey: That should be fine, what's the danger in it?

Currently, frem relies on the ReplaceWithVeclib pass, so if for whatever reason that pass does not run, codegen would crash.

Paul: What if one opcode needs TLI for a different reason? All of a sudden, all existing callers are entered into the contract (FREM is guaranteed to be transformed into a math call) without ensuring that's actually the case.

If we encapsulate this functionality in getArithmeticInstrCost, we are adding existing callers into this 'contract'.
This means that, at some point in the future, they could pass a TLI object and get the updated costs in a scenario where ReplaceWithVeclib happens not to run.

So, until CodeGen gains support for emitting vectorized frem, we want to be very explicit about it.

Examples:

A new pass might call getArithmeticInstrCost and pass TLI there, which may happen to have veclib support for frem.
Similarly, a maintainer might decide to pass TLI to an existing call site of getArithmeticInstrCost, again getting the updated costs.

In both examples, the returned costs will favor replacing with a vectorized frem; however, this could be at a point in the pipeline where ReplaceWithVeclib does not run, causing codegen to crash soon after.
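
The hazard in these two examples can be sketched as a toy decision function. Everything here is hypothetical (`ToyPipeline`, `Outcome`, `vectorizeFRem`, and the cost values); it only models the claim made in this thread that a vector frem left in the IR without a later ReplaceWithVeclib run is something codegen cannot handle.

```cpp
#include <cassert>

// Hypothetical description of the pass pipeline after the vectorizer.
struct ToyPipeline {
  bool RunsReplaceWithVeclib; // is the replacement pass scheduled later?
};

enum class Outcome { ScalarFRem, VectorLibCall, UnsupportedVectorFRem };

// The vectorizer trusts the cost model; whether the resulting vector frem
// is safe depends on a pass that the cost model knows nothing about.
Outcome vectorizeFRem(int TotalScalarCost, int VectorCost,
                      const ToyPipeline &P) {
  if (VectorCost >= TotalScalarCost)
    return Outcome::ScalarFRem; // not profitable, nothing changes
  return P.RunsReplaceWithVeclib ? Outcome::VectorLibCall
                                 : Outcome::UnsupportedVectorFRem;
}
```

The same low vector cost gives a library call in one pipeline and an unsupported instruction in another, which is the 'trap' the dedicated hook is meant to make explicit.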

What's next:

Ideally, we would also prefer to have this merged into getArithmeticInstrCost, but we are reluctant to do so until Codegen learns to deal with frem in all cases.

So until then, we insist on being very explicit on those cases that knowingly call the special getFRemInstrCost (SLP & Loop vectorizers). At this time, I could add a TODO note of such an intention to getFRemInstrCost, and in the future, given such codegen support, we'd be happy to fully abstract it away inside TTI.

@alexey-bataev while not ideal, does the above make things clearer? Also, given we proceed with this approach for this patch, what changes would you require?

alexey-bataev · 2024-02-28T17:17:20Z

Currently, frem relies on the ReplaceWithVeclib pass, so if for whatever reason that pass does not run, codegen would crash.

Same applies to the newly introduced function.

If we encapsulate this functionality in getArithmeticInstrCost, we are adding existing callers into this 'contract'.
This means that, at some point in the future, they could pass a TLI object and get the updated costs in a scenario where ReplaceWithVeclib happens not to run.

This is not the problem of this patch, it is the issue of future patches and the authors of these patches should handle it correctly.

paschalis-mpeis · 2024-02-28T17:25:21Z

Same applies to the newly introduced function

True, but the newly introduced function makes it quite explicit to its users.
They can't use it by mistake without noticing. And now we'll have it in two places that we know are OK.

This is not the problem of this patch, it is the issue of future patches and the authors of these patches should handle it correctly.

Exactly. Until codegen can fully handle frem, we'd prefer not to set any 'traps' for future patches and authors, and instead be very explicit about it.

alexey-bataev · 2024-02-28T17:27:38Z

True, but the newly introduced function makes it quite explicit to its users.
They can't use it by mistake without noticing. And now we'll have it in two places that we know are OK.

Same with adding the new parameter to the existing function, no difference at all.

Exactly. Until codegen can fully handle frem, we'd prefer not to set any 'traps' for future patches and authors, and instead be very explicit about it.

As I said, this is not a problem of this patch.

paulwalker-arm · 2024-02-28T17:33:17Z

If adding this new function that behaves differently to getArithmeticInstrCost for the reasons described is such a sticking point then I think we cannot safely use scalable vector FRem instructions on AArch64 and perhaps we should instead change LoopVectorize and SLPVectorize to cost and emit the function calls directly? I’d rather not complicate those passes but it would better reflect the transformation that is actually happening and thus remove my concerns and any need to modify TTI.

paulwalker-arm · 2024-02-28T17:49:17Z

Perhaps a better option is for the code generator to emit the function calls as part of legalisation. If we did that then I'd be more comfortable with modifying getArithmeticInstrCost.

paulwalker-arm · 2024-02-28T18:17:25Z

@paschalis-mpeis, I'll investigate potential code generator changes and will report back.

It needs updated costs when there are available vector library functions given the VF and type.

When vector library calls are available for frem, given its type and vector length, the SLP vectorizer uses updated costs that amount to a call, matching LoopVectorizer's functionality. This allows 'superword-level' vectorization, which can be converted to a vector lib call by later passes. Add tests that vectorize code that contains 2x double and 4x float frem instructions.

SLP vectorization for frem now happens when vector library calls are available, given its type and vector length. This is due to using the updated cost that amounts to a call. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions. LoopVectorizer now also uses getFRemInstrCost.

getArithmeticInstrCost is used by both LoopVectorizer and SLPVectorizer to compute the cost of frem, which becomes a call cost on AArch64 when TLI has a vector library function. Add tests that do SLP vectorization for code that contains 2x double and 4x float frem instructions.

alexey-bataev · 2024-03-11T18:08:31Z

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ -8852,7 +8852,8 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
      TTI::OperandValueInfo Op1Info = getOperandInfo(E->getOperand(0));
      TTI::OperandValueInfo Op2Info = getOperandInfo(E->getOperand(OpIdx));
      return TTI->getArithmeticInstrCost(ShuffleOrOp, VecTy, CostKind, Op1Info,
-                                         Op2Info) +
+                                         Op2Info, ArrayRef<const Value *>(),


Suggested change:

- Op2Info, ArrayRef<const Value *>(),
+ Op2Info, std::nullopt,

Applied in the latest commit: (97a2ad9).
Thanks for the suggestion and the comments throughout the reviewing process.

paulwalker-arm · 2024-03-11T18:14:56Z

I've landed #83859, so my previous concerns/objections are no longer relevant, and I have reset my approval accordingly.

paschalis-mpeis · 2024-03-11T19:00:00Z

The patch below, landed by @paulwalker-arm, allows us to safely encapsulate all functionality in getArithmeticInstrCost:

[LLVM][CodeGen] Teach SelectionDAG how to expand FREM to a vector math call. #83859

This force-update rebases to main and restructures the code in the SLP and Loop Vectorizers.

paulwalker-arm

I've a couple of suggestions to consider but otherwise looks good.

llvm/include/llvm/Analysis/TargetTransformInfo.h

alexey-bataev

LG

llvmbot added backend:AArch64 vectorizers llvm:analysis llvm:transforms labels Feb 21, 2024

alexey-bataev reviewed Feb 21, 2024

paschalis-mpeis requested review from paulwalker-arm, huntergr-arm and mgabka February 21, 2024 14:26

huntergr-arm reviewed Feb 21, 2024

paschalis-mpeis mentioned this pull request Feb 21, 2024

[AArch64][CostModel] Improve scalar frem cost #80423

Merged

paschalis-mpeis changed the base branch from main to users/paschalis-mpeis/improve-scalar-frem-costs February 21, 2024 15:54

paschalis-mpeis force-pushed the users/paschalis-mpeis/frem-slp-vectorization branch from 3b12ec6 to b4a7eed Compare February 21, 2024 18:14

paulwalker-arm reviewed Feb 21, 2024

paulwalker-arm reviewed Feb 22, 2024

alexey-bataev requested a review from RKSimon February 22, 2024 15:28

Base automatically changed from users/paschalis-mpeis/improve-scalar-frem-costs to main February 23, 2024 09:29

paschalis-mpeis changed the title ~~[AArch64] SLP can vectorize frem~~ [AArch64][LV][SLP] Vectorizers now use getFRemInstrCost for frem costs Feb 23, 2024

paschalis-mpeis changed the title ~~[AArch64][LV][SLP] Vectorizers now use getFRemInstrCost for frem costs~~ [AArch64][LV][SLP] Vectorizers use getFRemInstrCost for frem costs Feb 23, 2024

paschalis-mpeis force-pushed the users/paschalis-mpeis/frem-slp-vectorization branch from 0a77f3a to 71d6952 Compare February 23, 2024 14:44

paulwalker-arm reviewed Feb 23, 2024

paulwalker-arm approved these changes Feb 27, 2024

paschalis-mpeis requested a review from huntergr-arm February 28, 2024 10:27

RKSimon reviewed Mar 6, 2024

paschalis-mpeis added 6 commits March 11, 2024 10:58

SLP cannot vectorize frem calls in AArch64.

abe1b4e

It needs updated costs when there are available vector library functions given the VF and type.

Addressing reviewers

3ed8acc

Addressing reviewers (2)

ecd7da7

paschalis-mpeis force-pushed the users/paschalis-mpeis/frem-slp-vectorization branch from 5f1335a to 6d508fb Compare March 11, 2024 17:40

paschalis-mpeis changed the title ~~[AArch64][LV][SLP] Vectorizers use getFRemInstrCost for frem costs~~ [AArch64][LV][SLP] Vectorizers use call cost for vectorized frem Mar 11, 2024

alexey-bataev reviewed Mar 11, 2024

paulwalker-arm self-requested a review March 11, 2024 18:11

Addressing reviewers (3)

97a2ad9

paschalis-mpeis requested review from alexey-bataev and RKSimon March 11, 2024 19:12

paulwalker-arm approved these changes Mar 12, 2024

Addressing reviewers (4)

05c1986

alexey-bataev approved these changes Mar 13, 2024

paschalis-mpeis merged commit f795d1a into main Mar 14, 2024

paschalis-mpeis deleted the users/paschalis-mpeis/frem-slp-vectorization branch March 14, 2024 17:20
