[LV][AArch64] LoopVectorizer allows scalable frem instructions #76247

Merged

Conversation

paschalis-mpeis
Member

@paschalis-mpeis paschalis-mpeis commented Dec 22, 2023

On AArch64, when an 'frem' instruction operates on scalable vectors, it is replaced with a vector library call. LoopVectorize is now aware of this, so it no longer returns Invalid costs for such instructions.

When the type is not scalable, the default cost is returned, delegating to the BaseT TTI implementation.
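As a rough illustration of the first point, here is a minimal, hypothetical sketch (not part of this patch) of how a client such as LoopVectorize would query this cost. The helper name and call shape are illustrative assumptions; the frem instruction itself is passed as the context instruction, since the AArch64 hook reads the operand types from it.

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// Hypothetical helper: cost of a (possibly scalable) vector frem, queried the
// way LoopVectorize does. With this patch, AArch64 returns the cost of the
// equivalent vector library call for scalable types instead of Invalid.
InstructionCost getVectorFremCost(const TargetTransformInfo &TTI,
                                  Instruction *FRemI) {
  Type *VecTy = FRemI->getType(); // e.g. <vscale x 4 x float>
  return TTI.getArithmeticInstrCost(
      Instruction::FRem, VecTy, TargetTransformInfo::TCK_RecipThroughput,
      /*Op1Info=*/{}, /*Op2Info=*/{}, /*Args=*/{}, /*CxtI=*/FRemI);
}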

@llvmbot
Member

llvmbot commented Dec 22, 2023

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Paschalis Mpeis (paschalis-mpeis)

Changes

On AArch64, when an 'frem' instruction operates on scalable vectors, it is replaced with a vector library call. LoopVectorize is now aware of this, so it no longer returns Invalid costs for such instructions.

When the type is not scalable, the default cost is returned, delegating to the BaseT TTI implementation.


Dependencies:

  • merged after: #76166
  • (not a stacked PR)

Full diff: https://github.com/llvm/llvm-project/pull/76247.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (-1)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+15)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll (+12-11)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-fp.ll (+1-1)
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 3f76dfdaac317c..a50176d3ad7f7c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -17,7 +17,6 @@
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Operator.h"
-#include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Support/CommandLine.h"
 #include <optional>
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index b5b8b68291786d..0a76c670c68a8b 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2902,6 +2902,21 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!Ty->getScalarType()->isFP128Ty())
       return LT.first;
     [[fallthrough]];
+  case ISD::FREM: {
+    // Scalable frem instructions will be replaced with Vector library calls.
+    if (Ty->isScalableTy()) {
+      SmallVector<Type *, 4> OpTypes;
+      for (auto &Op : CxtI->operands())
+        OpTypes.push_back(Op->getType());
+
+      InstructionCost ScalableCost =
+          getCallInstrCost(nullptr, Ty, OpTypes, CostKind);
+      return ScalableCost;
+    } else {
+      return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
+                                           Op2Info);
+    }
+  }
   case ISD::FMUL:
   case ISD::FDIV:
     // These nodes are marked as 'custom' just to lower them to SVE.
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
index 18a1c31c03f748..682bb5a58a7846 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
@@ -1,7 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
-; RUN: opt < %s -enable-no-nans-fp-math -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 -mattr=+sve | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -mattr=+sve -mattr=+fullfp16 -enable-no-nans-fp-math -disable-output -passes="print<cost-model>" %s 2>&1 | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
 
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
 define void @fadd() {
 ; CHECK-LABEL: 'fadd'
@@ -137,14 +138,14 @@ define void @fdiv() {
 
 define void @frem() {
 ; CHECK-LABEL: 'frem'
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %V4F16 = frem <vscale x 4 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index c352892354fc24..403ee8e861387e 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -enable-no-nans-fp-math  -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 | FileCheck %s
 
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
 
 define i32 @fadd(i32 %arg) {
 ; CHECK-LABEL: 'fadd'

@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch from 8e85c51 to a041258 Compare January 4, 2024 11:35
@paschalis-mpeis
Member Author

Force-pushed a041258 to additionally rebase onto main.

The frem cost is now computed in LoopVectorize's cost model.

@fhahn
Contributor

fhahn commented Jan 4, 2024

We are already checking if a library function exists; wouldn't it be more accurate to replace the widen recipe with a WidenCall recipe for the function?

SmallVector<Type *, 4> OpTypes;
for (auto &Op : I->operands())
  OpTypes.push_back(Op->getType());
return TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
Contributor

IIUC this doesn't account for the cost of the frem implementation itself. What about the case where a library provides frem, but it is more expensive?

Member Author

It now compares against the default cost. With this in place, the restriction to scalable types (see the next comment) is lifted.
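For context, here is a consolidated sketch of the LoopVectorize-side logic being discussed, assembled from the quoted hunks in this thread. The names InstrCost, CallCost, OpTypes, VectorTy, and Op1Info/Op2Info come from those hunks; the surrounding function context is assumed, and the earlier revision additionally guarded on VectorTy->isScalableTy(), as the next hunk shows.

// Sketch only: frem handling in LoopVectorizationCostModel::getInstructionCost.
InstructionCost InstrCost = TTI.getArithmeticInstrCost(
    I->getOpcode(), VectorTy, CostKind, Op1Info, Op2Info);

// Some targets replace frem with vector library calls.
if (I->getOpcode() == Instruction::FRem) {
  SmallVector<Type *, 4> OpTypes;
  for (auto &Op : I->operands())
    OpTypes.push_back(Op->getType());
  InstructionCost CallCost =
      TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
  // Prefer the cheaper of the two; an Invalid call cost never wins here.
  return std::min(InstrCost, CallCost);
}
return InstrCost;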

@@ -7142,6 +7142,19 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
        Legal->isInvariant(Op2))
      Op2Info.Kind = TargetTransformInfo::OK_UniformValue;

    // Some targets replace frem with vector library calls.
    if (I->getOpcode() == Instruction::FRem && VectorTy->isScalableTy()) {
Contributor

Why limit to scalable types?

@paschalis-mpeis
Member Author

We are already checking if a library function exists; wouldn't it be more accurate to replace the widen recipe with a WidenCall recipe for the function?

@fhahn Unfortunately, I would need a CallInst to do that, as tryToWidenCall / VPWidenCallRecipe and their dependencies do not seem to be easily adaptable to plain Instructions.

In AArch64, when an 'frem' instruction uses scalable vectors, it will be replaced with a vector library call. LoopVectorize is now aware of that, so it no longer returns invalid costs. When it is not scalable, it returns the default costs, which are delegated to the BaseT TTI implementation.

LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call and returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64 when there is no vector library available and when there is, with and without tail-folding.

NOTE: Invoking the CostModel directly (not through LV) would still return invalid costs. Avoiding this would require passing TargetLibraryInfo through the TargetTransformInfo API, which we preferred not to do, as the 'frem' instruction is an isolated case.

Comparing against the default cost, and no longer restricting to scalable types.
@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch from a041258 to ca9f165 Compare January 9, 2024 17:35
      OpTypes.push_back(Op->getType());
    auto CallCost =
        TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
    return std::min(InstrCost, CallCost);
Contributor

I am not sure that is correct, as there is no logic in the ReplaceWithVeclib pass that decides when an frem with vector operands gets replaced with a call to fmod/fmodf. I think the idea implemented at the moment assumes that if there are mappings in the TLI to vector functions, they are always worth using.

Otherwise we can have a pathological case where:

  1. CallCost is Invalid or very high
  2. frem gets vectorised because of the cheap InstrCost (vector frem)
  3. ReplaceWithVeclib still decides to replace the vector frem with a call to the vector fmod

In my opinion that is more misleading than just returning CallCost here.

@huntergr-arm what do you think?

Collaborator

@mgabka If CallCost is Invalid because there's no available mapping, or very high because it's suboptimal, and the target in question (not AArch64, at least at present) has an frem instruction, we would want to return InstrCost.

If replace-with-veclib were to override that, I think it would be a bug in that pass.
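For reference, the std::min in the hunk above already gives this behaviour: llvm::InstructionCost orders an Invalid cost after every valid cost, so a valid InstrCost always wins the comparison. A tiny stand-alone sketch; the demo function is hypothetical.

#include "llvm/Support/InstructionCost.h"
#include <algorithm>
#include <cassert>

// Hypothetical demo: an Invalid call cost (e.g. no veclib mapping) never
// beats a valid instruction cost in std::min, so InstrCost is returned.
void demoInvalidCostOrdering() {
  llvm::InstructionCost InstrCost = 4; // some valid vector frem cost
  llvm::InstructionCost CallCost = llvm::InstructionCost::getInvalid();
  assert(std::min(InstrCost, CallCost) == InstrCost);
}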

Member Author

The below PR will bring such functionality to replace-with-veclib. Thank you all for your input!

Contributor

@mgabka mgabka left a comment

LGTM, but it's probably worth waiting a while to give @huntergr-arm and @fhahn a chance to also have a look.

Collaborator

@huntergr-arm huntergr-arm left a comment

LGTM

@paschalis-mpeis paschalis-mpeis merged commit 37c87d5 into main Jan 18, 2024
@paschalis-mpeis paschalis-mpeis deleted the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch January 18, 2024 08:32
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
…76247)

LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).

Add tests that check costs on AArch64 when there is no vector library available and when there is (with and without tail-folding).

NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.