[LV][AArch64] LoopVectorizer allows scalable frem instructions #76247

Merged

Conversation

paschalis-mpeis
Member

@paschalis-mpeis paschalis-mpeis commented Dec 22, 2023

On AArch64, when an 'frem' instruction operates on scalable vectors, it is replaced with a vector library call. LoopVectorize is now aware of this, so it no longer returns Invalid costs for such instructions.

When the type is not scalable, the default cost is returned, delegating to the BaseT TTI implementation.
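As a rough illustration of the first point, here is a minimal, hypothetical sketch (not part of this patch) of how a client such as LoopVectorize would query this cost. The helper name and call shape are illustrative assumptions; the frem instruction itself is passed as the context instruction, since the AArch64 hook reads the operand types from it.

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/InstructionCost.h"
using namespace llvm;

// Hypothetical helper: cost of a (possibly scalable) vector frem, queried the
// way LoopVectorize does. With this patch, AArch64 returns the cost of the
// equivalent vector library call for scalable types instead of Invalid.
InstructionCost getVectorFremCost(const TargetTransformInfo &TTI,
                                  Instruction *FRemI) {
  Type *VecTy = FRemI->getType(); // e.g. <vscale x 4 x float>
  return TTI.getArithmeticInstrCost(
      Instruction::FRem, VecTy, TargetTransformInfo::TCK_RecipThroughput,
      /*Op1Info=*/{}, /*Op2Info=*/{}, /*Args=*/{}, /*CxtI=*/FRemI);
}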

@llvmbot
Member

llvmbot commented Dec 22, 2023

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-aarch64

Author: Paschalis Mpeis (paschalis-mpeis)

Changes

On AArch64, when an 'frem' instruction operates on scalable vectors, it is replaced with a vector library call. LoopVectorize is now aware of this, so it no longer returns Invalid costs for such instructions.

When the type is not scalable, the default cost is returned, delegating to the BaseT TTI implementation.


Dependencies:

  • merged after: #76166
  • (not a stacked PR)

Full diff: https://github.com/llvm/llvm-project/pull/76247.diff

4 Files Affected:

  • (modified) llvm/lib/Analysis/TargetTransformInfo.cpp (-1)
  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+15)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll (+12-11)
  • (modified) llvm/test/Analysis/CostModel/AArch64/arith-fp.ll (+1-1)
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 3f76dfdaac317c..a50176d3ad7f7c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -17,7 +17,6 @@
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Operator.h"
-#include "llvm/IR/PatternMatch.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Support/CommandLine.h"
 #include <optional>
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index b5b8b68291786d..0a76c670c68a8b 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2902,6 +2902,21 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
     if (!Ty->getScalarType()->isFP128Ty())
       return LT.first;
     [[fallthrough]];
+  case ISD::FREM: {
+    // Scalable frem instructions will be replaced with Vector library calls.
+    if (Ty->isScalableTy()) {
+      SmallVector<Type *, 4> OpTypes;
+      for (auto &Op : CxtI->operands())
+        OpTypes.push_back(Op->getType());
+
+      InstructionCost ScalableCost =
+          getCallInstrCost(nullptr, Ty, OpTypes, CostKind);
+      return ScalableCost;
+    } else {
+      return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
+                                           Op2Info);
+    }
+  }
   case ISD::FMUL:
   case ISD::FDIV:
     // These nodes are marked as 'custom' just to lower them to SVE.
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
index 18a1c31c03f748..682bb5a58a7846 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
@@ -1,7 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
-; RUN: opt < %s -enable-no-nans-fp-math -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 -mattr=+sve | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -mattr=+sve -mattr=+fullfp16 -enable-no-nans-fp-math -disable-output -passes="print<cost-model>" %s 2>&1 | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
 
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 
 define void @fadd() {
 ; CHECK-LABEL: 'fadd'
@@ -137,14 +138,14 @@ define void @fdiv() {
 
 define void @frem() {
 ; CHECK-LABEL: 'frem'
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
-; CHECK-NEXT:  Cost Model: Invalid cost for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
   %V4F16 = frem <vscale x 4 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index c352892354fc24..403ee8e861387e 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
 ; RUN: opt < %s -enable-no-nans-fp-math  -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 | FileCheck %s
 
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
 
 define i32 @fadd(i32 %arg) {
 ; CHECK-LABEL: 'fadd'

@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch from 8e85c51 to a041258 Compare January 4, 2024 11:35
@paschalis-mpeis
Member Author

Force-pushed a041258 to additionally rebase onto main.

The frem cost is now computed in LoopVectorize's cost model.

@fhahn
Contributor

fhahn commented Jan 4, 2024

We are already checking if a library function exists; wouldn't it be more accurate to replace the widen recipe with a WidenCall recipe for the function?

SmallVector<Type *, 4> OpTypes;
for (auto &Op : I->operands())
  OpTypes.push_back(Op->getType());
return TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
Contributor

IIUC this doesn't account for the cost of the frem implementation itself. What about the case where a library provides frem, but it is more expensive?

Member Author

It now compares against the default cost. With this in place, the restriction to scalable types (see the next comment) is lifted.
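For context, here is a consolidated sketch of the LoopVectorize-side logic being discussed, assembled from the quoted hunks in this thread. The names InstrCost, CallCost, OpTypes, VectorTy, and Op1Info/Op2Info come from those hunks; the surrounding function context is assumed, and the earlier revision additionally guarded on VectorTy->isScalableTy(), as the next hunk shows.

// Sketch only: frem handling in LoopVectorizationCostModel::getInstructionCost.
InstructionCost InstrCost = TTI.getArithmeticInstrCost(
    I->getOpcode(), VectorTy, CostKind, Op1Info, Op2Info);

// Some targets replace frem with vector library calls.
if (I->getOpcode() == Instruction::FRem) {
  SmallVector<Type *, 4> OpTypes;
  for (auto &Op : I->operands())
    OpTypes.push_back(Op->getType());
  InstructionCost CallCost =
      TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
  // Prefer the cheaper of the two; an Invalid call cost never wins here.
  return std::min(InstrCost, CallCost);
}
return InstrCost;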

@@ -7142,6 +7142,19 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
        Legal->isInvariant(Op2))
      Op2Info.Kind = TargetTransformInfo::OK_UniformValue;

    // Some targets replace frem with vector library calls.
    if (I->getOpcode() == Instruction::FRem && VectorTy->isScalableTy()) {
Contributor

Why limit to scalable types?

@paschalis-mpeis
Member Author

We are already checking if a library function exists; wouldn't it be more accurate to replace the widen recipe with a WidenCall recipe for the function?

@fhahn Unfortunately, I would need a CallInst to do that, as tryToWidenCall / VPWidenCallRecipe and their dependencies do not seem to be easily adaptable to plain Instructions.

In AArch64, when an 'frem' instruction uses scalable vectors, it will be replaced with a vector library call. LoopVectorize is now aware of that, so it no longer returns invalid costs. When it is not scalable, it returns the default costs, which are delegated to the BaseT TTI implementation.

LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call and returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64 when there is no vector library available and when there is, with and without tail-folding.

NOTE: Invoking the CostModel directly (not through LV) would still return invalid costs. Avoiding this would require passing TargetLibraryInfo through the TargetTransformInfo API, which we preferred not to do, as the 'frem' instruction is an isolated case.

Comparing against the default cost, and no longer restricting to scalable types.
@paschalis-mpeis paschalis-mpeis force-pushed the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch from a041258 to ca9f165 Compare January 9, 2024 17:35
      OpTypes.push_back(Op->getType());
    auto CallCost =
        TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
    return std::min(InstrCost, CallCost);
Contributor

I am not sure that is correct, as there is no logic in the ReplaceWithVeclib pass that decides when an frem with vector operands gets replaced with a call to fmod/fmodf. I think the idea implemented at the moment assumes that if there are mappings in the TLI to vector functions, they are always worth using.

Otherwise we can have a pathological case where:

  1. CallCost is Invalid or very high
  2. frem gets vectorised because of the cheap InstrCost (vector frem)
  3. ReplaceWithVeclib still decides to replace the vector frem with a call to the vector fmod

In my opinion that is more misleading than just returning CallCost here.

@huntergr-arm what do you think?

Collaborator

@mgabka If CallCost is Invalid because there's no available mapping, or very high because it's suboptimal, and the target in question (not AArch64, at least at present) has an frem instruction, we would want to return InstrCost.

If replace-with-veclib were to override that, I think it would be a bug in that pass.
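For reference, the std::min in the hunk above already gives this behaviour: llvm::InstructionCost orders an Invalid cost after every valid cost, so a valid InstrCost always wins the comparison. A tiny stand-alone sketch; the demo function is hypothetical.

#include "llvm/Support/InstructionCost.h"
#include <algorithm>
#include <cassert>

// Hypothetical demo: an Invalid call cost (e.g. no veclib mapping) never
// beats a valid instruction cost in std::min, so InstrCost is returned.
void demoInvalidCostOrdering() {
  llvm::InstructionCost InstrCost = 4; // some valid vector frem cost
  llvm::InstructionCost CallCost = llvm::InstructionCost::getInvalid();
  assert(std::min(InstrCost, CallCost) == InstrCost);
}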

Member Author

The below PR will bring such functionality to replace-with-veclib. Thank you all for your input!

Contributor

@mgabka mgabka left a comment

LGTM, but it's probably worth waiting a while to give @huntergr-arm and @fhahn a chance to also have a look.

Collaborator

@huntergr-arm huntergr-arm left a comment

LGTM

@paschalis-mpeis paschalis-mpeis merged commit 37c87d5 into main Jan 18, 2024
@paschalis-mpeis paschalis-mpeis deleted the users/paschalis-mpeis/loop-vectorizer-frem-scalable branch January 18, 2024 08:32
ampandey-1995 pushed a commit to ampandey-1995/llvm-project that referenced this pull request Jan 19, 2024
…76247)

LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).

Add tests that check costs on AArch64 when there is no vector library available and when there is (with and without tail-folding).

NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.