[LV][AArch64] LoopVectorizer allows scalable frem instructions #76247
Conversation
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-aarch64

Author: Paschalis Mpeis (paschalis-mpeis)

Changes: On AArch64, when an 'frem' instruction uses scalable vectors, it will be replaced with a vector library call. LoopVectorize is now aware of this, so it no longer returns invalid costs. When the type is not scalable, it returns the default costs, which are delegated to the BaseT TTI implementation. Dependencies:
Full diff: https://github.com/llvm/llvm-project/pull/76247.diff 4 Files Affected:
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 3f76dfdaac317c..a50176d3ad7f7c 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -17,7 +17,6 @@
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"
-#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"
#include <optional>
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index b5b8b68291786d..0a76c670c68a8b 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2902,6 +2902,21 @@ InstructionCost AArch64TTIImpl::getArithmeticInstrCost(
if (!Ty->getScalarType()->isFP128Ty())
return LT.first;
[[fallthrough]];
+ case ISD::FREM: {
+ // Scalable frem instructions will be replaced with Vector library calls.
+ if (Ty->isScalableTy()) {
+ SmallVector<Type *, 4> OpTypes;
+ for (auto &Op : CxtI->operands())
+ OpTypes.push_back(Op->getType());
+
+ InstructionCost ScalableCost =
+ getCallInstrCost(nullptr, Ty, OpTypes, CostKind);
+ return ScalableCost;
+ } else {
+ return BaseT::getArithmeticInstrCost(Opcode, Ty, CostKind, Op1Info,
+ Op2Info);
+ }
+ }
case ISD::FMUL:
case ISD::FDIV:
// These nodes are marked as 'custom' just to lower them to SVE.
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
index 18a1c31c03f748..682bb5a58a7846 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp-sve.ll
@@ -1,7 +1,8 @@
-; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
-; RUN: opt < %s -enable-no-nans-fp-math -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 -mattr=+sve | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -mattr=+sve -mattr=+fullfp16 -enable-no-nans-fp-math -disable-output -passes="print<cost-model>" %s 2>&1 | FileCheck %s
+
+target triple = "aarch64-unknown-linux-gnu"
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
define void @fadd() {
; CHECK-LABEL: 'fadd'
@@ -137,14 +138,14 @@ define void @fdiv() {
define void @frem() {
; CHECK-LABEL: 'frem'
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
-; CHECK-NEXT: Cost Model: Invalid cost for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F16 = frem <vscale x 4 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V8F16 = frem <vscale x 8 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V16F16 = frem <vscale x 16 x half> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2F32 = frem <vscale x 2 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F32 = frem <vscale x 4 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V8F32 = frem <vscale x 8 x float> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V2F64 = frem <vscale x 2 x double> undef, undef
+; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %V4F64 = frem <vscale x 4 x double> undef, undef
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%V4F16 = frem <vscale x 4 x half> undef, undef
diff --git a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
index c352892354fc24..403ee8e861387e 100644
--- a/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/arith-fp.ll
@@ -1,7 +1,7 @@
; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
; RUN: opt < %s -enable-no-nans-fp-math -passes="print<cost-model>" 2>&1 -disable-output -mtriple=aarch64 -mattr=+fullfp16 | FileCheck %s
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64-unknown-linux-gnu"
define i32 @fadd(i32 %arg) {
; CHECK-LABEL: 'fadd'
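The new ISD::FREM case in the AArch64TTI hunk above can be summarized with a small model. The following is a minimal sketch only: the type, the helper names, and the cost numbers (10 and 4) are hypothetical stand-ins, not the real LLVM classes or AArch64 costs.

```cpp
// Hypothetical, simplified model of the frem cost dispatch: scalable types
// are costed as the vector library call that will replace the instruction,
// while fixed-width types fall back to the base arithmetic cost.
struct VecType {
  bool Scalable;
};

// Stand-in for getCallInstrCost on the replacing library call.
int callInstrCost() { return 10; }

// Stand-in for BaseT::getArithmeticInstrCost for frem.
int baseArithmeticCost() { return 4; }

int fremCost(const VecType &Ty) {
  if (Ty.Scalable)
    return callInstrCost();      // frem becomes a vector library call
  return baseArithmeticCost();   // default (scalarized) costing
}
```

This mirrors the shape of the patch: the scalable path queries the call cost with the operand types, and the non-scalable path delegates to the base implementation.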
Force-pushed a041258 to additionally rebase onto main. The cost is now computed in LoopVectorizer.
We are already checking whether a library function exists; wouldn't it be more accurate to replace the widen recipe with a WidenCall recipe for that function?
SmallVector<Type *, 4> OpTypes;
for (auto &Op : I->operands())
  OpTypes.push_back(Op->getType());
return TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
IIUC this doesn't account for the cost of the frem implementation itself. What about the case where a library provides frem, but it is more expensive?
Now it compares against the default cost. Since this is in place, the restriction to scalable types (see the next comment) is lifted.
@@ -7142,6 +7142,19 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
      Legal->isInvariant(Op2))
    Op2Info.Kind = TargetTransformInfo::OK_UniformValue;

  // Some targets replace frem with vector library calls.
  if (I->getOpcode() == Instruction::FRem && VectorTy->isScalableTy()) {
Why limit to scalable types?
@fhahn unfortunately, I will need a
In AArch64, when an 'frem' instruction uses scalable vectors, it will be replaced with a vector library call. LoopVectorize is now aware of that so it no longer returns invalid costs. When it is not scalable, it returns the default costs, which are delegated to the BaseT TTI Implementation.
LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call, and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64 when there is no vector library available and when there is, with and without tail-folding. NOTE: Invoking CostModel directly (not through LV) would still return invalid costs. Avoiding this would require passing TargetLibraryInfo to the TargetTransformInfo API, which we preferred not to do, as the 'frem' instruction is an isolated case.
Comparing against the default cost, and no longer restricting to scalable types.
  OpTypes.push_back(Op->getType());
auto CallCost =
    TTI.getCallInstrCost(nullptr, VectorTy, OpTypes, CostKind);
return std::min(InstrCost, CallCost);
I am not sure that is correct.
There is no logic in the ReplaceWithVeclib pass deciding when an frem with vector operands gets replaced with a call to fmod/fmodf.
I think the idea implemented at the moment assumes that if there are mappings in the TLI to vector functions, they are always worth using.
Otherwise we can have a pathological case where:
- CallCost is Invalid or very high
- frem gets vectorized because of the cheap InstrCost (vector frem)
- replace-with-veclib still decides to replace the vector frem with a call to vector fmod

In my opinion that is more misleading than just returning CallCost here.
@huntergr-arm what do you think?
@mgabka If CallCost is invalid because there's no available mapping, or very high because the mapping is suboptimal, and the target in question (not AArch64, at least at present) has an frem instruction, we would want to return InstrCost.
If replace-with-veclib were to override that, I think it would be a bug in that pass.
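The std::min(InstrCost, CallCost) behavior discussed above relies on an invalid cost comparing greater than any valid one. Below is a minimal stand-in type to illustrate that ordering; it is a hypothetical sketch, not the actual LLVM InstructionCost class, though it mirrors the same "invalid sorts last" semantics.

```cpp
#include <algorithm>

// Minimal stand-in for an instruction-cost value: an invalid cost compares
// greater than every valid cost, so std::min keeps the valid operand when
// the library mapping is missing or the call cost is unknown.
struct Cost {
  bool Valid;
  int Val; // meaningful only when Valid is true
};

// Invalid sorts after all valid costs.
bool operator<(const Cost &A, const Cost &B) {
  if (A.Valid != B.Valid)
    return A.Valid; // valid < invalid
  return A.Val < B.Val;
}

// The selection in the hunk under review: cheaper of vector frem vs. call.
Cost pickFremCost(Cost InstrCost, Cost CallCost) {
  return std::min(InstrCost, CallCost);
}
```

With this ordering, an invalid or very expensive CallCost never wins over a valid InstrCost, which is the behavior the reviewers are asking replace-with-veclib to respect.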
The below PR will bring such functionality to replace-with-veclib. Thank you all for your input!
Added NEON tests as well.
LGTM, but it's probably worth waiting a bit to give @huntergr-arm and @fhahn a chance to also have a look.
LGTM
…76247) LoopVectorizer is aware when a target can replace a scalable frem instruction with a vector library call for a given VF, and it returns the relevant cost. Otherwise, it returns an invalid cost (as previously). Add tests that check costs on AArch64 when there is no vector library available and when there is (with and without tail-folding). NOTE: Invoking CostModel directly (not through LV) would still return invalid costs.