-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AArch64][TTI] Improve LegalVF
when computing gather-loads cost
#69617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64][TTI] Improve LegalVF
when computing gather-loads cost
#69617
Conversation
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-aarch64 Author: Antonio Frighetto (antoniofrighetto) ChangesAfter determining the cost of loads that could not be coalesced into Current number of loads is held in Fixes: #68953. Full diff: https://github.com/llvm/llvm-project/pull/69617.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index d8a0e68d7123759..40ed3e6d327d882 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -2989,7 +2989,14 @@ InstructionCost AArch64TTIImpl::getGatherScatterOpCost(
ElementCount::getScalable(1))
return InstructionCost::getInvalid();
- ElementCount LegalVF = LT.second.getVectorElementCount();
+ ElementCount LegalVF;
+ if (LT.second.isVector()) {
+ LegalVF = LT.second.getVectorElementCount();
+ } else {
+ // If the legalized type is a simple type, treat it as a 1-element vector.
+ LegalVF = ElementCount::getFixed(1);
+ }
+
InstructionCost MemOpCost =
getMemoryOpCost(Opcode, VT->getElementType(), Alignment, 0, CostKind,
{TTI::OK_AnyValue, TTI::OP_None}, I);
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll
new file mode 100644
index 000000000000000..84de45bde06a62f
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/gather-load-fp128.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=neoverse-512tvb -pass-remarks-output=%t.yaml < %s | FileCheck %s
+; RUN: cat %t.yaml | FileCheck -check-prefix=REMARK %s
+
+; REMARK-LABEL: Function: gather_load_fp128
+; REMARK: Args:
+; REMARK-NEXT: - String: 'List vectorization was possible but not beneficial with cost '
+; REMARK-NEXT: - Cost: '0'
+; REMARK-NEXT: - String: ' >= '
+; REMARK-NEXT: - Treshold: '0'
+
+define void @gather_load_fp128(ptr %arg) #0 {
+; CHECK-LABEL: @gather_load_fp128(
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, ptr [[ARG:%.*]], i64 16
+; CHECK-NEXT: [[LOAD0:%.*]] = load fp128, ptr [[ARG]], align 1
+; CHECK-NEXT: [[LOAD1:%.*]] = load fp128, ptr [[GEP]], align 1
+; CHECK-NEXT: [[LOAD2:%.*]] = load fp128, ptr null, align 1
+; CHECK-NEXT: [[LOAD3:%.*]] = load fp128, ptr null, align 1
+; CHECK-NEXT: [[FCMP0:%.*]] = fcmp oeq fp128 [[LOAD0]], 0xL00000000000000000000000000000000
+; CHECK-NEXT: [[FCMP1:%.*]] = fcmp oeq fp128 [[LOAD1]], 0xL00000000000000000000000000000000
+; CHECK-NEXT: [[FCMP2:%.*]] = fcmp oeq fp128 [[LOAD2]], 0xL00000000000000000000000000000000
+; CHECK-NEXT: [[FCMP3:%.*]] = fcmp oeq fp128 [[LOAD3]], 0xL00000000000000000000000000000000
+; CHECK-NEXT: ret void
+;
+ %gep = getelementptr i8, ptr %arg, i64 16
+ %load0 = load fp128, ptr %arg, align 1
+ %load1 = load fp128, ptr %gep, align 1
+ %load2 = load fp128, ptr null, align 1
+ %load3 = load fp128, ptr null, align 1
+ %fcmp0 = fcmp oeq fp128 %load0, 0xL0
+ %fcmp1 = fcmp oeq fp128 %load1, 0xL0
+ %fcmp2 = fcmp oeq fp128 %load2, 0xL0
+ %fcmp3 = fcmp oeq fp128 %load3, 0xL0
+ ret void
+}
+
+attributes #0 = { vscale_range(2,2) }
|
ElementCount LegalVF; | ||
if (LT.second.isVector()) { | ||
LegalVF = LT.second.getVectorElementCount(); | ||
} else { | ||
// If the legalized type is a simple type, treat it as a 1-element vector. | ||
LegalVF = ElementCount::getFixed(1); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall isLegalMaskedGather() just return false for such types instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May that be the case though? Currently, few lines above, if useNeonVector
is available, the cost is computed via BaseT::getGatherScatterOpCost
before type-legalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the vector type gets scalarized (like in your check), isLegalMaskedGather() shall return false for such types, even though useNeonVector() returns true. SLP shall not assume emitting masked gather here at all, the scalarization increases the cost significantly, we should end up with the simple gather here
@@ -0,0 +1,37 @@ | |||
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py | |||
; RUN: opt -S -passes=slp-vectorizer -mtriple=aarch64-unknown-linux-gnu -mcpu=neoverse-512tvb -pass-remarks-output=%t.yaml < %s | FileCheck %s | |||
; RUN: cat %t.yaml | FileCheck -check-prefix=REMARK %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is testing the remarks necessary, along with the output?
; REMARK-NEXT: - String: ' >= ' | ||
; REMARK-NEXT: - Treshold: '0' | ||
|
||
define void @gather_load_fp128(ptr %arg) #0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a variant with i128 too?
@@ -2989,7 +2989,14 @@ InstructionCost AArch64TTIImpl::getGatherScatterOpCost( | |||
ElementCount::getScalable(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be worth treating !LT.second.isVector() the same as 1 x scalable vectors. Either that or something should be checking that the element type isElementTypeLegalForScalableVector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we bail out when isLegalMaskedGather
returns false, which indeed would happen with such types, – as per @alexey-bataev suggestion above – I don't think we would ever be reaching this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't look like that function was called when I tried it, and it is best if these functions work correctly in isolation. It's possible to call it throught the cost model: https://godbolt.org/z/z4635vodq.
I think that because the gather will not become a scalable SVE gather in this case, the line above with useNeonVector should be returning true so we use the BaseT::getGatherScatterOpCost for the cost.
731ed9d
to
7b5fa1d
Compare
Addressed reviews, thanks! |
@@ -2996,6 +2996,9 @@ static unsigned getSVEGatherScatterOverhead(unsigned Opcode) { | |||
InstructionCost AArch64TTIImpl::getGatherScatterOpCost( | |||
unsigned Opcode, Type *DataTy, const Value *Ptr, bool VariableMask, | |||
Align Alignment, TTI::TargetCostKind CostKind, const Instruction *I) { | |||
if (!isLegalMaskedGatherScatter(DataTy)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't sound right - to return an invalid cost. If the gather/scatter can be scalarized then it should have some cost, even if it's quite high. There are some scalable vectors that can't be scalarized, but in general for fixed-width it should be OK.
Is the isElementTypeLegalForScalableVector check below enough to fix the issues you were seeing? We should probably have some fixed-length cost checks too.
If not, I think doing this would make sense, so that it returned the base cost:
if (useNeonVector(DataTy) || !isLegalMaskedGatherScatter(DataTy))
return BaseT::getGatherScatterOpCost(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought we wanted to favour an invalid cost in those cases, but makes sense. I confirm isElementTypeLegalForScalableVector
is fixing the issues previously seen.
7b5fa1d
to
c828e0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this LGTM. Thanks for the fix!
After determining the cost of loads that could not be coalesced into `VectorizedLoads` in SLP, computing the cost of a gather-vectorized load is carried out. Favour a potentially high valid cost when the type of a group of loads, whose type is a vector of size dependent upon `VF`, may be legalized into a scalar value. Fixes: llvm#68953.
c828e0d
to
138e6c1
Compare
After determining the cost of loads that could not be coalesced into
VectorizedLoads
, computing the cost of a gather-vectorized load is carried out. Take into account the case where the type of a group of loads, whose type is a vector of size dependent uponVF
, may be legalized into a scalar value.Current number of loads is held in
LT.first
.Fixes: #68953.