-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU][PromoteAlloca] Don't stop when an alloca is too big to promote #93466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget; Fixes SWDEV-455343
@llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) ChangesWhen I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget; Fixes SWDEV-455343 Full diff: https://github.com/llvm/llvm-project/pull/93466.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index c0846b123d187..fbda8f973db99 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -333,22 +333,24 @@ bool AMDGPUPromoteAllocaImpl::run(Function &F, bool PromoteToLDS) {
bool Changed = false;
for (AllocaInst *AI : Allocas) {
const unsigned AllocaCost = DL->getTypeSizeInBits(AI->getAllocatedType());
- if (AllocaCost > VectorizationBudget) {
- LLVM_DEBUG(dbgs() << " Alloca too big for vectorization: " << *AI
- << "\n");
- return Changed;
+ // First, check if we have enough budget to vectorize this alloca.
+ if (AllocaCost <= VectorizationBudget) {
+ // If we do, attempt vectorization, otherwise, fall through and try
+ // promoting to LDS instead.
+ if (tryPromoteAllocaToVector(*AI)) {
+ Changed = true;
+ assert((VectorizationBudget - AllocaCost) < VectorizationBudget &&
+ "Underflow!");
+ VectorizationBudget -= AllocaCost;
+ LLVM_DEBUG(dbgs() << " Remaining vectorization budget:"
+ << VectorizationBudget << "\n");
+ continue;
+ }
+ } else {
+ LLVM_DEBUG(dbgs() << "Alloca too big for vectorization (size:" << AllocaCost << ", budget:" << VectorizationBudget << "): " << *AI << "\n");
}
- if (tryPromoteAllocaToVector(*AI)) {
- Changed = true;
- assert((VectorizationBudget - AllocaCost) < VectorizationBudget &&
- "Underflow!");
- VectorizationBudget -= AllocaCost;
- LLVM_DEBUG(dbgs() << " Remaining vectorization budget:"
- << VectorizationBudget << "\n");
- if (VectorizationBudget == 0)
- break;
- } else if (PromoteToLDS && tryPromoteAllocaToLDS(*AI, SufficientLDS))
+ if (PromoteToLDS && tryPromoteAllocaToLDS(*AI, SufficientLDS))
Changed = true;
}
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll
new file mode 100644
index 0000000000000..e13ab421dfdb7
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -mcpu=kaveri -amdgpu-promote-alloca-to-vector-limit=128 -passes=amdgpu-promote-alloca-to-vector %s -o - | FileCheck %s
+
+; Check that when we see an alloca that's too big to vectorize given the remaining budget,
+; we don't give up and we keep looking for other allocas to vectorize.
+
+define amdgpu_kernel void @simple_users_scores() {
+; CHECK-LABEL: define amdgpu_kernel void @simple_users_scores(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[MANYUSERS:%.*]] = alloca [64 x i64], align 4, addrspace(5)
+; CHECK-NEXT: [[MANYUSERS_1:%.*]] = getelementptr i8, ptr addrspace(5) [[MANYUSERS]], i64 2
+; CHECK-NEXT: [[V0:%.*]] = load i8, ptr addrspace(5) [[MANYUSERS_1]], align 1
+; CHECK-NEXT: [[V0_EXT:%.*]] = zext i8 [[V0]] to i64
+; CHECK-NEXT: store i64 [[V0_EXT]], ptr addrspace(5) [[MANYUSERS_1]], align 8
+; CHECK-NEXT: [[MANYUSERS_2:%.*]] = getelementptr i8, ptr addrspace(5) [[MANYUSERS]], i64 1
+; CHECK-NEXT: [[V1:%.*]] = load i8, ptr addrspace(5) [[MANYUSERS_2]], align 1
+; CHECK-NEXT: [[V1_EXT:%.*]] = zext i8 [[V0]] to i64
+; CHECK-NEXT: store i64 [[V1_EXT]], ptr addrspace(5) [[MANYUSERS_2]], align 8
+; CHECK-NEXT: ret void
+;
+entry:
+ ; should get a score of 1
+ %simpleuser = alloca [4 x i64], align 4, addrspace(5)
+ ; should get a score of 4 and be visited first.
+ %manyusers = alloca [64 x i64], align 4, addrspace(5)
+
+ store i64 42, ptr addrspace(5) %simpleuser
+
+ %manyusers.1 = getelementptr i8, ptr addrspace(5) %manyusers, i64 2
+ %v0 = load i8, ptr addrspace(5) %manyusers.1
+ %v0.ext = zext i8 %v0 to i64
+ store i64 %v0.ext, ptr addrspace(5) %manyusers.1
+
+ %manyusers.2 = getelementptr i8, ptr addrspace(5) %manyusers, i64 1
+ %v1 = load i8, ptr addrspace(5) %manyusers.2
+ %v1.ext = zext i8 %v0 to i64
+ store i64 %v1.ext, ptr addrspace(5) %manyusers.2
+
+ ret void
+}
|
; Check that when we see an alloca that's too big to vectorize given the remaining budget, | ||
; we don't give up and we keep looking for other allocas to vectorize. | ||
|
||
define amdgpu_kernel void @simple_users_scores() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this patch, %simpleuser
wouldn't be promoted because %manyusers
is seen first, but it's too big and we'd simply stop.
✅ With the latest revision this PR passed the C/C++ code formatter. |
Reverts: 0e73bbd [AMDGPU][PromoteAlloca] Don't stop when an alloca is too big to promote (llvm#93466) Change-Id: I1d86ab5653aa184da3062366f0f55d26f56f6db2
… to promote (llvm#93466) When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget. Fixes SWDEV-455343 Fixes SWDEV-464683 Change-Id: Iedeabc3ee1c91500da13503097d2029f2838c8ce
… to promote (llvm#93466) When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget. Fixes SWDEV-455343 Fixes SWDEV-464683 Change-Id: Iedeabc3ee1c91500da13503097d2029f2838c8ce
When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget;
Fixes SWDEV-455343