Skip to content

[AMDGPU][PromoteAlloca] Don't stop when an alloca is too big to promote #93466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 28, 2024

Conversation

Pierre-vh
Copy link
Contributor

When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget;

Fixes SWDEV-455343

When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget;

Fixes SWDEV-455343
@llvmbot
Copy link
Member

llvmbot commented May 27, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

When I rewrote this, I made a mistake in the control flow. I thought we could just stop promoting if an alloca is too big to vectorize, but we can't. Other allocas in the list may be promotable and fit within the budget;

Fixes SWDEV-455343


Full diff: https://github.com/llvm/llvm-project/pull/93466.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+16-14)
  • (added) llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll (+41)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index c0846b123d187..fbda8f973db99 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -333,22 +333,24 @@ bool AMDGPUPromoteAllocaImpl::run(Function &F, bool PromoteToLDS) {
   bool Changed = false;
   for (AllocaInst *AI : Allocas) {
     const unsigned AllocaCost = DL->getTypeSizeInBits(AI->getAllocatedType());
-    if (AllocaCost > VectorizationBudget) {
-      LLVM_DEBUG(dbgs() << "  Alloca too big for vectorization: " << *AI
-                        << "\n");
-      return Changed;
+    // First, check if we have enough budget to vectorize this alloca.
+    if (AllocaCost <= VectorizationBudget) {
+      // If we do, attempt vectorization, otherwise, fall through and try
+      // promoting to LDS instead.
+      if (tryPromoteAllocaToVector(*AI)) {
+        Changed = true;
+        assert((VectorizationBudget - AllocaCost) < VectorizationBudget &&
+               "Underflow!");
+        VectorizationBudget -= AllocaCost;
+        LLVM_DEBUG(dbgs() << "  Remaining vectorization budget:"
+                          << VectorizationBudget << "\n");
+        continue;
+      }
+    } else {
+      LLVM_DEBUG(dbgs() << "Alloca too big for vectorization (size:" << AllocaCost << ", budget:" << VectorizationBudget << "): " << *AI << "\n");
     }
 
-    if (tryPromoteAllocaToVector(*AI)) {
-      Changed = true;
-      assert((VectorizationBudget - AllocaCost) < VectorizationBudget &&
-             "Underflow!");
-      VectorizationBudget -= AllocaCost;
-      LLVM_DEBUG(dbgs() << "  Remaining vectorization budget:"
-                        << VectorizationBudget << "\n");
-      if (VectorizationBudget == 0)
-        break;
-    } else if (PromoteToLDS && tryPromoteAllocaToLDS(*AI, SufficientLDS))
+    if (PromoteToLDS && tryPromoteAllocaToLDS(*AI, SufficientLDS))
       Changed = true;
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll
new file mode 100644
index 0000000000000..e13ab421dfdb7
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-budget-exhausted.ll
@@ -0,0 +1,41 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -mcpu=kaveri -amdgpu-promote-alloca-to-vector-limit=128 -passes=amdgpu-promote-alloca-to-vector %s -o - | FileCheck %s
+
+; Check that when we see an alloca that's too big to vectorize given the remaining budget,
+; we don't give up and we keep looking for other allocas to vectorize.
+
+define amdgpu_kernel void @simple_users_scores() {
+; CHECK-LABEL: define amdgpu_kernel void @simple_users_scores(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[MANYUSERS:%.*]] = alloca [64 x i64], align 4, addrspace(5)
+; CHECK-NEXT:    [[MANYUSERS_1:%.*]] = getelementptr i8, ptr addrspace(5) [[MANYUSERS]], i64 2
+; CHECK-NEXT:    [[V0:%.*]] = load i8, ptr addrspace(5) [[MANYUSERS_1]], align 1
+; CHECK-NEXT:    [[V0_EXT:%.*]] = zext i8 [[V0]] to i64
+; CHECK-NEXT:    store i64 [[V0_EXT]], ptr addrspace(5) [[MANYUSERS_1]], align 8
+; CHECK-NEXT:    [[MANYUSERS_2:%.*]] = getelementptr i8, ptr addrspace(5) [[MANYUSERS]], i64 1
+; CHECK-NEXT:    [[V1:%.*]] = load i8, ptr addrspace(5) [[MANYUSERS_2]], align 1
+; CHECK-NEXT:    [[V1_EXT:%.*]] = zext i8 [[V0]] to i64
+; CHECK-NEXT:    store i64 [[V1_EXT]], ptr addrspace(5) [[MANYUSERS_2]], align 8
+; CHECK-NEXT:    ret void
+;
+entry:
+  ; should get a score of 1
+  %simpleuser = alloca [4 x i64], align 4, addrspace(5)
+  ; should get a score of 4 and be visited first.
+  %manyusers = alloca [64 x i64], align 4, addrspace(5)
+
+  store i64 42, ptr addrspace(5) %simpleuser
+
+  %manyusers.1 = getelementptr i8, ptr addrspace(5) %manyusers, i64 2
+  %v0 = load i8, ptr addrspace(5)  %manyusers.1
+  %v0.ext = zext i8 %v0 to i64
+  store i64 %v0.ext, ptr addrspace(5) %manyusers.1
+
+  %manyusers.2 = getelementptr i8, ptr addrspace(5) %manyusers, i64 1
+  %v1 = load i8, ptr addrspace(5)  %manyusers.2
+  %v1.ext = zext i8 %v0 to i64
+  store i64 %v1.ext, ptr addrspace(5) %manyusers.2
+
+  ret void
+}

; Check that when we see an alloca that's too big to vectorize given the remaining budget,
; we don't give up and we keep looking for other allocas to vectorize.

define amdgpu_kernel void @simple_users_scores() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this patch, %simpleuser wouldn't be promoted because %manyusers is seen first, but it's too big and we'd simply stop.

Copy link

github-actions bot commented May 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Pierre-vh Pierre-vh merged commit 0e73bbd into llvm:main May 28, 2024
7 checks passed
@Pierre-vh Pierre-vh deleted the fix-promotealloca-bug branch May 28, 2024 06:05
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request May 29, 2024
Reverts: 0e73bbd [AMDGPU][PromoteAlloca] Don't stop when an alloca is too big to promote (llvm#93466)

Change-Id: I1d86ab5653aa184da3062366f0f55d26f56f6db2
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Oct 1, 2024
… to promote (llvm#93466)

When I rewrote this, I made a mistake in the control flow. I thought we
could just stop promoting if an alloca is too big to vectorize, but we
can't. Other allocas in the list may be promotable and fit within the
budget.

Fixes SWDEV-455343
Fixes SWDEV-464683

Change-Id: Iedeabc3ee1c91500da13503097d2029f2838c8ce
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Nov 22, 2024
… to promote (llvm#93466)

When I rewrote this, I made a mistake in the control flow. I thought we
could just stop promoting if an alloca is too big to vectorize, but we
can't. Other allocas in the list may be promotable and fit within the
budget.

Fixes SWDEV-455343
Fixes SWDEV-464683

Change-Id: Iedeabc3ee1c91500da13503097d2029f2838c8ce
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants