[AMDGPU] Handle unset/max flat workgroup size in waves/EU #139955

npanchen · 2025-05-14T20:20:04Z

When `amdgpu-flat-work-group-size` is either missing or set to the maximum allowed range [1, 1024], the attributor won't change the state. As a result, `getAAFor<AAAMDFlatWorkGroupSize>` later returns {0,0} and the compiler crashes on the `FlatWorkGroupSize != 0` assertion.


llvmbot · 2025-05-14T20:20:38Z

@llvm/pr-subscribers-backend-amdgpu

Author: Nikolay Panchenko (npanchen)

Changes

When `amdgpu-flat-work-group-size` is either missing or set to the maximum allowed range [1, 1024], the attributor won't change the state. As a result, `getAAFor<AAAMDFlatWorkGroupSize>` later returns {0,0} and the compiler crashes on the `FlatWorkGroupSize != 0` assertion.

Full diff: https://github.com/llvm/llvm-project/pull/139955.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+8-2)
(added) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-max-flat-wgs.ll (+35)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 433144a60d120..52774ff9277b0 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1170,13 +1170,19 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
           !AssumedGroupSize->isValidState())
         return false;
 
+      unsigned MinFWGSize =
+          AssumedGroupSize->getAssumed().getLower().getZExtValue();
+      unsigned MaxFWGSize =
+          AssumedGroupSize->getAssumed().getUpper().getZExtValue();
+      if (MinFWGSize == 0 && MaxFWGSize == 0)
+        std::tie(MinFWGSize, MaxFWGSize) =
+            InfoCache.getDefaultFlatWorkGroupSize(*Func);
       unsigned Min, Max;
       std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
           *Caller,
           {CallerInfo->getAssumed().getLower().getZExtValue(),
            CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
-          {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-           AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
+          {MinFWGSize, MaxFWGSize - 1});
       ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
       IntegerRangeState CallerRangeState(CallerRange);
       Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-max-flat-wgs.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-max-flat-wgs.ll
new file mode 100644
index 0000000000000..680fbedead429
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-max-flat-wgs.ll
@@ -0,0 +1,35 @@
+; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -mcpu=gfx942 -passes=amdgpu-attributor %s | FileCheck %s
+
+; CHECK-LABEL: define internal fastcc void @call1(
+; CHECK-SAME: ) #[[ATTR0:[0-9]+]]
+define internal fastcc void @call1() #0 {
+  tail call fastcc void @call2()
+  ret void
+}
+
+; CHECK-LABEL: define internal fastcc void @call2(
+; CHECK-SAME: ) #[[ATTR0]]
+define internal fastcc void @call2() #1 {
+  tail call fastcc void @call5()
+  ret void
+}
+
+; CHECK-LABEL: define { ptr addrspace(1), ptr } @call3(
+; CHECK-SAME:) #[[ATTR0]]
+define { ptr addrspace(1), ptr } @call3() #2 {
+  tail call fastcc void @call5()
+  ret { ptr addrspace(1), ptr } zeroinitializer
+}
+
+; CHECK-LABEL: define internal fastcc void @call5(
+; CHECK-SAME: ) #[[ATTR0]]
+define internal fastcc void @call5() {
+  tail call fastcc void @call1()
+  ret void
+}
+
+attributes #0 = {"amdgpu-flat-work-group-size"="1, 1024" "target-cpu"="gfx942" }
+attributes #1 = {"amdgpu-flat-work-group-size"="1, 1024" "target-cpu"="gfx942" }
+attributes #2 = {"amdgpu-flat-work-group-size"="1, 256" "target-cpu"="gfx942" }
+
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx942" "uniform-work-group-size"="false" }
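For readers following the C++ hunk above: the assumed flat workgroup size is a half-open range, so `getUpper() - 1` turns it into an inclusive maximum, and an AA that has not recorded anything yet reads back as lower 0, upper 0. The standalone sketch below contrasts the pre-patch conversion (minimum stays 0, and `0 - 1` wraps around) with a fallback in the spirit of the patch. `HalfOpenRange`, `kDefaultFWGS` and both functions are hypothetical stand-ins, not LLVM's `ConstantRange` or `InfoCache.getDefaultFlatWorkGroupSize`; the default is assumed to be inclusive [1, 1024], so the sketch returns it without applying `- 1`.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the assumed state of AAAMDFlatWorkGroupSize:
// a half-open range [Lower, Upper). A state that nothing has been merged
// into yet reads back as [0, 0).
struct HalfOpenRange {
  uint64_t Lower; // inclusive
  uint64_t Upper; // exclusive
};

// Assumed default when "amdgpu-flat-work-group-size" is absent, inclusive on
// both ends (the [1, 1024] maximum discussed in this PR).
constexpr std::pair<unsigned, unsigned> kDefaultFWGS{1, 1024};

// Pre-patch behaviour: convert blindly to a closed [Min, Max] pair.
// For [0, 0) this yields Min == 0 and Max == 0 - 1, which wraps to ~0u.
constexpr std::pair<unsigned, unsigned> closedRangeUnpatched(HalfOpenRange R) {
  return {static_cast<unsigned>(R.Lower), static_cast<unsigned>(R.Upper - 1)};
}

// Sketch of the patched idea: substitute the default when both bounds are 0.
constexpr std::pair<unsigned, unsigned>
closedRangeWithFallback(HalfOpenRange R) {
  if (R.Lower == 0 && R.Upper == 0)
    return kDefaultFWGS; // already inclusive, so no "- 1" here
  return closedRangeUnpatched(R);
}
```

With these definitions, `closedRangeUnpatched({0, 0})` gives a minimum of 0 and a maximum of `~0u`, while `closedRangeWithFallback({0, 0})` gives {1, 1024}, and `closedRangeWithFallback({1, 257})` (the assumed stored form of `"1,256"`) gives {1, 256}.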

shiltian

I'm gonna put a block here, since this block of code is gonna go anyway after #123995 is merged.

npanchen · 2025-05-14T20:29:03Z

> I'm gonna put a block here, since this block of code is gonna go anyway after #123995 is merged.

Do you have an ETA for when it's going to land? Your PR is ~4 months old, while we started observing the assertion because of #137807, which was created and merged within a week.

shiltian · 2025-05-14T20:46:37Z

That is not really up to me TBH…I also hope it can be merged soon…It can fix a significant performance regression we got half a year ago…

shiltian · 2025-05-14T20:58:09Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

+          AssumedGroupSize->getAssumed().getUpper().getZExtValue();
+      if (MinFWGSize == 0 && MaxFWGSize == 0)
+        std::tie(MinFWGSize, MaxFWGSize) =
+            InfoCache.getDefaultFlatWorkGroupSize(*Func);


And it is also wrong to use the default flat workgroup size here.

Can you elaborate on why? Isn't it always safe (correctness-wise) to use the maximum that the function allows?

If both of them are zero, the flat workgroup size AA has not yet reached a fixed point, since its state is still valid here.
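To make the fixed-point remark concrete: in an optimistic range lattice the assumed value starts empty and only widens as call sites are merged in, so an AA that is still valid but has not absorbed any caller yet reports bounds of 0 and 0. `ToyRangeState` below is a hypothetical simplification for illustration, modelled loosely on how Attributor range states are assumed to behave; it is not LLVM's `IntegerRangeState`.

```cpp
#include <algorithm>
#include <cstdint>

// Toy optimistic range state (hypothetical, not LLVM's IntegerRangeState).
// "Assumed" starts as the empty half-open range [0, 0) and only grows as
// callers contribute their ranges.
struct ToyRangeState {
  uint32_t Lower = 0; // inclusive
  uint32_t Upper = 0; // exclusive; Lower == Upper means nothing merged yet

  constexpr bool isEmpty() const { return Lower == Upper; }

  // Widen the assumed range so that it also covers [L, U).
  constexpr void unionAssumed(uint32_t L, uint32_t U) {
    if (isEmpty()) {
      Lower = L;
      Upper = U;
      return;
    }
    Lower = std::min(Lower, L);
    Upper = std::max(Upper, U);
  }
};

// Example: a callee reached from callers with "1,256" and "1,128",
// stored here as [1, 257) and [1, 129).
constexpr ToyRangeState mergeCallers() {
  ToyRangeState S{};
  S.unionAssumed(1, 257);
  S.unionAssumed(1, 129);
  return S;
}
```

Before any merge, `ToyRangeState{}` is empty and reads back as 0 and 0, which is the situation the waves-per-EU code ran into; after `mergeCallers()` the range is [1, 257).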

> Isn't it always safe (correctness-wise) to use max that function allows ?

Yes, it is always safe to use the maximum, but then there is no point in having this waves-per-EU AA. Using the default/maximum value here essentially pushes waves per EU to its worst state, and it cannot be improved later on, even if a more optimal flat workgroup size is encountered. That is because flat workgroup size and waves per EU grow in different directions, and the value derived from waves per EU always dominates the values propagated here.

To fix this properly, we should not use the flat workgroup size here at all, which is exactly what #123995 does. The intermediate value of the workgroup size should not be used to define waves per EU.
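A rough sketch of why the maximum flat workgroup size is the pessimistic choice for waves per EU: a workgroup runs on a single compute unit, so its waves are spread across that CU's EUs, and a larger workgroup forces a higher minimum number of waves on each EU, which leaves fewer registers per wave. The constants assume wave64 and 4 EUs per CU, as on gfx942; `minWavesPerEUForWorkGroup` and `divideCeil` here are hypothetical helpers written for illustration, not the LLVM functions.

```cpp
// Assumed target parameters (wave64, 4 EUs/SIMDs per CU, as on gfx942).
constexpr unsigned kWavefrontSize = 64;
constexpr unsigned kEUsPerCU = 4;

// Integer division rounding up.
constexpr unsigned divideCeil(unsigned N, unsigned D) {
  return (N + D - 1) / D;
}

// Minimum waves each EU must host so that one workgroup with the given
// flat size fits on a single compute unit.
constexpr unsigned minWavesPerEUForWorkGroup(unsigned FlatWorkGroupSize) {
  unsigned WavesPerWorkGroup = divideCeil(FlatWorkGroupSize, kWavefrontSize);
  return divideCeil(WavesPerWorkGroup, kEUsPerCU);
}
```

Under these assumptions, 1024 work items give 16 waves and at least 4 waves per EU, while the `1,256` bound from the test gives 4 waves and at least 1 per EU. A size of 0 would produce 0, a meaningless result, which is presumably what the `FlatWorkGroupSize != 0` assertion guards against.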

> That is because the growth of flat workgroup size and waves per eu is in different direction, and the value gets from waves per eu always dominates the values propagated here.

Thanks for the explanation.

> Yes, it is always safe to use max, but then there is no point to have this waves per eu AA. Using the default/max value here basically pushes waves per eu to its worst state, and it can't be optimized later on, even if a more optimal flat work group size is encountered

That's roughly what I gathered from the comment in `AAAMDFlatWorkGroupSize`. However, the issue we're seeing is an internal compiler error, and compiling with bad performance is always better than having a crashing compiler.

> To fix this properly, we should not use flat workgroup size here at all, which is exactly what #123995 is doing. The intermediate value of work group size should not be used to define waves per eu.

@arsenm can #123995 be merged?

npanchen · 2025-05-14T21:18:15Z

> That is not really up to me TBH…I also hope it can be merged soon…It can fix a significant performance regression we got half a year ago…

I see, so it's just the review process and not a disagreement.

npanchen · 2025-05-15T19:08:41Z

@arsenm @shiltian ping either on this or #123995

shiltian · 2025-05-17T05:01:30Z

#123995 has been merged

npanchen · 2025-05-17T12:53:12Z

thanks!

[AMDGPU] Handle unset/max flat workgroup size in waves/EU

c40eaae

When `amdgpu-flat-work-group-size` is either missing or set to the maximum allowed range [1, 1024], the attributor won't change the state, which later results in a `FlatWorkGroupSize != 0` assertion.

npanchen requested review from arsenm and lucas-rami May 14, 2025 20:20

llvmbot added the backend:AMDGPU label May 14, 2025

shiltian requested changes May 14, 2025

shiltian reviewed May 14, 2025

npanchen closed this May 17, 2025