[AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` #123995

shiltian · 2025-01-22T19:56:08Z

Currently, we use AAAMDWavesPerEU to iteratively update values based on attributes from the associated function, potentially propagating user-annotated values, along with AAAMDFlatWorkGroupSize. Similarly, we have AAAMDFlatWorkGroupSize. However, since the value calculated through the flat workgroup size always dominates the user annotation (i.e., the attribute), running AAAMDWavesPerEU iteratively is unnecessary if no user-annotated value exists.

This PR completely rewrites how the amdgpu-waves-per-eu attribute is handled in AMDGPUAttributor. The key changes are as follows:

AAAMDFlatWorkGroupSize remains unchanged.
AAAMDWavesPerEU now only propagates user-annotated values.
A new function is added to check and update amdgpu-waves-per-eu based on the following rules:
- No waves per eu, no flat workgroup size: Assume a flat workgroup size of 1,1024 and compute waves per eu based on this.
- No waves per eu, flat workgroup size exists: Use the provided flat workgroup size to compute waves-per-eu.
- Waves per eu exists, no flat workgroup size: This is a tricky case. In this PR, we assume a flat workgroup size of 1,1024, but this can be adjusted if a different approach is preferred. Alternatively, we could directly use the user-annotated value.
- Both waves per eu and flat workgroup size exist: If there’s a conflict, the value derived from the flat workgroup size takes precedence over waves per eu.

This PR also updates the logic for merging two waves per eu pairs. The current implementation, which uses clampStateAndIndicateChange to compute a union, might not be ideal. If we think from ensure proper resource allocation perspective, for instance, if one pair specifies a minimum of 2 waves per eu, and another specifies a minimum of 4, we should guarantee that 4 waves per eu can be supported, as failing to do so could result in excessive resource allocation per wave. A similar principle applies to the upper bound. Thus, the PR uses the following approach for merging two pairs, lo_a,up_a and lo_b,up_b: max(lo_a, lo_b), max(up_a, up_b). This ensures that resource allocation adheres to the stricter constraints from both inputs.

Fix #123092.

shiltian · 2025-01-22T19:56:27Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

llvmbot · 2025-01-22T19:56:46Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Patch is 164.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123995.diff

31 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+74-51)
(modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll (+10-10)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+39-41)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+14-14)
(modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features.ll (+10-10)
(modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-max-num-workgroups-propagate.ll (+25-27)
(modified) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll (+12-12)
(modified) llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll (+2-3)
(modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+3-4)
(modified) llvm/test/CodeGen/AMDGPU/duplicate-attribute-indirect.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+16-16)
(modified) llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/inline-attr.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/issue120256-annotate-constexpr-addrspacecast.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/pal-simple-indirect-call.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll (+18-19)
(modified) llvm/test/CodeGen/AMDGPU/recursive_global_initializer.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+7-8)
(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call-2.ll (+6-6)
(modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/simplify-libcalls.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll (+1-1)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+5-6)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll (+4-4)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+2-2)
(modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-test.ll (+1-1)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 2bc68cf2fd6a4a..dc5ba1afcfdf7a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1109,74 +1109,38 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
     Function *F = getAssociatedFunction();
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
 
-    auto TakeRange = [&](std::pair<unsigned, unsigned> R) {
-      auto [Min, Max] = R;
-      ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
-      IntegerRangeState RangeState(Range);
-      clampStateAndIndicateChange(this->getState(), RangeState);
-      indicateOptimisticFixpoint();
-    };
-
-    std::pair<unsigned, unsigned> MaxWavesPerEURange{
-        1U, InfoCache.getMaxWavesPerEU(*F)};
-
     // If the attribute exists, we will honor it if it is not the default.
     if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) {
+      std::pair<unsigned, unsigned> MaxWavesPerEURange{
+          1U, InfoCache.getMaxWavesPerEU(*F)};
       if (*Attr != MaxWavesPerEURange) {
-        TakeRange(*Attr);
+        auto [Min, Max] = *Attr;
+        ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
+        IntegerRangeState RangeState(Range);
+        clampStateAndIndicateChange(this->getState(), RangeState);
+        indicateOptimisticFixpoint();
         return;
       }
     }
 
-    // Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the
-    // calculation of waves per EU involves flat work group size, we can't
-    // simply use an assumed flat work group size as a start point, because the
-    // update of flat work group size is in an inverse direction of waves per
-    // EU. However, we can still do something if it is an entry function. Since
-    // an entry function is a terminal node, and flat work group size either
-    // from attribute or default will be used anyway, we can take that value and
-    // calculate the waves per EU based on it. This result can't be updated by
-    // no means, but that could still allow us to propagate it.
-    if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) {
-      std::pair<unsigned, unsigned> FlatWorkGroupSize;
-      if (auto Attr = InfoCache.getFlatWorkGroupSizeAttr(*F))
-        FlatWorkGroupSize = *Attr;
-      else
-        FlatWorkGroupSize = InfoCache.getDefaultFlatWorkGroupSize(*F);
-      TakeRange(InfoCache.getEffectiveWavesPerEU(*F, MaxWavesPerEURange,
-                                                 FlatWorkGroupSize));
-    }
+    if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
+      indicatePessimisticFixpoint();
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
-    auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     ChangeStatus Change = ChangeStatus::UNCHANGED;
 
     auto CheckCallSite = [&](AbstractCallSite CS) {
       Function *Caller = CS.getInstruction()->getFunction();
-      Function *Func = getAssociatedFunction();
-      LLVM_DEBUG(dbgs() << '[' << getName() << "] Call " << Caller->getName()
-                        << "->" << Func->getName() << '\n');
-
       const auto *CallerInfo = A.getAAFor<AAAMDWavesPerEU>(
           *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
-      const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-          *this, IRPosition::function(*Func), DepClassTy::REQUIRED);
-      if (!CallerInfo || !AssumedGroupSize || !CallerInfo->isValidState() ||
-          !AssumedGroupSize->isValidState())
+      if (!CallerInfo || !CallerInfo->isValidState())
         return false;
-
-      unsigned Min, Max;
-      std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
-          *Caller,
-          {CallerInfo->getAssumed().getLower().getZExtValue(),
-           CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
-          {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-           AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
-      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
+      unsigned Min = CallerInfo->getAssumed().getLower().getZExtValue();
+      unsigned Max = CallerInfo->getAssumed().getUpper().getZExtValue();
+      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max));
       IntegerRangeState CallerRangeState(CallerRange);
       Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);
-
       return true;
     };
 
@@ -1329,6 +1293,59 @@ static void addPreloadKernArgHint(Function &F, TargetMachine &TM) {
   }
 }
 
+static void checkWavesPerEU(Module &M, TargetMachine &TM) {
+  for (Function &F : M) {
+    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+
+    auto FlatWgrpSizeAttr =
+        AMDGPU::getIntegerPairAttribute(F, "amdgpu-flat-work-group-size");
+    auto WavesPerEUAttr = AMDGPU::getIntegerPairAttribute(
+        F, "amdgpu-waves-per-eu", /*OnlyFirstRequired=*/true);
+
+    unsigned MinWavesPerEU = ST.getMinWavesPerEU();
+    unsigned MaxWavesPerEU = ST.getMaxWavesPerEU();
+
+    unsigned MinFlatWgrpSize = 1U;
+    unsigned MaxFlatWgrpSize = 1024U;
+    if (FlatWgrpSizeAttr.has_value()) {
+      MinFlatWgrpSize = FlatWgrpSizeAttr->first;
+      MaxFlatWgrpSize = *(FlatWgrpSizeAttr->second);
+    }
+
+    // Start with the max range.
+    unsigned Min = MinWavesPerEU;
+    unsigned Max = MaxWavesPerEU;
+
+    // If the attribute exists, set them to the value from the attribute.
+    if (WavesPerEUAttr.has_value()) {
+      Min = WavesPerEUAttr->first;
+      if (WavesPerEUAttr->second.has_value())
+        Max = *(WavesPerEUAttr->second);
+    }
+
+    // Compute the range from flat workgroup size.
+    auto [MinFromFlatWgrpSize, MaxFromFlatWgrpSize] =
+        ST.getWavesPerEU(F, std::make_pair(MinFlatWgrpSize, MaxFlatWgrpSize));
+
+    // For the lower bound, we have to "tighten" it.
+    Min = std::max(Min, MinFromFlatWgrpSize);
+    // For the upper bound, we have to "extend" it.
+    Max = std::max(Max, MaxFromFlatWgrpSize);
+
+    // Clamp the range to the max range.
+    Min = std::max(Min, MinWavesPerEU);
+    Max = std::min(Max, MaxWavesPerEU);
+
+    // Update the attribute if it is not the max.
+    if (Min != MinWavesPerEU || Max != MaxWavesPerEU) {
+      SmallString<10> Buffer;
+      raw_svector_ostream OS(Buffer);
+      OS << Min << ',' << Max;
+      F.addFnAttr("amdgpu-waves-per-eu", OS.str());
+    }
+  }
+}
+
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
                     AMDGPUAttributorOptions Options,
                     ThinOrFullLTOPhase LTOPhase) {
@@ -1421,8 +1438,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
     }
   }
 
-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;
+
+  if (Changed && (LTOPhase == ThinOrFullLTOPhase::None ||
+                  LTOPhase == ThinOrFullLTOPhase::FullLTOPostLink ||
+                  LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink))
+    checkWavesPerEU(M, TM);
+
+  return Changed;
 }
 
 class AMDGPUAttributorLegacy : public ModulePass {
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
index d316e10037757b..51f0348c050cd9 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
@@ -232,9 +232,9 @@ attributes #1 = { nounwind }
 ; AKF_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
 ; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
 ;.
-; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "amdgpu-waves-per-eu"="4,10" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
 ; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
index 33e7e7a7a019e3..e28caf8f455e21 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
@@ -254,11 +254,11 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 
 attributes #0 = { "amdgpu-no-agpr" }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
 ; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-agpr" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll b/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
index 7e0208cd1f45aa..28722021e0448f 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
@@ -117,14 +117,14 @@ define void @call_no_dispatch_id() {
   ret void
 }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
index ea3f08ede2c5dc..da0f710abe08cf 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
@@ -688,7 +688,7 @@ define void @func_call_asm() #3 {
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_call_asm
 ; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
-; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR26:[0-9]+]]
+; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR24:[0-9]+]]
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   call void asm sideeffect "", ""() #3
@@ -717,7 +717,7 @@ define amdgpu_kernel void @func_kern_defined() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_kern_defined
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @defined.func()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -845,7 +845,7 @@ define amdgpu_kernel void @kern_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -861,7 +861,7 @@ define void @func_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -877,7 +877,7 @@ define void @func_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -893,7 +893,7 @@ define amdgpu_kernel void @kern_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -928,7 +928,7 @@ define internal void @enqueue_block_def() #6 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@enqueue_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR21:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   ret void
@@ -941,7 +941,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_decl() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_decl
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_decl()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -956,7 +956,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_def() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR24:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_def()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -969,7 +969,7 @@ define void @unused_enqueue_block() {
 ; AKF_HSA-NEXT:    ret void
...
[truncated]

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

shiltian · 2025-01-22T20:36:49Z

I'll also make more tests if the code changes look reasonable.

shiltian · 2025-02-21T18:10:29Z

ping

@arsenm @jdoerfert

jplehr · 2025-02-22T08:35:49Z

Out of curiosity: I believe this is case three, why would we not want to use the user-provided values?

arsenm · 2025-02-23T02:07:10Z

Out of curiosity: I believe this is case three, why would we not want to use the user-provided values?

A library use does not have the usage context. It may have specified a bounds to avoid negatively impacting occupancy of some unknown calling kernel. If we know the bounds of the concrete calling kernels, the function should specialize for its bound

github-actions · 2025-03-21T19:00:22Z

✅ With the latest revision this PR passed the undef deprecator.

clang/test/CodeGenOpenCL/builtins-amdgcn.cl

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

arsenm

Tests look like they were reverted to some old revision and lost updates

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

arsenm · 2025-05-01T09:35:23Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

+/// during attributor run because the two attributes grow in opposite direction,
+/// we should not use any intermediate value to calculate waves per eu until we
+/// have a determined flat workgroup size.
+static void updateWavesPerEU(Module &M, TargetMachine &TM) {


It just occurred to me that AMDGPUAttributor is only a Module pass. Should we add or replace it with a CGSCC pass?

~~We should probably add a CGSCC pass.~~

After a second thought, no, since we don't want to run it multiple times.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

arsenm · 2025-05-01T09:38:20Z

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

@@ -1408,8 +1433,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
    }
  }

-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;


Can this be more refined on waves per eu or flat workgroup size changing?

I didn't follow your question. Can you expatiate it?

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll

llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll

shiltian · 2025-05-07T13:00:05Z

ping

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

jdoerfert

One nit, feel free to ignore.

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

llvm-ci · 2025-05-17T06:10:00Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu-no-asserts running on doug-worker-6 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/202/builds/1279

Here is the relevant piece of the build log for the reference

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/dsymutil/X86/op-convert-offset.test' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
warning: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o: timestamp mismatch between object file (2025-03-11 17:27:45.671531848) and debug map (2022-07-12 20:49:30.000000000)
warning: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o: timestamp mismatch between object file (2025-03-11 17:27:45.671531848) and debug map (2022-07-12 20:49:30.000000000)
warning: cann't read address attribute value.
note: while processing op-convert-offset1.c

--
Command Output (stderr):
--
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 # RUN: at line 23
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ # RUN: at line 24
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM # RUN: at line 25
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil --linker parallel -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs   /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset   -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 # RUN: at line 27
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil --linker parallel -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump    /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o 2>&1    | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ # RUN: at line 30
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM # RUN: at line 33
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM
�[1m/home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test:45:7: �[0m�[0;1;31merror: �[0m�[1mDSYM: expected string not found in input
�[0mDSYM: 0x00000084: DW_TAG_base_type
�[0;1;32m      ^
�[0m�[1m<stdin>:1:1: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m/home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM/Contents/Resources/DWARF/op-convert-offset: file format Mach-O 64-bit x86-64
�[0;1;32m^
�[0m�[1m<stdin>:16:1: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m0x0000001e: DW_TAG_base_type
�[0;1;32m^
�[0m
Input file: <stdin>
Check file: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m/home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM/Contents/Resources/DWARF/op-convert-offset: file format Mach-O 64-bit x86-64 �[0m
�[0;1;31mcheck:45'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
�[0m�[0;1;30m            2: �[0m�[1m�[0;1;46m �[0m
�[0;1;31mcheck:45'0     ~
...

llvmbot added the backend:AMDGPU label Jan 22, 2025

shiltian mentioned this pull request Jan 22, 2025

[AMDGPU][Attributor] Add ThinOrFullLTOPhase as an argument #123994

Merged

shiltian requested review from arsenm and jdoerfert January 22, 2025 19:56

shiltian commented Jan 22, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp Outdated Show resolved Hide resolved

shiltian changed the title ~~[AMDGPU][Attributor] Rework calculation of waves per eu~~ [AMDGPU][Attributor] Rework update of waves per eu Jan 22, 2025

shiltian mentioned this pull request Jan 30, 2025

[AMDGPU][Attributor] Skip update and manifest if an AA is at its initial state #114726

Closed

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 0b40b9b to 3c03967 Compare February 4, 2025 17:59

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 34fd831 to 1c1acf3 Compare February 4, 2025 17:59

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 3c03967 to b063f21 Compare February 6, 2025 17:16

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch 2 times, most recently from 33c1672 to f044145 Compare February 6, 2025 17:26

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from b063f21 to 66cd929 Compare February 6, 2025 17:26

shiltian changed the title ~~[AMDGPU][Attributor] Rework update of waves per eu~~ [AMDGPU][Attributor] Rework update of AAAMDWavesPerEU Feb 6, 2025

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from d1dd771 to fafefe4 Compare February 10, 2025 16:20

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 55196cc to 9b9a320 Compare February 10, 2025 16:20

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from fafefe4 to 64d1817 Compare March 21, 2025 18:56

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 9b9a320 to 3118593 Compare March 21, 2025 18:56

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 3118593 to c3685eb Compare March 24, 2025 02:37

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 64d1817 to 2750f73 Compare March 24, 2025 02:37

llvmbot added the clang Clang issues not falling into any other category label Mar 24, 2025

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 2750f73 to 7167ec8 Compare March 24, 2025 02:39

arsenm reviewed Mar 24, 2025

View reviewed changes

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 1a376e2 to 0532ee5 Compare March 26, 2025 16:51

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 0532ee5 to f89193e Compare April 8, 2025 02:39

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch 2 times, most recently from b7efaf3 to aafd825 Compare April 8, 2025 03:17

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from f89193e to eff8de0 Compare April 8, 2025 03:17

arsenm reviewed May 1, 2025

View reviewed changes

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from aafd825 to 35fda65 Compare May 1, 2025 18:14

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch 5 times, most recently from 66d27ed to 1e140dd Compare May 1, 2025 18:26

shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 35fda65 to ed136cc Compare May 1, 2025 22:33

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 1e140dd to 7c79a09 Compare May 1, 2025 22:33

Base automatically changed from users/shiltian/pass-stage-to-amdgpuattributor to main May 2, 2025 15:33

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 7c79a09 to 6b7be9c Compare May 2, 2025 15:38

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 6b7be9c to c235997 Compare May 14, 2025 16:59

shiltian mentioned this pull request May 14, 2025

[AMDGPU] Handle unset/max flat workgroup size in waves/EU #139955

Closed

shiltian requested a review from arsenm May 16, 2025 13:02

jdoerfert reviewed May 16, 2025

View reviewed changes

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch 3 times, most recently from a24f0d0 to 72851ae Compare May 16, 2025 23:24

jdoerfert approved these changes May 17, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp Outdated Show resolved Hide resolved

[AMDGPU][Attributor] Rework calculation of waves per eu

352a09a

shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 72851ae to 352a09a Compare May 17, 2025 02:38

shiltian merged commit 578741b into main May 17, 2025
11 checks passed

shiltian deleted the users/shiltian/waves-per-eu-aa-rework branch May 17, 2025 05:01

[AMDGPU][Attributor] Rework update of AAAMDWavesPerEU #123995

[AMDGPU][Attributor] Rework update of AAAMDWavesPerEU #123995

Uh oh!

Conversation

shiltian commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shiltian commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

shiltian commented Jan 22, 2025

Uh oh!

shiltian commented Feb 21, 2025

Uh oh!

jplehr commented Feb 22, 2025

Uh oh!

arsenm commented Feb 23, 2025

Uh oh!

github-actions bot commented Mar 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm May 1, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian May 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

arsenm May 1, 2025

Choose a reason for hiding this comment

Uh oh!

shiltian May 1, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

shiltian commented May 7, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jdoerfert left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

llvm-ci commented May 17, 2025

Uh oh!

Uh oh!

[AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` #123995

[AMDGPU][Attributor] Rework update of `AAAMDWavesPerEU` #123995

shiltian commented Jan 22, 2025 •

edited

Loading

shiltian commented Jan 22, 2025 •

edited

Loading

llvmbot commented Jan 22, 2025 •

edited

Loading

github-actions bot commented Mar 21, 2025 •

edited

Loading

shiltian May 1, 2025 •

edited

Loading