Skip to content

[AMDGPU][Attributor] Rework update of AAAMDWavesPerEU #123995

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 17, 2025

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Jan 22, 2025

Currently, we use AAAMDWavesPerEU to iteratively update values based on attributes from the associated function, potentially propagating user-annotated values, along with AAAMDFlatWorkGroupSize. Similarly, we have AAAMDFlatWorkGroupSize. However, since the value calculated through the flat workgroup size always dominates the user annotation (i.e., the attribute), running AAAMDWavesPerEU iteratively is unnecessary if no user-annotated value exists.

This PR completely rewrites how the amdgpu-waves-per-eu attribute is handled in AMDGPUAttributor. The key changes are as follows:

  • AAAMDFlatWorkGroupSize remains unchanged.
  • AAAMDWavesPerEU now only propagates user-annotated values.
  • A new function is added to check and update amdgpu-waves-per-eu based on the following rules:
    • No waves per eu, no flat workgroup size: Assume a flat workgroup size of 1,1024 and compute waves per eu based on this.
    • No waves per eu, flat workgroup size exists: Use the provided flat workgroup size to compute waves-per-eu.
    • Waves per eu exists, no flat workgroup size: This is a tricky case. In this PR, we assume a flat workgroup size of 1,1024, but this can be adjusted if a different approach is preferred. Alternatively, we could directly use the user-annotated value.
    • Both waves per eu and flat workgroup size exist: If there’s a conflict, the value derived from the flat workgroup size takes precedence over waves per eu.

This PR also updates the logic for merging two waves per eu pairs. The current implementation, which uses clampStateAndIndicateChange to compute a union, might not be ideal. If we think from ensure proper resource allocation perspective, for instance, if one pair specifies a minimum of 2 waves per eu, and another specifies a minimum of 4, we should guarantee that 4 waves per eu can be supported, as failing to do so could result in excessive resource allocation per wave. A similar principle applies to the upper bound. Thus, the PR uses the following approach for merging two pairs, lo_a,up_a and lo_b,up_b: max(lo_a, lo_b), max(up_a, up_b). This ensures that resource allocation adheres to the stricter constraints from both inputs.

Fix #123092.

Copy link
Contributor Author

shiltian commented Jan 22, 2025

@llvmbot
Copy link
Member

llvmbot commented Jan 22, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

Patch is 164.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/123995.diff

31 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+74-51)
  • (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+39-41)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+14-14)
  • (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features.ll (+10-10)
  • (modified) llvm/test/CodeGen/AMDGPU/attr-amdgpu-max-num-workgroups-propagate.ll (+25-27)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll (+12-12)
  • (modified) llvm/test/CodeGen/AMDGPU/attributor-loop-issue-58639.ll (+2-3)
  • (modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+3-4)
  • (modified) llvm/test/CodeGen/AMDGPU/duplicate-attribute-indirect.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/inline-attr.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/issue120256-annotate-constexpr-addrspacecast.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/pal-simple-indirect-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll (+18-19)
  • (modified) llvm/test/CodeGen/AMDGPU/recursive_global_initializer.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+7-8)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call-2.ll (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/simplify-libcalls.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll (+1-1)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+5-6)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-test.ll (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 2bc68cf2fd6a4a..dc5ba1afcfdf7a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1109,74 +1109,38 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
     Function *F = getAssociatedFunction();
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
 
-    auto TakeRange = [&](std::pair<unsigned, unsigned> R) {
-      auto [Min, Max] = R;
-      ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
-      IntegerRangeState RangeState(Range);
-      clampStateAndIndicateChange(this->getState(), RangeState);
-      indicateOptimisticFixpoint();
-    };
-
-    std::pair<unsigned, unsigned> MaxWavesPerEURange{
-        1U, InfoCache.getMaxWavesPerEU(*F)};
-
     // If the attribute exists, we will honor it if it is not the default.
     if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) {
+      std::pair<unsigned, unsigned> MaxWavesPerEURange{
+          1U, InfoCache.getMaxWavesPerEU(*F)};
       if (*Attr != MaxWavesPerEURange) {
-        TakeRange(*Attr);
+        auto [Min, Max] = *Attr;
+        ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
+        IntegerRangeState RangeState(Range);
+        clampStateAndIndicateChange(this->getState(), RangeState);
+        indicateOptimisticFixpoint();
         return;
       }
     }
 
-    // Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the
-    // calculation of waves per EU involves flat work group size, we can't
-    // simply use an assumed flat work group size as a start point, because the
-    // update of flat work group size is in an inverse direction of waves per
-    // EU. However, we can still do something if it is an entry function. Since
-    // an entry function is a terminal node, and flat work group size either
-    // from attribute or default will be used anyway, we can take that value and
-    // calculate the waves per EU based on it. This result can't be updated by
-    // no means, but that could still allow us to propagate it.
-    if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) {
-      std::pair<unsigned, unsigned> FlatWorkGroupSize;
-      if (auto Attr = InfoCache.getFlatWorkGroupSizeAttr(*F))
-        FlatWorkGroupSize = *Attr;
-      else
-        FlatWorkGroupSize = InfoCache.getDefaultFlatWorkGroupSize(*F);
-      TakeRange(InfoCache.getEffectiveWavesPerEU(*F, MaxWavesPerEURange,
-                                                 FlatWorkGroupSize));
-    }
+    if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
+      indicatePessimisticFixpoint();
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
-    auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     ChangeStatus Change = ChangeStatus::UNCHANGED;
 
     auto CheckCallSite = [&](AbstractCallSite CS) {
       Function *Caller = CS.getInstruction()->getFunction();
-      Function *Func = getAssociatedFunction();
-      LLVM_DEBUG(dbgs() << '[' << getName() << "] Call " << Caller->getName()
-                        << "->" << Func->getName() << '\n');
-
       const auto *CallerInfo = A.getAAFor<AAAMDWavesPerEU>(
           *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
-      const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-          *this, IRPosition::function(*Func), DepClassTy::REQUIRED);
-      if (!CallerInfo || !AssumedGroupSize || !CallerInfo->isValidState() ||
-          !AssumedGroupSize->isValidState())
+      if (!CallerInfo || !CallerInfo->isValidState())
         return false;
-
-      unsigned Min, Max;
-      std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
-          *Caller,
-          {CallerInfo->getAssumed().getLower().getZExtValue(),
-           CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
-          {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-           AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
-      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
+      unsigned Min = CallerInfo->getAssumed().getLower().getZExtValue();
+      unsigned Max = CallerInfo->getAssumed().getUpper().getZExtValue();
+      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max));
       IntegerRangeState CallerRangeState(CallerRange);
       Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);
-
       return true;
     };
 
@@ -1329,6 +1293,59 @@ static void addPreloadKernArgHint(Function &F, TargetMachine &TM) {
   }
 }
 
+static void checkWavesPerEU(Module &M, TargetMachine &TM) {
+  for (Function &F : M) {
+    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+
+    auto FlatWgrpSizeAttr =
+        AMDGPU::getIntegerPairAttribute(F, "amdgpu-flat-work-group-size");
+    auto WavesPerEUAttr = AMDGPU::getIntegerPairAttribute(
+        F, "amdgpu-waves-per-eu", /*OnlyFirstRequired=*/true);
+
+    unsigned MinWavesPerEU = ST.getMinWavesPerEU();
+    unsigned MaxWavesPerEU = ST.getMaxWavesPerEU();
+
+    unsigned MinFlatWgrpSize = 1U;
+    unsigned MaxFlatWgrpSize = 1024U;
+    if (FlatWgrpSizeAttr.has_value()) {
+      MinFlatWgrpSize = FlatWgrpSizeAttr->first;
+      MaxFlatWgrpSize = *(FlatWgrpSizeAttr->second);
+    }
+
+    // Start with the max range.
+    unsigned Min = MinWavesPerEU;
+    unsigned Max = MaxWavesPerEU;
+
+    // If the attribute exists, set them to the value from the attribute.
+    if (WavesPerEUAttr.has_value()) {
+      Min = WavesPerEUAttr->first;
+      if (WavesPerEUAttr->second.has_value())
+        Max = *(WavesPerEUAttr->second);
+    }
+
+    // Compute the range from flat workgroup size.
+    auto [MinFromFlatWgrpSize, MaxFromFlatWgrpSize] =
+        ST.getWavesPerEU(F, std::make_pair(MinFlatWgrpSize, MaxFlatWgrpSize));
+
+    // For the lower bound, we have to "tighten" it.
+    Min = std::max(Min, MinFromFlatWgrpSize);
+    // For the upper bound, we have to "extend" it.
+    Max = std::max(Max, MaxFromFlatWgrpSize);
+
+    // Clamp the range to the max range.
+    Min = std::max(Min, MinWavesPerEU);
+    Max = std::min(Max, MaxWavesPerEU);
+
+    // Update the attribute if it is not the max.
+    if (Min != MinWavesPerEU || Max != MaxWavesPerEU) {
+      SmallString<10> Buffer;
+      raw_svector_ostream OS(Buffer);
+      OS << Min << ',' << Max;
+      F.addFnAttr("amdgpu-waves-per-eu", OS.str());
+    }
+  }
+}
+
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
                     AMDGPUAttributorOptions Options,
                     ThinOrFullLTOPhase LTOPhase) {
@@ -1421,8 +1438,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
     }
   }
 
-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;
+
+  if (Changed && (LTOPhase == ThinOrFullLTOPhase::None ||
+                  LTOPhase == ThinOrFullLTOPhase::FullLTOPostLink ||
+                  LTOPhase == ThinOrFullLTOPhase::ThinLTOPostLink))
+    checkWavesPerEU(M, TM);
+
+  return Changed;
 }
 
 class AMDGPUAttributorLegacy : public ModulePass {
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
index d316e10037757b..51f0348c050cd9 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
@@ -232,9 +232,9 @@ attributes #1 = { nounwind }
 ; AKF_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
 ; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
 ;.
-; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "amdgpu-waves-per-eu"="4,10" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
 ; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
index 33e7e7a7a019e3..e28caf8f455e21 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
@@ -254,11 +254,11 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 
 attributes #0 = { "amdgpu-no-agpr" }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" }
 ; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-agpr" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll b/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
index 7e0208cd1f45aa..28722021e0448f 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll
@@ -117,14 +117,14 @@ define void @call_no_dispatch_id() {
   ret void
 }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
index ea3f08ede2c5dc..da0f710abe08cf 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
@@ -688,7 +688,7 @@ define void @func_call_asm() #3 {
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_call_asm
 ; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
-; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR26:[0-9]+]]
+; ATTRIBUTOR_HSA-NEXT:    call void asm sideeffect "", ""() #[[ATTR24:[0-9]+]]
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   call void asm sideeffect "", ""() #3
@@ -717,7 +717,7 @@ define amdgpu_kernel void @func_kern_defined() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_kern_defined
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR16]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @defined.func()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -845,7 +845,7 @@ define amdgpu_kernel void @kern_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -861,7 +861,7 @@ define void @func_sanitize_address() #4 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR17]] {
 ; ATTRIBUTOR_HSA-NEXT:    store volatile i32 0, ptr addrspace(1) null, align 4
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -877,7 +877,7 @@ define void @func_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@func_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -893,7 +893,7 @@ define amdgpu_kernel void @kern_indirect_sanitize_address() #3 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_indirect_sanitize_address
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR19]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR18]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @func_sanitize_address()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -928,7 +928,7 @@ define internal void @enqueue_block_def() #6 {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@enqueue_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR21:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
   ret void
@@ -941,7 +941,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_decl() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_decl
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR22:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_decl()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -956,7 +956,7 @@ define amdgpu_kernel void @kern_call_enqueued_block_def() {
 ; AKF_HSA-NEXT:    ret void
 ;
 ; ATTRIBUTOR_HSA-LABEL: define {{[^@]+}}@kern_call_enqueued_block_def
-; ATTRIBUTOR_HSA-SAME: () #[[ATTR24:[0-9]+]] {
+; ATTRIBUTOR_HSA-SAME: () #[[ATTR23:[0-9]+]] {
 ; ATTRIBUTOR_HSA-NEXT:    call void @enqueue_block_def()
 ; ATTRIBUTOR_HSA-NEXT:    ret void
 ;
@@ -969,7 +969,7 @@ define void @unused_enqueue_block() {
 ; AKF_HSA-NEXT:    ret void
...
[truncated]

@shiltian
Copy link
Contributor Author

I'll also make more tests if the code changes look reasonable.

@shiltian shiltian changed the title [AMDGPU][Attributor] Rework calculation of waves per eu [AMDGPU][Attributor] Rework update of waves per eu Jan 22, 2025
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 0b40b9b to 3c03967 Compare February 4, 2025 17:59
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 34fd831 to 1c1acf3 Compare February 4, 2025 17:59
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 3c03967 to b063f21 Compare February 6, 2025 17:16
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch 2 times, most recently from 33c1672 to f044145 Compare February 6, 2025 17:26
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from b063f21 to 66cd929 Compare February 6, 2025 17:26
@shiltian shiltian changed the title [AMDGPU][Attributor] Rework update of waves per eu [AMDGPU][Attributor] Rework update of AAAMDWavesPerEU Feb 6, 2025
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from d1dd771 to fafefe4 Compare February 10, 2025 16:20
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 55196cc to 9b9a320 Compare February 10, 2025 16:20
@shiltian
Copy link
Contributor Author

ping

@arsenm @jdoerfert

@jplehr
Copy link
Contributor

jplehr commented Feb 22, 2025

Out of curiosity: I believe this is case three, why would we not want to use the user-provided values?

@arsenm
Copy link
Contributor

arsenm commented Feb 23, 2025

Out of curiosity: I believe this is case three, why would we not want to use the user-provided values?

A library use does not have the usage context. It may have specified a bounds to avoid negatively impacting occupancy of some unknown calling kernel. If we know the bounds of the concrete calling kernels, the function should specialize for its bound

@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from fafefe4 to 64d1817 Compare March 21, 2025 18:56
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 9b9a320 to 3118593 Compare March 21, 2025 18:56
Copy link

github-actions bot commented Mar 21, 2025

✅ With the latest revision this PR passed the undef deprecator.

@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 3118593 to c3685eb Compare March 24, 2025 02:37
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 64d1817 to 2750f73 Compare March 24, 2025 02:37
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Mar 24, 2025
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 2750f73 to 7167ec8 Compare March 24, 2025 02:39
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 1a376e2 to 0532ee5 Compare March 26, 2025 16:51
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 0532ee5 to f89193e Compare April 8, 2025 02:39
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch 2 times, most recently from b7efaf3 to aafd825 Compare April 8, 2025 03:17
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from f89193e to eff8de0 Compare April 8, 2025 03:17
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look like they were reverted to some old revision and lost updates

/// during attributor run because the two attributes grow in opposite direction,
/// we should not use any intermediate value to calculate waves per eu until we
/// have a determined flat workgroup size.
static void updateWavesPerEU(Module &M, TargetMachine &TM) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just occurred to me that AMDGPUAttributor is only a Module pass. Should we add or replace it with a CGSCC pass?

Copy link
Contributor Author

@shiltian shiltian May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a CGSCC pass.

After a second thought, no, since we don't want to run it multiple times.

@@ -1408,8 +1433,14 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
}
}

ChangeStatus Change = A.run();
return Change == ChangeStatus::CHANGED;
bool Changed = A.run() == ChangeStatus::CHANGED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be more refined on waves per eu or flat workgroup size changing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't follow your question. Can you expatiate it?

@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from aafd825 to 35fda65 Compare May 1, 2025 18:14
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch 5 times, most recently from 66d27ed to 1e140dd Compare May 1, 2025 18:26
@shiltian shiltian force-pushed the users/shiltian/pass-stage-to-amdgpuattributor branch from 35fda65 to ed136cc Compare May 1, 2025 22:33
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 1e140dd to 7c79a09 Compare May 1, 2025 22:33
Base automatically changed from users/shiltian/pass-stage-to-amdgpuattributor to main May 2, 2025 15:33
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 7c79a09 to 6b7be9c Compare May 2, 2025 15:38
@shiltian
Copy link
Contributor Author

shiltian commented May 7, 2025

ping

@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 6b7be9c to c235997 Compare May 14, 2025 16:59
@shiltian shiltian requested a review from arsenm May 16, 2025 13:02
@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch 3 times, most recently from a24f0d0 to 72851ae Compare May 16, 2025 23:24
Copy link
Member

@jdoerfert jdoerfert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, feel free to ignore.

@shiltian shiltian force-pushed the users/shiltian/waves-per-eu-aa-rework branch from 72851ae to 352a09a Compare May 17, 2025 02:38
@shiltian shiltian merged commit 578741b into main May 17, 2025
11 checks passed
@shiltian shiltian deleted the users/shiltian/waves-per-eu-aa-rework branch May 17, 2025 05:01
@llvm-ci
Copy link
Collaborator

llvm-ci commented May 17, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-gcc-ubuntu-no-asserts running on doug-worker-6 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/202/builds/1279

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/dsymutil/X86/op-convert-offset.test' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
warning: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o: timestamp mismatch between object file (2025-03-11 17:27:45.671531848) and debug map (2022-07-12 20:49:30.000000000)
warning: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o: timestamp mismatch between object file (2025-03-11 17:27:45.671531848) and debug map (2022-07-12 20:49:30.000000000)
warning: cann't read address attribute value.
note: while processing op-convert-offset1.c

--
Command Output (stderr):
--
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 # RUN: at line 23
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ # RUN: at line 24
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM # RUN: at line 25
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil --linker parallel -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs   /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset   -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 # RUN: at line 27
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/dsymutil --linker parallel -oso-prepend-path /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset -o /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump    /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o 2>&1    | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ # RUN: at line 30
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/../Inputs/private/tmp/op-convert-offset/op-convert-offset.o
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix OBJ
/home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM 2>&1 | /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM # RUN: at line 33
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/llvm-dwarfdump /home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM
+ /home/buildbot/buildbot-root/gcc-no-asserts/build/bin/FileCheck /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test --check-prefix DSYM
�[1m/home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test:45:7: �[0m�[0;1;31merror: �[0m�[1mDSYM: expected string not found in input
�[0mDSYM: 0x00000084: DW_TAG_base_type
�[0;1;32m      ^
�[0m�[1m<stdin>:1:1: �[0m�[0;1;30mnote: �[0m�[1mscanning from here
�[0m/home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM/Contents/Resources/DWARF/op-convert-offset: file format Mach-O 64-bit x86-64
�[0;1;32m^
�[0m�[1m<stdin>:16:1: �[0m�[0;1;30mnote: �[0m�[1mpossible intended match here
�[0m0x0000001e: DW_TAG_base_type
�[0;1;32m^
�[0m
Input file: <stdin>
Check file: /home/buildbot/buildbot-root/gcc-no-asserts/llvm-project/llvm/test/tools/dsymutil/X86/op-convert-offset.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m            1: �[0m�[1m�[0;1;46m/home/buildbot/buildbot-root/gcc-no-asserts/build/test/tools/dsymutil/X86/Output/op-convert-offset.test.tmp.dSYM/Contents/Resources/DWARF/op-convert-offset: file format Mach-O 64-bit x86-64 �[0m
�[0;1;31mcheck:45'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
�[0m�[0;1;30m            2: �[0m�[1m�[0;1;46m �[0m
�[0;1;31mcheck:45'0     ~
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AMDGPU][Attributor] OpenMC Performance Regression on AMD GPU
6 participants