Commit 578741b

[AMDGPU][Attributor] Rework update of AAAMDWavesPerEU (#123995)
Currently, we use `AAAMDWavesPerEU` to iteratively update values based on attributes from the associated function, potentially propagating user-annotated values, in combination with `AAAMDFlatWorkGroupSize`. However, since the value calculated through the flat workgroup size always dominates the user annotation (i.e., the attribute), running `AAAMDWavesPerEU` iteratively is unnecessary if no user-annotated value exists.

This PR completely rewrites how the `amdgpu-waves-per-eu` attribute is handled in `AMDGPUAttributor`. The key changes are as follows:

- `AAAMDFlatWorkGroupSize` remains unchanged.
- `AAAMDWavesPerEU` now only propagates user-annotated values.
- A new function is added to check and update `amdgpu-waves-per-eu` based on the following rules:
  - No waves per eu, no flat workgroup size: Assume a flat workgroup size of `1,1024` and compute waves per eu based on this.
  - No waves per eu, flat workgroup size exists: Use the provided flat workgroup size to compute waves per eu.
  - Waves per eu exists, no flat workgroup size: This is a tricky case. In this PR, we assume a flat workgroup size of `1,1024`, but this can be adjusted if a different approach is preferred. Alternatively, we could directly use the user-annotated value.
  - Both waves per eu and flat workgroup size exist: If there is a conflict, the value derived from the flat workgroup size takes precedence over waves per eu.

This PR also updates the logic for merging two waves per eu pairs. The current implementation, which uses `clampStateAndIndicateChange` to compute a union, might not be ideal. Consider it from the perspective of ensuring proper resource allocation: if one pair specifies a minimum of 2 waves per eu and another specifies a minimum of 4, we should guarantee that 4 waves per eu can be supported, as failing to do so could result in excessive resource allocation per wave. A similar principle applies to the upper bound.

Thus, the PR uses the following approach for merging two pairs, `lo_a,up_a` and `lo_b,up_b`: `max(lo_a, lo_b), max(up_a, up_b)`. This ensures that resource allocation adheres to the stricter constraints from both inputs.

Fix #123092.
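The merge rule described in the commit message can be sketched in isolation. This is a minimal illustration only: `mergeWavesPerEU` is a hypothetical helper name that does not exist in LLVM, and the pairs are plain inclusive `(lower, upper)` bounds rather than the `ConstantRange` state the attributor actually uses.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical helper (not part of LLVM): merges two inclusive
// (lower, upper) waves-per-eu pairs by taking the larger of each bound,
// i.e. max(lo_a, lo_b), max(up_a, up_b).
std::pair<unsigned, unsigned>
mergeWavesPerEU(std::pair<unsigned, unsigned> A,
                std::pair<unsigned, unsigned> B) {
  return {std::max(A.first, B.first), std::max(A.second, B.second)};
}
```

With the example from the message, merging a pair whose minimum is 2 with a pair whose minimum is 4 yields a minimum of 4, so the merged function still has to support 4 waves per eu.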
1 parent e66cecd commit 578741b

33 files changed, +341 -279 lines changed

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp

Lines changed: 94 additions & 48 deletions
@@ -1113,47 +1113,25 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
     Function *F = getAssociatedFunction();
     auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
 
-    auto TakeRange = [&](std::pair<unsigned, unsigned> R) {
-      auto [Min, Max] = R;
-      ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
-      IntegerRangeState RangeState(Range);
-      clampStateAndIndicateChange(this->getState(), RangeState);
-      indicateOptimisticFixpoint();
-    };
-
-    std::pair<unsigned, unsigned> MaxWavesPerEURange{
-        1U, InfoCache.getMaxWavesPerEU(*F)};
-
     // If the attribute exists, we will honor it if it is not the default.
     if (auto Attr = InfoCache.getWavesPerEUAttr(*F)) {
+      std::pair<unsigned, unsigned> MaxWavesPerEURange{
+          1U, InfoCache.getMaxWavesPerEU(*F)};
       if (*Attr != MaxWavesPerEURange) {
-        TakeRange(*Attr);
+        auto [Min, Max] = *Attr;
+        ConstantRange Range(APInt(32, Min), APInt(32, Max + 1));
+        IntegerRangeState RangeState(Range);
+        this->getState() = RangeState;
+        indicateOptimisticFixpoint();
         return;
       }
     }
 
-    // Unlike AAAMDFlatWorkGroupSize, it's getting trickier here. Since the
-    // calculation of waves per EU involves flat work group size, we can't
-    // simply use an assumed flat work group size as a start point, because the
-    // update of flat work group size is in an inverse direction of waves per
-    // EU. However, we can still do something if it is an entry function. Since
-    // an entry function is a terminal node, and flat work group size either
-    // from attribute or default will be used anyway, we can take that value and
-    // calculate the waves per EU based on it. This result can't be updated by
-    // no means, but that could still allow us to propagate it.
-    if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) {
-      std::pair<unsigned, unsigned> FlatWorkGroupSize;
-      if (auto Attr = InfoCache.getFlatWorkGroupSizeAttr(*F))
-        FlatWorkGroupSize = *Attr;
-      else
-        FlatWorkGroupSize = InfoCache.getDefaultFlatWorkGroupSize(*F);
-      TakeRange(InfoCache.getEffectiveWavesPerEU(*F, MaxWavesPerEURange,
-                                                 FlatWorkGroupSize));
-    }
+    if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
+      indicatePessimisticFixpoint();
   }
 
   ChangeStatus updateImpl(Attributor &A) override {
-    auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
     ChangeStatus Change = ChangeStatus::UNCHANGED;
 
     auto CheckCallSite = [&](AbstractCallSite CS) {
@@ -1162,24 +1140,21 @@ struct AAAMDWavesPerEU : public AAAMDSizeRangeAttribute {
       LLVM_DEBUG(dbgs() << '[' << getName() << "] Call " << Caller->getName()
                         << "->" << Func->getName() << '\n');
 
-      const auto *CallerInfo = A.getAAFor<AAAMDWavesPerEU>(
+      const auto *CallerAA = A.getAAFor<AAAMDWavesPerEU>(
           *this, IRPosition::function(*Caller), DepClassTy::REQUIRED);
-      const auto *AssumedGroupSize = A.getAAFor<AAAMDFlatWorkGroupSize>(
-          *this, IRPosition::function(*Func), DepClassTy::REQUIRED);
-      if (!CallerInfo || !AssumedGroupSize || !CallerInfo->isValidState() ||
-          !AssumedGroupSize->isValidState())
+      if (!CallerAA || !CallerAA->isValidState())
         return false;
 
-      unsigned Min, Max;
-      std::tie(Min, Max) = InfoCache.getEffectiveWavesPerEU(
-          *Caller,
-          {CallerInfo->getAssumed().getLower().getZExtValue(),
-           CallerInfo->getAssumed().getUpper().getZExtValue() - 1},
-          {AssumedGroupSize->getAssumed().getLower().getZExtValue(),
-           AssumedGroupSize->getAssumed().getUpper().getZExtValue() - 1});
-      ConstantRange CallerRange(APInt(32, Min), APInt(32, Max + 1));
-      IntegerRangeState CallerRangeState(CallerRange);
-      Change |= clampStateAndIndicateChange(this->getState(), CallerRangeState);
+      ConstantRange Assumed = getAssumed();
+      unsigned Min = std::max(Assumed.getLower().getZExtValue(),
+                              CallerAA->getAssumed().getLower().getZExtValue());
+      unsigned Max = std::max(Assumed.getUpper().getZExtValue(),
+                              CallerAA->getAssumed().getUpper().getZExtValue());
+      ConstantRange Range(APInt(32, Min), APInt(32, Max));
+      IntegerRangeState RangeState(Range);
+      getState() = RangeState;
+      Change |= getState() == Assumed ? ChangeStatus::UNCHANGED
+                                      : ChangeStatus::CHANGED;
 
       return true;
     };
@@ -1323,6 +1298,74 @@ struct AAAMDGPUNoAGPR
 
 const char AAAMDGPUNoAGPR::ID = 0;
 
+/// Performs the final check and updates the 'amdgpu-waves-per-eu' attribute
+/// based on the finalized 'amdgpu-flat-work-group-size' attribute.
+/// Both attributes start with narrow ranges that expand during iteration.
+/// However, a narrower flat-workgroup-size leads to a wider waves-per-eu range,
+/// preventing optimal updates later. Therefore, waves-per-eu can't be updated
+/// with intermediate values during the attributor run. We defer the
+/// finalization of waves-per-eu until after the flat-workgroup-size is
+/// finalized.
+/// TODO: Remove this and move similar logic back into the attributor run once
+/// we have a better representation for waves-per-eu.
+static bool updateWavesPerEU(Module &M, TargetMachine &TM) {
+  bool Changed = false;
+
+  LLVMContext &Ctx = M.getContext();
+
+  for (Function &F : M) {
+    if (F.isDeclaration())
+      continue;
+
+    const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
+
+    std::optional<std::pair<unsigned, std::optional<unsigned>>>
+        FlatWgrpSizeAttr =
+            AMDGPU::getIntegerPairAttribute(F, "amdgpu-flat-work-group-size");
+
+    unsigned MinWavesPerEU = ST.getMinWavesPerEU();
+    unsigned MaxWavesPerEU = ST.getMaxWavesPerEU();
+
+    unsigned MinFlatWgrpSize = ST.getMinFlatWorkGroupSize();
+    unsigned MaxFlatWgrpSize = ST.getMaxFlatWorkGroupSize();
+    if (FlatWgrpSizeAttr.has_value()) {
+      MinFlatWgrpSize = FlatWgrpSizeAttr->first;
+      MaxFlatWgrpSize = *(FlatWgrpSizeAttr->second);
+    }
+
+    // Start with the "best" range.
+    unsigned Min = MinWavesPerEU;
+    unsigned Max = MinWavesPerEU;
+
+    // Compute the range from flat workgroup size. `getWavesPerEU` will also
+    // account for the 'amdgpu-waves-er-eu' attribute.
+    auto [MinFromFlatWgrpSize, MaxFromFlatWgrpSize] =
+        ST.getWavesPerEU(F, {MinFlatWgrpSize, MaxFlatWgrpSize});
+
+    // For the lower bound, we have to "tighten" it.
+    Min = std::max(Min, MinFromFlatWgrpSize);
+    // For the upper bound, we have to "extend" it.
+    Max = std::max(Max, MaxFromFlatWgrpSize);
+
+    // Clamp the range to the max range.
+    Min = std::max(Min, MinWavesPerEU);
+    Max = std::min(Max, MaxWavesPerEU);
+
+    // Update the attribute if it is not the max.
+    if (Min != MinWavesPerEU || Max != MaxWavesPerEU) {
+      SmallString<10> Buffer;
+      raw_svector_ostream OS(Buffer);
+      OS << Min << ',' << Max;
+      Attribute OldAttr = F.getFnAttribute("amdgpu-waves-per-eu");
+      Attribute NewAttr = Attribute::get(Ctx, "amdgpu-waves-per-eu", OS.str());
+      F.addFnAttr(NewAttr);
+      Changed |= OldAttr == NewAttr;
+    }
+  }
+
+  return Changed;
+}
+
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
                     AMDGPUAttributorOptions Options,
                     ThinOrFullLTOPhase LTOPhase) {
@@ -1396,8 +1439,11 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
     }
   }
 
-  ChangeStatus Change = A.run();
-  return Change == ChangeStatus::CHANGED;
+  bool Changed = A.run() == ChangeStatus::CHANGED;
+
+  Changed |= updateWavesPerEU(M, TM);
+
+  return Changed;
 }
 
 class AMDGPUAttributorLegacy : public ModulePass {
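
The range arithmetic that `updateWavesPerEU` performs per function can be modeled without any LLVM types. The sketch below is an assumption-laden illustration, not the pass itself: `finalizeWavesPerEU` and `needsAttribute` are hypothetical names, `SubtargetMin`/`SubtargetMax` stand in for `ST.getMinWavesPerEU()`/`ST.getMaxWavesPerEU()`, and `FromFlatWgrpSize` stands in for the pair returned by `ST.getWavesPerEU(F, {MinFlatWgrpSize, MaxFlatWgrpSize})`.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical standalone model of the bound computation in
// updateWavesPerEU; all parameters are plain inclusive bounds.
std::pair<unsigned, unsigned>
finalizeWavesPerEU(unsigned SubtargetMin, unsigned SubtargetMax,
                   std::pair<unsigned, unsigned> FromFlatWgrpSize) {
  // Start from the narrowest range, as the diff does.
  unsigned Min = SubtargetMin;
  unsigned Max = SubtargetMin;

  // Raise both bounds to the values derived from the flat workgroup size.
  Min = std::max(Min, FromFlatWgrpSize.first);
  Max = std::max(Max, FromFlatWgrpSize.second);

  // Clamp the result to what the subtarget supports.
  Min = std::max(Min, SubtargetMin);
  Max = std::min(Max, SubtargetMax);
  return {Min, Max};
}

// Mirrors the diff's condition for writing the attribute: only a range that
// differs from the subtarget's full range is recorded.
bool needsAttribute(std::pair<unsigned, unsigned> Range, unsigned SubtargetMin,
                    unsigned SubtargetMax) {
  return Range.first != SubtargetMin || Range.second != SubtargetMax;
}
```

For instance, with illustrative subtarget bounds of 1 and 10, a derived pair of `4,10` stays `4,10` and would be written as an attribute, while a derived pair of `1,20` is clamped back to `1,10` and no attribute would be written.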

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Lines changed: 9 additions & 0 deletions
@@ -216,6 +216,15 @@ AMDGPUSubtarget::getWavesPerEU(const Function &F) const {
   return getWavesPerEU(FlatWorkGroupSizes, LDSBytes, F);
 }
 
+std::pair<unsigned, unsigned> AMDGPUSubtarget::getWavesPerEU(
+    const Function &F, std::pair<unsigned, unsigned> FlatWorkGroupSizes) const {
+  // Minimum number of bytes allocated in the LDS.
+  unsigned LDSBytes = AMDGPU::getIntegerPairAttribute(F, "amdgpu-lds-size",
+                                                      {0, UINT32_MAX}, true)
+                          .first;
+  return getWavesPerEU(FlatWorkGroupSizes, LDSBytes, F);
+}
+
 std::pair<unsigned, unsigned>
 AMDGPUSubtarget::getWavesPerEU(std::pair<unsigned, unsigned> FlatWorkGroupSizes,
                                unsigned LDSBytes, const Function &F) const {

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

Lines changed: 7 additions & 0 deletions
@@ -108,6 +108,13 @@ class AMDGPUSubtarget {
   /// size, register usage, and/or lds usage.
   std::pair<unsigned, unsigned> getWavesPerEU(const Function &F) const;
 
+  /// Overload which uses the specified values for the flat work group sizes,
+  /// rather than querying the function itself. \p FlatWorkGroupSizes Should
+  /// correspond to the function's value for getFlatWorkGroupSizes.
+  std::pair<unsigned, unsigned>
+  getWavesPerEU(const Function &F,
+                std::pair<unsigned, unsigned> FlatWorkGroupSizes) const;
+
   /// Overload which uses the specified values for the flat workgroup sizes and
   /// LDS space rather than querying the function itself. \p FlatWorkGroupSizes
   /// should correspond to the function's value for getFlatWorkGroupSizes and \p

llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll

Lines changed: 2 additions & 2 deletions
@@ -169,6 +169,6 @@ attributes #1 = { nounwind }
 
 ;.
 ; HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.

llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll

Lines changed: 14 additions & 13 deletions
@@ -105,7 +105,7 @@ declare void @unknown()
 
 define amdgpu_kernel void @kernel_calls_extern() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_extern(
-; CHECK-SAME: ) #[[ATTR2:[0-9]+]] {
+; CHECK-SAME: ) #[[ATTR3:[0-9]+]] {
 ; CHECK-NEXT: call void @unknown()
 ; CHECK-NEXT: ret void
 ;
@@ -115,8 +115,8 @@ define amdgpu_kernel void @kernel_calls_extern() {
 
 define amdgpu_kernel void @kernel_calls_extern_marked_callsite() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_extern_marked_callsite(
-; CHECK-SAME: ) #[[ATTR2]] {
-; CHECK-NEXT: call void @unknown() #[[ATTR6:[0-9]+]]
+; CHECK-SAME: ) #[[ATTR3]] {
+; CHECK-NEXT: call void @unknown() #[[ATTR7:[0-9]+]]
 ; CHECK-NEXT: ret void
 ;
   call void @unknown() #0
@@ -125,7 +125,7 @@ define amdgpu_kernel void @kernel_calls_extern_marked_callsite() {
 
 define amdgpu_kernel void @kernel_calls_indirect(ptr %indirect) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_indirect(
-; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR2]] {
+; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR3]] {
 ; CHECK-NEXT: call void [[INDIRECT]]()
 ; CHECK-NEXT: ret void
 ;
@@ -135,8 +135,8 @@ define amdgpu_kernel void @kernel_calls_indirect(ptr %indirect) {
 
 define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(ptr %indirect) {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_calls_indirect_marked_callsite(
-; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR2]] {
-; CHECK-NEXT: call void [[INDIRECT]]() #[[ATTR6]]
+; CHECK-SAME: ptr [[INDIRECT:%.*]]) #[[ATTR3]] {
+; CHECK-NEXT: call void [[INDIRECT]]() #[[ATTR7]]
 ; CHECK-NEXT: ret void
 ;
   call void %indirect() #0
@@ -254,11 +254,12 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
 
 attributes #0 = { "amdgpu-agpr-alloc"="0" }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-agpr-alloc"="0" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2:[0-9]+]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3]] = { "amdgpu-waves-per-eu"="4,8" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR4:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR6:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
+; CHECK: attributes #[[ATTR7]] = { "amdgpu-agpr-alloc"="0" }
 ;.

llvm/test/CodeGen/AMDGPU/annotate-existing-abi-attributes.ll

Lines changed: 10 additions & 10 deletions
@@ -117,14 +117,14 @@ define void @call_no_dispatch_id() {
   ret void
 }
 ;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-workitem-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-workitem-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR2]] = { "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR3]] = { "amdgpu-no-workgroup-id-x" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR4]] = { "amdgpu-no-workgroup-id-y" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR5]] = { "amdgpu-no-workgroup-id-z" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-dispatch-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR7]] = { "amdgpu-no-queue-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR8]] = { "amdgpu-no-implicitarg-ptr" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR9]] = { "amdgpu-no-dispatch-id" "amdgpu-waves-per-eu"="4,10" "uniform-work-group-size"="false" }
 ;.
