-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[AMDGPU][Attributor] Make AAAMDFlatWorkGroupSize
honor existing attribute
#114357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
@llvm/pr-subscribers-backend-amdgpu Author: Shilei Tian (shiltian) ChangesIf a function has Full diff: https://github.com/llvm/llvm-project/pull/114357.diff 1 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 6a69b9d2bfc716..53e44bb5e094fd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -168,7 +168,15 @@ class AMDGPUInformationCache : public InformationCache {
return ST.supportsGetDoorbellID();
}
- std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) {
+ std::optional<std::pair<unsigned, unsigned>>
+ getFlatWorkGroupSizeAttr(const Function &F) const {
+ Attribute A = F.getFnAttribute("amdgpu-flat-work-group-size");
+ if (!A.isStringAttribute())
+ return std::nullopt;
+ return getFlatWorkGroupSizes(F);
+ }
+
+ std::pair<unsigned, unsigned> getFlatWorkGroupSizes(const Function &F) const {
const GCNSubtarget &ST = TM.getSubtarget<GCNSubtarget>(F);
return ST.getFlatWorkGroupSizes(F);
}
@@ -707,8 +715,7 @@ struct AAAMDSizeRangeAttribute
/// See AbstractAttribute::trackStatistics()
void trackStatistics() const override {}
- template <class AttributeImpl>
- ChangeStatus updateImplImpl(Attributor &A) {
+ template <class AttributeImpl> ChangeStatus updateImplImpl(Attributor &A) {
ChangeStatus Change = ChangeStatus::UNCHANGED;
auto CheckCallSite = [&](AbstractCallSite CS) {
@@ -728,12 +735,43 @@ struct AAAMDSizeRangeAttribute
};
bool AllCallSitesKnown = true;
- if (!A.checkForAllCallSites(CheckCallSite, *this, true, AllCallSitesKnown))
+ if (!A.checkForAllCallSites(CheckCallSite, *this,
+ /*RequireAllCallSites=*/true,
+ AllCallSitesKnown))
return indicatePessimisticFixpoint();
return Change;
}
+ /// Clamp the assumed range to the default value ([Min, Max]) and emit the
+ /// attribute if it is not same as default.
+ ChangeStatus
+ emitAttributeIfNotDefaultAfterClamp(Attributor &A,
+ std::pair<unsigned, unsigned> Default) {
+ auto [Min, Max] = Default;
+ unsigned Lower = getAssumed().getLower().getZExtValue();
+ unsigned Upper = getAssumed().getUpper().getZExtValue();
+
+ // Clamp the range to the default value.
+ if (Lower < Min)
+ Lower = Min;
+ if (Upper > Max + 1)
+ Upper = Max + 1;
+
+ // No manifest if the value is same as default after clamp.
+ if (Lower == Min && Upper == Max + 1)
+ return ChangeStatus::UNCHANGED;
+
+ Function *F = getAssociatedFunction();
+ LLVMContext &Ctx = F->getContext();
+ SmallString<10> Buffer;
+ raw_svector_ostream OS(Buffer);
+ OS << Lower << ',' << Upper - 1;
+ return A.manifestAttrs(getIRPosition(),
+ {Attribute::get(Ctx, AttrName, OS.str())},
+ /* ForceReplace=*/true);
+ }
+
ChangeStatus emitAttributeIfNotDefault(Attributor &A, unsigned Min,
unsigned Max) {
// Don't add the attribute if it's the implied default.
@@ -767,11 +805,17 @@ struct AAAMDFlatWorkGroupSize : public AAAMDSizeRangeAttribute {
void initialize(Attributor &A) override {
Function *F = getAssociatedFunction();
+
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
- unsigned MinGroupSize, MaxGroupSize;
- std::tie(MinGroupSize, MaxGroupSize) = InfoCache.getFlatWorkGroupSizes(*F);
- intersectKnown(
- ConstantRange(APInt(32, MinGroupSize), APInt(32, MaxGroupSize + 1)));
+ std::optional<std::pair<unsigned, unsigned>> Attr =
+ InfoCache.getFlatWorkGroupSizeAttr(*F);
+
+ if (Attr.has_value()) {
+ auto [Min, Max] = *Attr;
+ intersectKnown(ConstantRange(APInt(32, Min), APInt(32, Max + 1)));
+ indicatePessimisticFixpoint();
+ return;
+ }
if (AMDGPU::isEntryFunctionCC(F->getCallingConv()))
indicatePessimisticFixpoint();
@@ -788,9 +832,8 @@ struct AAAMDFlatWorkGroupSize : public AAAMDSizeRangeAttribute {
ChangeStatus manifest(Attributor &A) override {
Function *F = getAssociatedFunction();
auto &InfoCache = static_cast<AMDGPUInformationCache &>(A.getInfoCache());
- unsigned Min, Max;
- std::tie(Min, Max) = InfoCache.getMaximumFlatWorkGroupRange(*F);
- return emitAttributeIfNotDefault(A, Min, Max);
+ auto [Min, Max] = InfoCache.getMaximumFlatWorkGroupRange(*F);
+ return emitAttributeIfNotDefaultAfterClamp(A, {Min, Max});
}
/// See AbstractAttribute::getName()
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test changes?
19e0cce
to
f2314c9
Compare
AAAMDFlatWorkGroupSize
honor existing attributeAAAMDFlatWorkGroupSize
honor existing attribute
642636d
to
f1aea50
Compare
c35f519
to
2803f73
Compare
; NOINFS: attributes #0 = { nounwind "amdgpu-waves-per-eu"="4,10" "no-infs-fp-math"="true" "uniform-work-group-size"="false" } | ||
; NOINFS: attributes #1 = { nounwind "less-precise-fpmad"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="false" "uniform-work-group-size"="false" "unsafe-fp-math"="false" } | ||
; NOINFS: attributes #0 = { nounwind "no-infs-fp-math"="true" "uniform-work-group-size"="false" "unsafe-fp-math"="true" } | ||
; NOINFS: attributes #1 = { nounwind "less-precise-fpmad"="false" "no-infs-fp-math"="true" "no-nans-fp-math"="false" "uniform-work-group-size"="false" "unsafe-fp-math"="true" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this change "unsafe-fp-math"="true"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it should not and I don't see why this is changed...
fe201bc
to
b6b851f
Compare
bump |
LGTM |
@choikwa you might forget to accept it. |
43322a4
to
2d1641d
Compare
ping, I prefer not to update tests again and again... |
2d1641d
to
d949823
Compare
Merge activity
|
181386d
to
768a316
Compare
…ribute If a function has `amdgpu-flat-work-group-size`, honor it in `initialize` by taking its value directly; otherwise, it uses the default range as a starting point. We will no longer manipulate the known range, which can cause issues because the known range is a "throttle" to the assumed range such that the assumed range can't get widened properly in `updateImpl` if the known range is not set properly for whatever reasons. Another benefit of not touching the known range is, if we indicate pessimistic state, it also invalidates the AA such that `manifest` will not be called. Since we honor the attribute, we don't want and will not add any half-baked attribute added to a function.
768a316
to
a062b29
Compare
auto R = AMDGPU::getIntegerPairAttribute(F, "amdgpu-flat-work-group-size"); | ||
if (!R) | ||
return std::nullopt; | ||
return std::make_pair(R->first, *(R->second)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra parens?
/// \returns \p std::nullopt and emits error if one of the requested values | ||
/// cannot be converted to integer, or \p OnlyFirstRequired is false and | ||
/// "second" value is not present. | ||
std::optional<std::pair<unsigned, std::optional<unsigned>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why the second entry is optional. This is spreading a parsing detail to users of the value. Users should be able to just treat it as-if it always has 2 components, the one entry forms are just a compatibility thing with old IR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our front end can still emit "X,"
. I think we could potentially fix this in the front end and then remove the 2nd optional entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should never happen. But as a parsing function, that should be abstracted from the attributor's usage
Looks like this PR resulted in about a 2x slowdown on MI250 for the most expensive kernel in OpenMC, resulting in an overall 20% performance degradation. Any ideas what might have caused it? |
You might have device-libs that still come with half-baked values. To verify it, you could use |
Ah, interesting! Which file(s) should I be converting to |
Just rebuild the device libraries. They should be version locked to the llvm commit you are using |
Here's what my current cmake line looks like for LLVM:
What would I need to change to specifically rebuild the device libraries, or am I already doing that? |
@jtramm You are building vanilla LLVM but the device-libs are in AMD's fork. You can simply use the LLVM you build to build device-libs. Here is the command I usually use to build device-libs:
|
I don't understand -- are you saying I cannot use vanilla LLVM to compile for AMD anymore, that I need AMD's fork now instead? |
No, vanilla LLVM doesn't have device-libs. That is only available in AMD's fork. For this particular case, you will have to build device-libs because the device-libs in your system have half-baked values that can cause performance regression. That's why I suggested to simply just remove the attributes from those You can still build vanilla LLVM, and use that to build device-libs. The only thing you want out of AMD's fork is the device-libs, which sits in To make it simple, can you first verify if |
The rocm install on this cluster seems to be in a different location, but the
|
Ah, it looks like the |
Can you disassemble the |
This should replaces llvm#114357. The problem this resolves is that AAAMDWavesPerEU used "assumed" information from AAAMDFlatWorkGroupSize to derive "known" information. This is generally not valid. The cannonical way is to use "assumed"/"known" information to improve "assumed"/"known" information, respectively. What happend before was that the new "known" information (derived from "assumed") was manifested even though it should not have been. That is how we ended up with external functions that had "amdgpu-waves-per-eu" set. This won't happen if we only set the assumed information as the invalidation, caused by unknown callers, will fallback to the "unset" known information. In addition to the above, we will identify invalid waves-per-eu values, emit a warning, and replace the invalid values with valid ones. This happens also for kernels. See `llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll` This should hopefully fix llvm#123092.
If a function has
amdgpu-flat-work-group-size
, honor it ininitialize
bytaking its value directly; otherwise, it uses the default range as a starting
point. We will no longer manipulate the known range, which can cause issues
because the known range is a "throttle" to the assumed range such that the
assumed range can't get widened properly in
updateImpl
if the known range isnot set properly for whatever reasons. Another benefit of not touching the known
range is, if we indicate pessimistic state, it also invalidates the AA such that
manifest
will not be called. Since we honor the attribute, we don't want andwill not add any half-baked attribute added to a function.