-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[AMDGPU][SplitModule] Cleanup CallsExternal Handling #106528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-amdgpu Author: Pierre van Houtryve (Pierre-vh) ChangesThe CallGraph quickly gives up whenever a call isn't obvious. This caused us to misidentify inline assembly and calls to aliases as indirect calls. Add a fallback that iterates over the function to fix whatever has to be fixed if the CG isn't certain. Full diff: https://github.com/llvm/llvm-project/pull/106528.diff 4 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
index ca371a883b897e..91f99b85b24efc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
@@ -103,6 +103,15 @@ static cl::opt<bool> NoExternalizeGlobals(
cl::desc("disables externalization of global variable with local linkage; "
"may cause globals to be duplicated which increases binary size"));
+static cl::opt<bool> NoExternalizeOnAddrTaken(
+ "amdgpu-module-splitting-no-externalize-address-taken", cl::Hidden,
+ cl::desc(
+ "disables externalization of functions whose addresses are taken"));
+
+static cl::opt<bool> InlineAsmIsIndirectCall(
+ "amdgpu-module-splitting-inline-asm-is-indirect-call", cl::Hidden,
+ cl::desc("consider inline assembly as na indirect call"));
+
static cl::opt<std::string>
ModuleDotCfgOutput("amdgpu-module-splitting-print-module-dotcfg",
cl::Hidden,
@@ -501,6 +510,9 @@ void SplitGraph::buildGraph(CallGraph &CG) {
dbgs()
<< "[build graph] constructing graph representation of the input\n");
+ // FIXME(?): Is the callgraph really worth using if we have to iterate the
+ // function again whenever it fails to give us enough information?
+
// We build the graph by just iterating all functions in the module and
// working on their direct callees. At the end, all nodes should be linked
// together as expected.
@@ -511,29 +523,68 @@ void SplitGraph::buildGraph(CallGraph &CG) {
continue;
// Look at direct callees and create the necessary edges in the graph.
- bool HasIndirectCall = false;
- Node &N = getNode(Cache, Fn);
+ SetVector<const Function *> DirectCallees;
+ bool CallsExternal = false;
for (auto &CGEntry : *CG[&Fn]) {
auto *CGNode = CGEntry.second;
- auto *Callee = CGNode->getFunction();
- if (!Callee) {
- // TODO: Don't consider inline assembly as indirect calls.
- if (CGNode == CG.getCallsExternalNode())
- HasIndirectCall = true;
- continue;
- }
-
- if (!Callee->isDeclaration())
- createEdge(N, getNode(Cache, *Callee), EdgeKind::DirectCall);
+ if (auto *Callee = CGNode->getFunction()) {
+ if (!Callee->isDeclaration())
+ DirectCallees.insert(Callee);
+ } else if (CGNode == CG.getCallsExternalNode())
+ CallsExternal = true;
}
// Keep track of this function if it contains an indirect call and/or if it
// can be indirectly called.
- if (HasIndirectCall) {
- LLVM_DEBUG(dbgs() << "indirect call found in " << Fn.getName() << "\n");
- FnsWithIndirectCalls.push_back(&Fn);
+ if (CallsExternal) {
+ LLVM_DEBUG(dbgs() << " [!] callgraph is incomplete for " << Fn.getName()
+ << " - analyzing function\n");
+
+ bool HasIndirectCall = false;
+ for (const auto &BB : Fn) {
+ for (const auto &Inst : BB) {
+ // look at all calls without a direct callee.
+ if (const auto *CB = dyn_cast<CallBase>(&Inst);
+ CB && !CB->getCalledFunction()) {
+ // inline assembly can be ignored, unless InlineAsmIsIndirectCall is
+ // true.
+ if (CB->isInlineAsm()) {
+ if (InlineAsmIsIndirectCall)
+ HasIndirectCall = true;
+ LLVM_DEBUG(dbgs() << " found inline assembly\n");
+ continue;
+ }
+
+ // see through calls of aliases
+ const Value *CalledV = CB->getCalledOperand();
+ if (isa<GlobalAlias>(CalledV)) {
+ if (const auto *RealFn = dyn_cast<Function>(
+ CalledV->stripPointerCastsAndAliases());
+ RealFn && !RealFn->isDeclaration()) {
+ LLVM_DEBUG(dbgs()
+ << " resolved call to " << RealFn->getName()
+ << " in: " << Inst << '\n');
+ DirectCallees.insert(RealFn);
+ continue;
+ }
+ }
+
+ // everything else is handled conservatively.
+ HasIndirectCall = true;
+ }
+ }
+ }
+
+ if (HasIndirectCall) {
+ LLVM_DEBUG(dbgs() << " indirect call found\n");
+ FnsWithIndirectCalls.push_back(&Fn);
+ }
}
+ Node &N = getNode(Cache, Fn);
+ for (const auto *Callee : DirectCallees)
+ createEdge(N, getNode(Cache, *Callee), EdgeKind::DirectCall);
+
if (canBeIndirectlyCalled(Fn))
IndirectlyCallableFns.push_back(&Fn);
}
@@ -1337,13 +1388,23 @@ static void splitAMDGPUModule(
//
// Additionally, it guides partitioning to not duplicate this function if it's
// called directly at some point.
- for (auto &Fn : M) {
- if (Fn.hasAddressTaken()) {
- if (Fn.hasLocalLinkage()) {
- LLVM_DEBUG(dbgs() << "[externalize] " << Fn.getName()
- << " because its address is taken\n");
+ //
+ // TODO: Could we be smarter about this ? This makes all functions whose
+ // addresses are taken non-copyable. We should probably model this type of
+ // constraint in the graph and use it to guide splitting, instead of
+ // externalizing like this. Maybe non-copyable should really mean "keep one
+ // visible copy, then internalize all other copies" for some functions?
+ if (!NoExternalizeOnAddrTaken) {
+ for (auto &Fn : M) {
+ // TODO: Should aliases count? Probably not but they're so rare I'm not
+ // sure it's worth fixing.
+ if (Fn.hasAddressTaken()) {
+ if (Fn.hasLocalLinkage()) {
+ LLVM_DEBUG(dbgs() << "[externalize] " << Fn.getName()
+ << " because its address is taken\n");
+ }
+ externalize(Fn);
}
- externalize(Fn);
}
}
@@ -1379,7 +1440,8 @@ static void splitAMDGPUModule(
dbgs() << "[graph] nodes:\n";
for (const SplitGraph::Node *N : SG.nodes()) {
dbgs() << " - [" << N->getID() << "]: " << N->getName() << " "
- << (N->isGraphEntryPoint() ? "(entry)" : "") << "\n";
+ << (N->isGraphEntryPoint() ? "(entry)" : "") << " "
+ << (N->isNonCopyable() ? "(noncopyable)" : "") << "\n";
}
});
diff --git a/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll b/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll
new file mode 100644
index 00000000000000..5c7ddaeecd7044
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll
@@ -0,0 +1,41 @@
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+
+; RUN: llvm-split -o %t_as_indirect %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken -amdgpu-module-splitting-inline-asm-is-indirect-call
+; RUN: llvm-dis -o - %t_as_indirect0 | FileCheck --check-prefix=CHECK-INDIRECT0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t_as_indirect1 | FileCheck --check-prefix=CHECK-INDIRECT1 --implicit-check-not=define %s
+
+; CHECK0: define internal void @HelperB
+; CHECK0: define amdgpu_kernel void @B
+
+; CHECK1: define internal void @HelperA()
+; CHECK1: define amdgpu_kernel void @A()
+
+; CHECK-INDIRECT0: define internal void @HelperB
+; CHECK-INDIRECT0: define amdgpu_kernel void @B
+
+; CHECK-INDIRECT1: define internal void @HelperA()
+; CHECK-INDIRECT1: define internal void @HelperB()
+; CHECK-INDIRECT1: define amdgpu_kernel void @A()
+
+@addrthief = global [2 x ptr] [ptr @HelperA, ptr @HelperB]
+
+define internal void @HelperA() {
+ ret void
+}
+
+define internal void @HelperB() {
+ ret void
+}
+
+define amdgpu_kernel void @A() {
+ call void asm sideeffect "v_mov_b32 v0, 7", "~{v0}"()
+ call void @HelperA()
+ ret void
+}
+
+define amdgpu_kernel void @B(ptr %out) {
+ call void @HelperB()
+ ret void
+}
diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll
new file mode 100644
index 00000000000000..0bfa9150d9e90e
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll
@@ -0,0 +1,22 @@
+; REQUIRES: asserts
+
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -debug -amdgpu-module-splitting-no-externalize-address-taken 2>&1 | FileCheck %s
+
+; CHECK: [!] callgraph is incomplete for A - analyzing function
+; CHECK-NEXT: resolved call to PerryThePlatypus in: call void @Perry()
+
+@Perry = internal alias ptr(), ptr @PerryThePlatypus
+
+define internal void @PerryThePlatypus() {
+ ret void
+}
+
+define amdgpu_kernel void @A() {
+ call void @Perry()
+ ret void
+}
+
+define amdgpu_kernel void @B() {
+ call void @PerryThePlatypus()
+ ret void
+}
diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
index d7e84abd5f968d..ed160cceaf8f28 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
@@ -1,4 +1,4 @@
-; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken
; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
@@ -7,18 +7,14 @@
; - B calls @PerryThePlatypus
; - C calls @Perry, an alias of @PerryThePlatypus
;
-; We should see through the alias and put B/C in the same
-; partition.
-;
-; Additionally, @PerryThePlatypus gets externalized as
-; the alias counts as taking its address.
+; We should see through the alias and treat C as-if it called PerryThePlatypus directly.
-; CHECK0: define amdgpu_kernel void @A
+; CHECK0: define internal void @PerryThePlatypus()
+; CHECK0: define amdgpu_kernel void @C
-; CHECK1: @Perry = internal alias ptr (), ptr @PerryThePlatypus
-; CHECK1: define hidden void @PerryThePlatypus()
+; CHECK1: define internal void @PerryThePlatypus()
+; CHECK1: define amdgpu_kernel void @A
; CHECK1: define amdgpu_kernel void @B
-; CHECK1: define amdgpu_kernel void @C
@Perry = internal alias ptr(), ptr @PerryThePlatypus
|
Would it make sense to replace the alias uses by the aliased global instead ? Such that all references (or at least the calls) inside a module point directly to the actual global and not to the alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other optimizations already redirect calls to aliases to direct calls, so I don't think this pass should have to put in the effort to also do the same. I think GlobalOpt?
static cl::opt<bool> NoExternalizeOnAddrTaken( | ||
"amdgpu-module-splitting-no-externalize-address-taken", cl::Hidden, | ||
cl::desc( | ||
"disables externalization of functions whose addresses are taken")); | ||
|
||
static cl::opt<bool> InlineAsmIsIndirectCall( | ||
"amdgpu-module-splitting-inline-asm-is-indirect-call", cl::Hidden, | ||
cl::desc("consider inline assembly as na indirect call")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nobody will ever use these flags, I would prefer to not add them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're useful for debugging, some of the tests I added use them.
I don't mind making them debug only if you prefer
Now that I remember, this is indeed an artificial case. EDIT: I checked and globalopt indeed eliminates aliases whenever possible |
I'd rather just let global opt deal with it. It probably doesn't run at -O0. Though I guess in this instance it isn't really about optimization |
e3eebbd
to
d347a80
Compare
612192e
to
3668534
Compare
ping |
} | ||
|
||
// everything else is handled conservatively. | ||
HasIndirectCall = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can break at this point? Also can just turn this into a helper function to scan for indirect calls and use early return
4ab6a34
to
b5e322f
Compare
- Don't treat inline ASM as indirect calls - Remove alias testing, which was broken (only working by pure luck right now) and isn't needed anyway. GlobalOpt should take care of them for us.
b5e322f
to
c70db32
Compare
See #106528 to review the first commit. Handle the `!callees` metadata to further reduce the amount of indirect call cases that end up conservatively assuming that any indirectly callable function is a potential target.
- Don't treat inline ASM as indirect calls - Remove call to alias testing, which was broken (only working by pure luck right now) and isn't needed anyway. GlobalOpt should take care of them for us.
See llvm#106528 to review the first commit. Handle the `!callees` metadata to further reduce the amount of indirect call cases that end up conservatively assuming that any indirectly callable function is a potential target.
- Don't treat inline ASM as indirect calls - Remove call to alias testing, which was broken (only working by pure luck right now) and isn't needed anyway. GlobalOpt should take care of them for us. Change-Id: I7de442dd07b00198e9d01722a7e27601e399b540
Uh oh!
There was an error while loading. Please reload this page.