Skip to content

[AMDGPU][SplitModule] Cleanup CallsExternal Handling #106528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 11, 2024

Conversation

Pierre-vh
Copy link
Contributor

@Pierre-vh Pierre-vh commented Aug 29, 2024

  • Don't treat inline ASM as indirect calls
  • Remove call to alias testing, which was broken (only working by pure luck right now) and isn't needed anyway. GlobalOpt should take care of them for us.

@llvmbot
Copy link
Member

llvmbot commented Aug 29, 2024

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

The CallGraph quickly gives up whenever a call isn't obvious. This caused us to misidentify inline assembly and calls to aliases as indirect calls.

Add a fallback that iterates over the function to fix whatever has to be fixed if the CG isn't certain.


Full diff: https://github.com/llvm/llvm-project/pull/106528.diff

4 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp (+84-22)
  • (added) llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll (+41)
  • (added) llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll (+22)
  • (modified) llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll (+6-10)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
index ca371a883b897e..91f99b85b24efc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
@@ -103,6 +103,15 @@ static cl::opt<bool> NoExternalizeGlobals(
     cl::desc("disables externalization of global variable with local linkage; "
              "may cause globals to be duplicated which increases binary size"));
 
+static cl::opt<bool> NoExternalizeOnAddrTaken(
+    "amdgpu-module-splitting-no-externalize-address-taken", cl::Hidden,
+    cl::desc(
+        "disables externalization of functions whose addresses are taken"));
+
+static cl::opt<bool> InlineAsmIsIndirectCall(
+    "amdgpu-module-splitting-inline-asm-is-indirect-call", cl::Hidden,
+    cl::desc("consider inline assembly as na indirect call"));
+
 static cl::opt<std::string>
     ModuleDotCfgOutput("amdgpu-module-splitting-print-module-dotcfg",
                        cl::Hidden,
@@ -501,6 +510,9 @@ void SplitGraph::buildGraph(CallGraph &CG) {
       dbgs()
       << "[build graph] constructing graph representation of the input\n");
 
+  // FIXME(?): Is the callgraph really worth using if we have to iterate the
+  // function again whenever it fails to give us enough information?
+
   // We build the graph by just iterating all functions in the module and
   // working on their direct callees. At the end, all nodes should be linked
   // together as expected.
@@ -511,29 +523,68 @@ void SplitGraph::buildGraph(CallGraph &CG) {
       continue;
 
     // Look at direct callees and create the necessary edges in the graph.
-    bool HasIndirectCall = false;
-    Node &N = getNode(Cache, Fn);
+    SetVector<const Function *> DirectCallees;
+    bool CallsExternal = false;
     for (auto &CGEntry : *CG[&Fn]) {
       auto *CGNode = CGEntry.second;
-      auto *Callee = CGNode->getFunction();
-      if (!Callee) {
-        // TODO: Don't consider inline assembly as indirect calls.
-        if (CGNode == CG.getCallsExternalNode())
-          HasIndirectCall = true;
-        continue;
-      }
-
-      if (!Callee->isDeclaration())
-        createEdge(N, getNode(Cache, *Callee), EdgeKind::DirectCall);
+      if (auto *Callee = CGNode->getFunction()) {
+        if (!Callee->isDeclaration())
+          DirectCallees.insert(Callee);
+      } else if (CGNode == CG.getCallsExternalNode())
+        CallsExternal = true;
     }
 
     // Keep track of this function if it contains an indirect call and/or if it
     // can be indirectly called.
-    if (HasIndirectCall) {
-      LLVM_DEBUG(dbgs() << "indirect call found in " << Fn.getName() << "\n");
-      FnsWithIndirectCalls.push_back(&Fn);
+    if (CallsExternal) {
+      LLVM_DEBUG(dbgs() << "  [!] callgraph is incomplete for " << Fn.getName()
+                        << " - analyzing function\n");
+
+      bool HasIndirectCall = false;
+      for (const auto &BB : Fn) {
+        for (const auto &Inst : BB) {
+          // look at all calls without a direct callee.
+          if (const auto *CB = dyn_cast<CallBase>(&Inst);
+              CB && !CB->getCalledFunction()) {
+            // inline assembly can be ignored, unless InlineAsmIsIndirectCall is
+            // true.
+            if (CB->isInlineAsm()) {
+              if (InlineAsmIsIndirectCall)
+                HasIndirectCall = true;
+              LLVM_DEBUG(dbgs() << "    found inline assembly\n");
+              continue;
+            }
+
+            // see through calls of aliases
+            const Value *CalledV = CB->getCalledOperand();
+            if (isa<GlobalAlias>(CalledV)) {
+              if (const auto *RealFn = dyn_cast<Function>(
+                      CalledV->stripPointerCastsAndAliases());
+                  RealFn && !RealFn->isDeclaration()) {
+                LLVM_DEBUG(dbgs()
+                           << "    resolved call to " << RealFn->getName()
+                           << " in: " << Inst << '\n');
+                DirectCallees.insert(RealFn);
+                continue;
+              }
+            }
+
+            // everything else is handled conservatively.
+            HasIndirectCall = true;
+          }
+        }
+      }
+
+      if (HasIndirectCall) {
+        LLVM_DEBUG(dbgs() << "    indirect call found\n");
+        FnsWithIndirectCalls.push_back(&Fn);
+      }
     }
 
+    Node &N = getNode(Cache, Fn);
+    for (const auto *Callee : DirectCallees)
+      createEdge(N, getNode(Cache, *Callee), EdgeKind::DirectCall);
+
     if (canBeIndirectlyCalled(Fn))
       IndirectlyCallableFns.push_back(&Fn);
   }
@@ -1337,13 +1388,23 @@ static void splitAMDGPUModule(
   //
   // Additionally, it guides partitioning to not duplicate this function if it's
   // called directly at some point.
-  for (auto &Fn : M) {
-    if (Fn.hasAddressTaken()) {
-      if (Fn.hasLocalLinkage()) {
-        LLVM_DEBUG(dbgs() << "[externalize] " << Fn.getName()
-                          << " because its address is taken\n");
+  //
+  // TODO: Could we be smarter about this ? This makes all functions whose
+  // addresses are taken non-copyable. We should probably model this type of
+  // constraint in the graph and use it to guide splitting, instead of
+  // externalizing like this. Maybe non-copyable should really mean "keep one
+  // visible copy, then internalize all other copies" for some functions?
+  if (!NoExternalizeOnAddrTaken) {
+    for (auto &Fn : M) {
+      // TODO: Should aliases count? Probably not but they're so rare I'm not
+      // sure it's worth fixing.
+      if (Fn.hasAddressTaken()) {
+        if (Fn.hasLocalLinkage()) {
+          LLVM_DEBUG(dbgs() << "[externalize] " << Fn.getName()
+                            << " because its address is taken\n");
+        }
+        externalize(Fn);
       }
-      externalize(Fn);
     }
   }
 
@@ -1379,7 +1440,8 @@ static void splitAMDGPUModule(
     dbgs() << "[graph] nodes:\n";
     for (const SplitGraph::Node *N : SG.nodes()) {
       dbgs() << "  - [" << N->getID() << "]: " << N->getName() << " "
-             << (N->isGraphEntryPoint() ? "(entry)" : "") << "\n";
+             << (N->isGraphEntryPoint() ? "(entry)" : "") << " "
+             << (N->isNonCopyable() ? "(noncopyable)" : "") << "\n";
     }
   });
 
diff --git a/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll b/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll
new file mode 100644
index 00000000000000..5c7ddaeecd7044
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/indirect-call-inline-asm.ll
@@ -0,0 +1,41 @@
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+
+; RUN: llvm-split -o %t_as_indirect %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken -amdgpu-module-splitting-inline-asm-is-indirect-call
+; RUN: llvm-dis -o - %t_as_indirect0 | FileCheck --check-prefix=CHECK-INDIRECT0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t_as_indirect1 | FileCheck --check-prefix=CHECK-INDIRECT1 --implicit-check-not=define %s
+
+; CHECK0: define internal void @HelperB
+; CHECK0: define amdgpu_kernel void @B
+
+; CHECK1: define internal void @HelperA()
+; CHECK1: define amdgpu_kernel void @A()
+
+; CHECK-INDIRECT0: define internal void @HelperB
+; CHECK-INDIRECT0: define amdgpu_kernel void @B
+
+; CHECK-INDIRECT1: define internal void @HelperA()
+; CHECK-INDIRECT1: define internal void @HelperB()
+; CHECK-INDIRECT1: define amdgpu_kernel void @A()
+
+@addrthief = global [2 x ptr] [ptr @HelperA, ptr @HelperB]
+
+define internal void @HelperA() {
+  ret void
+}
+
+define internal void @HelperB() {
+  ret void
+}
+
+define amdgpu_kernel void @A() {
+  call void asm sideeffect "v_mov_b32 v0, 7", "~{v0}"()
+  call void @HelperA()
+  ret void
+}
+
+define amdgpu_kernel void @B(ptr %out) {
+  call void @HelperB()
+  ret void
+}
diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll
new file mode 100644
index 00000000000000..0bfa9150d9e90e
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies-debug.ll
@@ -0,0 +1,22 @@
+; REQUIRES: asserts
+
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -debug -amdgpu-module-splitting-no-externalize-address-taken 2>&1 | FileCheck %s
+
+; CHECK:      [!] callgraph is incomplete for A - analyzing function
+; CHECK-NEXT:    resolved call to PerryThePlatypus in:   call void @Perry()
+
+@Perry = internal alias ptr(), ptr @PerryThePlatypus
+
+define internal void @PerryThePlatypus() {
+  ret void
+}
+
+define amdgpu_kernel void @A() {
+  call void @Perry()
+  ret void
+}
+
+define amdgpu_kernel void @B() {
+  call void @PerryThePlatypus()
+  ret void
+}
diff --git a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
index d7e84abd5f968d..ed160cceaf8f28 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/kernels-alias-dependencies.ll
@@ -1,4 +1,4 @@
-; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa
+; RUN: llvm-split -o %t %s -j 2 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-no-externalize-address-taken
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
 
@@ -7,18 +7,14 @@
 ;   - B calls @PerryThePlatypus
 ;   - C calls @Perry, an alias of @PerryThePlatypus
 ;
-; We should see through the alias and put B/C in the same
-; partition.
-;
-; Additionally, @PerryThePlatypus gets externalized as
-; the alias counts as taking its address.
+; We should see through the alias and treat C as-if it called PerryThePlatypus directly.
 
-; CHECK0: define amdgpu_kernel void @A
+; CHECK0: define internal void @PerryThePlatypus()
+; CHECK0: define amdgpu_kernel void @C
 
-; CHECK1: @Perry = internal alias ptr (), ptr @PerryThePlatypus
-; CHECK1: define hidden void @PerryThePlatypus()
+; CHECK1: define internal void @PerryThePlatypus()
+; CHECK1: define amdgpu_kernel void @A
 ; CHECK1: define amdgpu_kernel void @B
-; CHECK1: define amdgpu_kernel void @C
 
 @Perry = internal alias ptr(), ptr @PerryThePlatypus
 

@jmmartinez
Copy link
Contributor

Would it make sense to replace the alias uses by the aliased global instead ? Such that all references (or at least the calls) inside a module point directly to the actual global and not to the alias.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other optimizations already redirect calls to aliases to direct calls, so I don't think this pass should have to put in the effort to also do the same. I think GlobalOpt?

Comment on lines 106 to 113
static cl::opt<bool> NoExternalizeOnAddrTaken(
"amdgpu-module-splitting-no-externalize-address-taken", cl::Hidden,
cl::desc(
"disables externalization of functions whose addresses are taken"));

static cl::opt<bool> InlineAsmIsIndirectCall(
"amdgpu-module-splitting-inline-asm-is-indirect-call", cl::Hidden,
cl::desc("consider inline assembly as na indirect call"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody will ever use these flags, I would prefer to not add them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're useful for debugging, some of the tests I added use them.
I don't mind making them debug only if you prefer

@Pierre-vh
Copy link
Contributor Author

Pierre-vh commented Sep 3, 2024

Other optimizations already redirect calls to aliases to direct calls, so I don't think this pass should have to put in the effort to also do the same. I think GlobalOpt?

Now that I remember, this is indeed an artificial case.
I'll check if just adding an opt run line also does the same thing, if it does, then I'll just eliminate that part of the alias handling.

EDIT: I checked and globalopt indeed eliminates aliases whenever possible
Though I'm debating whether it's worth removing. It's just a few lines of code and it makes the pass more correct in case globalopt didn't run for some reason (not sure if it runs even at O0?), but it's also redundant code which isn't great

@arsenm
Copy link
Contributor

arsenm commented Sep 4, 2024

Though I'm debating whether it's worth removing. It's just a few lines of code and it makes the pass more correct in case globalopt didn't run for some reason (not sure if it runs even at O0?), but it's also redundant code which isn't great

I'd rather just let global opt deal with it. It probably doesn't run at -O0. Though I guess in this instance it isn't really about optimization

@Pierre-vh Pierre-vh force-pushed the indirect-calls-handling branch from 612192e to 3668534 Compare September 16, 2024 08:18
@Pierre-vh Pierre-vh changed the title [AMDGPU] SplitModule: Better handling of aliases & inline assembly [AMDGPU][SplitModule] Cleanup CallsExternal Handling Sep 16, 2024
@Pierre-vh Pierre-vh requested a review from arsenm September 16, 2024 08:18
@Pierre-vh
Copy link
Contributor Author

ping

@Pierre-vh Pierre-vh requested a review from arsenm September 25, 2024 07:24
}

// everything else is handled conservatively.
HasIndirectCall = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can break at this point? Also can just turn this into a helper function to scan for indirect calls and use early return

@Pierre-vh Pierre-vh force-pushed the indirect-calls-handling branch from 4ab6a34 to b5e322f Compare October 7, 2024 05:45
@Pierre-vh Pierre-vh requested a review from arsenm October 7, 2024 05:47
- Don't treat inline ASM as indirect calls
- Remove alias testing, which was broken (only working by pure luck right now) and isn't needed anyway. GlobalOpt should take care of them for us.
@Pierre-vh Pierre-vh force-pushed the indirect-calls-handling branch from b5e322f to c70db32 Compare October 11, 2024 06:13
@Pierre-vh Pierre-vh merged commit d656b20 into llvm:main Oct 11, 2024
5 of 8 checks passed
@Pierre-vh Pierre-vh deleted the indirect-calls-handling branch October 11, 2024 06:37
Pierre-vh added a commit that referenced this pull request Oct 14, 2024
See #106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this pull request Oct 16, 2024
See llvm#106528 to review the first commit.

Handle the `!callees` metadata to further reduce the amount of indirect
call cases that end up conservatively assuming that any indirectly
callable function is a potential target.
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Dec 5, 2024
- Don't treat inline ASM as indirect calls
- Remove call to alias testing, which was broken (only working by pure
luck right now) and isn't needed anyway. GlobalOpt should take care of
them for us.

Change-Id: I7de442dd07b00198e9d01722a7e27601e399b540
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants