[AMDGPU][SplitModule] Do not create empty modules #135761

Pierre-vh · 2025-04-15T08:39:23Z

Skip creating a module when no function would be imported into it.
This also changes how globals are imported: if the first partition is empty (which can happen),
globals with non-local linkage are imported into the first non-empty partition instead
of always going into P0.
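
To make the behaviour described above concrete, here is a minimal standalone sketch of the loop shape. It does not use LLVM types: `ModuleSketch` and `createModules` are hypothetical names invented for illustration, and plain strings stand in for functions. Only the `ImportAllGVs` flag and the skip-on-empty check mirror the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a created module: the partition it came from,
// the functions it holds, and whether it received the shared globals.
struct ModuleSketch {
  std::size_t PID;
  std::vector<std::string> Functions;
  bool HasSharedGlobals;
};

// Skips empty partitions and lets the first module that is actually created
// import the remaining globals, instead of always using partition 0.
std::vector<ModuleSketch>
createModules(const std::vector<std::vector<std::string>> &Partitions) {
  std::vector<ModuleSketch> Created;
  bool ImportAllGVs = true;
  for (std::size_t PID = 0; PID < Partitions.size(); ++PID) {
    if (Partitions[PID].empty())
      continue; // No function to import: do not create a module.
    Created.push_back({PID, Partitions[PID], ImportAllGVs});
    ImportAllGVs = false; // Only the first created module gets the globals.
  }
  // If any module was created, the first one holds the shared globals.
  assert(Created.empty() || Created.front().HasSharedGlobals);
  return Created;
}
```

With three partitions where P0 is empty, this sketch yields two modules, and the one built from P1 carries the globals.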

I thought we'd need to change users of the SplitModule callback so they can deal with fewer modules
than the number requested, but no. We already return only one module in some cases, and
it seems to be handled just fine.

Fixes SWDEV-523146
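
As a rough illustration of why callers can cope with fewer modules than requested, the sketch below models a consumer that numbers its outputs with its own counter rather than with the partition ID. `splitSketch` and `writeOutputs` are hypothetical helpers, not LLVM APIs. The updated tests reading `%t0` and `%t1` after P0 is skipped suggest output numbering follows callback order, but that is an inference from the diff.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Partition = std::vector<std::string>;

// Hypothetical splitter: invokes Callback once per non-empty partition, so
// it may produce fewer parts than were requested.
bool splitSketch(const std::vector<Partition> &Partitions,
                 const std::function<void(const Partition &)> &Callback) {
  bool CalledBack = false;
  for (const Partition &Fns : Partitions) {
    if (Fns.empty())
      continue;
    Callback(Fns);
    CalledBack = true;
  }
  return CalledBack;
}

// Hypothetical consumer: names each output from a running counter, so
// skipped partitions leave no gaps in the numbering.
std::vector<std::string> writeOutputs(const std::string &Prefix,
                                      const std::vector<Partition> &Parts) {
  std::vector<std::string> Names;
  unsigned Index = 0;
  splitSketch(Parts, [&](const Partition &) {
    Names.push_back(Prefix + std::to_string(Index++));
  });
  return Names;
}
```

A consumer written this way needs no change when a partition is skipped, because it never assumes the callback fires exactly `NumParts` times.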

llvmbot · 2025-04-15T08:41:11Z

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Full diff: https://github.com/llvm/llvm-project/pull/135761.diff

6 Files Affected:

(modified) llvm/include/llvm/Target/TargetMachine.h (+3)
(modified) llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp (+14-1)
(modified) llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll (+4-8)
(modified) llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll (+10-12)
(modified) llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll (+10-12)
(added) llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll (+21)

diff --git a/llvm/include/llvm/Target/TargetMachine.h b/llvm/include/llvm/Target/TargetMachine.h
index 5a16c9cafcd7a..566e7dba6792b 100644
--- a/llvm/include/llvm/Target/TargetMachine.h
+++ b/llvm/include/llvm/Target/TargetMachine.h
@@ -451,6 +451,9 @@ class TargetMachine {
   /// Entry point for module splitting. Targets can implement custom module
   /// splitting logic, mainly used by LTO for --lto-partitions.
   ///
+  /// On success, this guarantees that between 1 and \p NumParts modules were
+  /// created and passed to \p ModuleCallback.
+  ///
   /// \returns `true` if the module was split, `false` otherwise. When  `false`
   /// is returned, it is assumed that \p ModuleCallback has never been called
   /// and \p M has not been modified.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
index dd3bec774ec67..b7b9814b93427 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
@@ -1478,6 +1478,10 @@ static void splitAMDGPUModule(
              << "' - Partition summaries will not be printed\n";
   }
 
+  // One module will import all GlobalValues that are not Functions
+  // and are not subject to conservative import.
+  bool ImportAllGVs = true;
+
   for (unsigned PID = 0; PID < NumParts; ++PID) {
     SplitModuleTimer SMT2("modules_creation",
                           "creating modules for each partition");
@@ -1487,6 +1491,13 @@ static void splitAMDGPUModule(
     for (unsigned NodeID : (*Proposal)[PID].set_bits())
       FnsInPart.insert(&SG.getNode(NodeID).getFunction());
 
+    // Don't create empty modules.
+    if (FnsInPart.empty()) {
+      LLVM_DEBUG(dbgs() << "[split] P" << PID
+                        << " is empty, not creating module\n");
+      continue;
+    }
+
     ValueToValueMapTy VMap;
     CostType PartCost = 0;
     std::unique_ptr<Module> MPart(
@@ -1501,9 +1512,11 @@ static void splitAMDGPUModule(
           }
 
           // Everything else goes in the first partition.
-          return needsConservativeImport(GV) || PID == 0;
+          return needsConservativeImport(GV) || ImportAllGVs;
         }));
 
+    ImportAllGVs = false;
+
     // FIXME: Aliases aren't seen often, and their handling isn't perfect so
     // bugs are possible.
 
diff --git a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
index 708b5a006be60..0f20a73b1928e 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels:
 ;   - A does a direct call to HelperA
@@ -11,14 +10,11 @@
 ; The helper functions will get externalized, so C/A will end up
 ; in the same partition.
 
-; P0 is empty.
-; CHECK0: declare
+; CHECK0: define amdgpu_kernel void @B(ptr %dst)
 
-; CHECK1: define amdgpu_kernel void @B(ptr %dst)
-
-; CHECK2: define hidden void @HelperA()
-; CHECK2: define amdgpu_kernel void @A()
-; CHECK2: define amdgpu_kernel void @C()
+; CHECK1: define hidden void @HelperA()
+; CHECK1: define amdgpu_kernel void @A()
+; CHECK1: define amdgpu_kernel void @C()
 
 define internal void @HelperA() {
   ret void
diff --git a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
index 839688e7feb8b..567275686fb9f 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -amdgpu-module-splitting-max-depth=0
 ; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 --implicit-check-not=define %s
@@ -15,19 +14,18 @@
 ; Also check w/o large kernels processing to verify they are indeed handled
 ; differently.
 
-; P0 is empty
-; CHECK0: declare
+; Only two partitions created for the first command.
 
-; CHECK1: define internal void @HelperC()
-; CHECK1: define weak_odr amdgpu_kernel void @C
+; CHECK0: define internal void @HelperC()
+; CHECK0: define weak_odr amdgpu_kernel void @C
 
-; CHECK2: define internal void @large2()
-; CHECK2: define internal void @large1()
-; CHECK2: define internal void @large0()
-; CHECK2: define internal void @HelperA()
-; CHECK2: define internal void @HelperB()
-; CHECK2: define amdgpu_kernel void @A
-; CHECK2: define weak_odr amdgpu_kernel void @B
+; CHECK1: define internal void @large2()
+; CHECK1: define internal void @large1()
+; CHECK1: define internal void @large0()
+; CHECK1: define internal void @HelperA()
+; CHECK1: define internal void @HelperB()
+; CHECK1: define amdgpu_kernel void @A
+; CHECK1: define weak_odr amdgpu_kernel void @B
 
 ; NOLARGEKERNELS-CHECK0: define internal void @HelperC()
 ; NOLARGEKERNELS-CHECK0: define weak_odr amdgpu_kernel void @C
diff --git a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
index 807fb2e5f33ce..35133d20c4e07 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -amdgpu-module-splitting-max-depth=0
 ; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 --implicit-check-not=define %s
@@ -15,19 +14,18 @@
 ; Also check w/o large kernels processing to verify they are indeed handled
 ; differently.
 
-; P0 is empty
-; CHECK0: declare
+; Only 2 partitions for the first command.
 
-; CHECK1: define internal void @HelperC()
-; CHECK1: define amdgpu_kernel void @C
+; CHECK0: define internal void @HelperC()
+; CHECK0: define amdgpu_kernel void @C
 
-; CHECK2: define internal void @large2()
-; CHECK2: define internal void @large1()
-; CHECK2: define internal void @large0()
-; CHECK2: define internal void @HelperA()
-; CHECK2: define internal void @HelperB()
-; CHECK2: define amdgpu_kernel void @A
-; CHECK2: define amdgpu_kernel void @B
+; CHECK1: define internal void @large2()
+; CHECK1: define internal void @large1()
+; CHECK1: define internal void @large0()
+; CHECK1: define internal void @HelperA()
+; CHECK1: define internal void @HelperB()
+; CHECK1: define amdgpu_kernel void @A
+; CHECK1: define amdgpu_kernel void @B
 
 ; NOLARGEKERNELS-CHECK0: define internal void @HelperC()
 ; NOLARGEKERNELS-CHECK0: define amdgpu_kernel void @C
diff --git a/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll b/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll
new file mode 100644
index 0000000000000..091010edd6be3
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll
@@ -0,0 +1,21 @@
+; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+
+; Only 2 out of 3 partitions are created, check the external global is preserved in the first partition.
+
+; CHECK0: @foobar = linkonce_odr global i64 52
+; CHECK0: define amdgpu_kernel void @B
+
+; CHECK1-NOT: @foobar = linkonce_odr global i64 52
+; CHECK1: define amdgpu_kernel void @A
+
+@foobar = linkonce_odr global i64 52
+
+define amdgpu_kernel void @A() {
+  ret void
+}
+
+define amdgpu_kernel void @B() {
+  ret void
+}

jmmartinez

Just a nitpick. Everything else looks good.

jmmartinez · 2025-04-23T08:56:12Z

llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp

    for (unsigned NodeID : (*Proposal)[PID].set_bits())
      FnsInPart.insert(&SG.getNode(NodeID).getFunction());

+    // Don't create empty modules.
+    if (FnsInPart.empty()) {
+      LLVM_DEBUG(dbgs() << "[split] P" << PID
+                        << " is empty, not creating module\n");
+      continue;
+    }
+


Suggested change

-    for (unsigned NodeID : (*Proposal)[PID].set_bits())
-      FnsInPart.insert(&SG.getNode(NodeID).getFunction());
-
-    // Don't create empty modules.
-    if (FnsInPart.empty()) {
-      LLVM_DEBUG(dbgs() << "[split] P" << PID
-                        << " is empty, not creating module\n");
-      continue;
-    }
+    // Don't create empty modules.
+    if ((*Proposal)[PID].none()) {
+      LLVM_DEBUG(dbgs() << "[split] P" << PID
+                        << " is empty, not creating module\n");
+      continue;
+    }
+
+    for (unsigned NodeID : (*Proposal)[PID].set_bits())
+      FnsInPart.insert(&SG.getNode(NodeID).getFunction());
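
The suggestion moves the emptiness test ahead of the set-building loop by asking the bit vector directly whether any bit is set. A minimal standalone sketch of that ordering follows. It uses `std::vector<bool>` as a stand-in for the bit vector, and `noneSet` and `collectNonEmpty` are hypothetical helpers written for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for a none() query on a bit vector: true when no bit is set.
bool noneSet(const std::vector<bool> &Bits) {
  return std::none_of(Bits.begin(), Bits.end(), [](bool B) { return B; });
}

// For each partition proposal, bail out before doing any per-node work if
// the proposal is empty; otherwise collect the indices of the set bits.
std::vector<std::vector<std::size_t>>
collectNonEmpty(const std::vector<std::vector<bool>> &Proposal) {
  std::vector<std::vector<std::size_t>> Result;
  for (const std::vector<bool> &Bits : Proposal) {
    if (noneSet(Bits))
      continue; // Empty partition: skip before building anything.
    std::vector<std::size_t> NodeIDs;
    for (std::size_t I = 0; I < Bits.size(); ++I)
      if (Bits[I])
        NodeIDs.push_back(I);
    Result.push_back(std::move(NodeIDs));
  }
  return Result;
}
```

Checking the bits first avoids populating a container only to discover it is empty, and keeps the skip condition next to the data that defines the partition.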

llvm-ci · 2025-04-25T08:28:48Z

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-win running on as-builder-10 while building llvm at step 17 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/197/builds/4458

Here is the relevant piece of the build log for the reference

Step 17 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
******************** TEST 'lldb-api :: commands/watchpoints/hello_watchlocation/TestWatchLocation.py' FAILED ********************
Script:
--
C:/Python312/python.exe C:/buildbot/as-builder-10/lldb-x-aarch64/llvm-project/lldb\test\API\dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/./lib --env LLVM_INCLUDE_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/include --env LLVM_TOOLS_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin --arch aarch64 --build-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex --lldb-module-cache-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex/module-cache-lldb\lldb-api --clang-module-cache-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex/module-cache-clang\lldb-api --executable C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/lldb.exe --compiler C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/clang.exe --dsymutil C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/dsymutil.exe --make C:/ninja/make.exe --llvm-tools-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin --lldb-obj-root C:/buildbot/as-builder-10/lldb-x-aarch64/build/tools/lldb --lldb-libs-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/./lib --platform-url connect://jetson-agx-0086.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot c:/buildbot/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\API\commands\watchpoints\hello_watchlocation -p TestWatchLocation.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 2168455ef4cdbec9df3b63900fe9b316154187cf)
  clang revision 2168455ef4cdbec9df3b63900fe9b316154187cf
  llvm revision 2168455ef4cdbec9df3b63900fe9b316154187cf
Setting up remote platform 'remote-linux'

Connecting to remote platform 'remote-linux' at 'connect://jetson-agx-0086.lab.llvm.org:1234'...

Connected.

Setting remote platform working directory to '/home/ubuntu/lldb-tests'...

Skipping the following test categories: ['lldb-server', 'dsym', 'gmodules', 'debugserver', 'objc', 'lldb-dap']


--
Command Output (stderr):
--
FAIL: LLDB (C:\buildbot\as-builder-10\lldb-x-aarch64\build\bin\clang.exe-aarch64) :: test_hello_watchlocation (TestWatchLocation.HelloWatchLocationTestCase.test_hello_watchlocation)
======================================================================
FAIL: test_hello_watchlocation (TestWatchLocation.HelloWatchLocationTestCase.test_hello_watchlocation)
   Test watching a location with '-s size' option.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\API\commands\watchpoints\hello_watchlocation\TestWatchLocation.py", line 62, in test_hello_watchlocation
    self.expect(
  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 2406, in expect
    self.runCmd(
  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 1005, in runCmd
    self.assertTrue(self.res.Succeeded(), msg + output)

...

Pierre-vh requested review from arsenm, kazutakahirata, scchan, shiltian and teresajohnson April 15, 2025 08:40

Pierre-vh marked this pull request as ready for review April 15, 2025 08:40

llvmbot added the backend:AMDGPU label Apr 15, 2025

shiltian reviewed Apr 15, 2025

Pierre-vh force-pushed the users/pierre-vh/amdsplitmodule-skip-parts branch from d44f28b to c1a06b2 Compare April 16, 2025 09:49

arsenm reviewed Apr 17, 2025

Pierre-vh added 3 commits April 22, 2025 09:48

comments

c58d21d

comments

0f680c8

Pierre-vh force-pushed the users/pierre-vh/amdsplitmodule-skip-parts branch from c1a06b2 to 0f680c8 Compare April 22, 2025 07:48

jmmartinez approved these changes Apr 23, 2025

Pierre-vh merged commit 2168455 into main Apr 25, 2025
11 checks passed

Pierre-vh deleted the users/pierre-vh/amdsplitmodule-skip-parts branch April 25, 2025 07:36
