Skip to content

[AMDGPU][SplitModule] Do not create empty modules #135761

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 25, 2025

Conversation

Pierre-vh
Copy link
Contributor

@Pierre-vh Pierre-vh commented Apr 15, 2025

Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which can happen),
we import global with non-local linkage into the first non-empty partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they can deal with less modules
than the number requested, but no. We already return only 1 module in some cases and
it seems to be handled just fine.

Fixes SWDEV-523146

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@Pierre-vh Pierre-vh marked this pull request as ready for review April 15, 2025 08:40
@llvmbot
Copy link
Member

llvmbot commented Apr 15, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)

Changes

Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which can happen),
we import global with non-local linkage into the first non-empty partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they can deal with less modules
than the number requested, but no. We already return only 1 module in some cases and
it seems to be handled just fine.

Fixes SWDEV-523146


Full diff: https://github.com/llvm/llvm-project/pull/135761.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Target/TargetMachine.h (+3)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp (+14-1)
  • (modified) llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll (+4-8)
  • (modified) llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll (+10-12)
  • (modified) llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll (+10-12)
  • (added) llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll (+21)
diff --git a/llvm/include/llvm/Target/TargetMachine.h b/llvm/include/llvm/Target/TargetMachine.h
index 5a16c9cafcd7a..566e7dba6792b 100644
--- a/llvm/include/llvm/Target/TargetMachine.h
+++ b/llvm/include/llvm/Target/TargetMachine.h
@@ -451,6 +451,9 @@ class TargetMachine {
   /// Entry point for module splitting. Targets can implement custom module
   /// splitting logic, mainly used by LTO for --lto-partitions.
   ///
+  /// On success, this guarantees that between 1 and \p NumParts modules were
+  /// created and passed to \p ModuleCallBack.
+  ///
   /// \returns `true` if the module was split, `false` otherwise. When  `false`
   /// is returned, it is assumed that \p ModuleCallback has never been called
   /// and \p M has not been modified.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
index dd3bec774ec67..b7b9814b93427 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSplitModule.cpp
@@ -1478,6 +1478,10 @@ static void splitAMDGPUModule(
              << "' - Partition summaries will not be printed\n";
   }
 
+  // One module will import all GlobalValues that are not Functions
+  // and are not subject to conservative import.
+  bool ImportAllGVs = true;
+
   for (unsigned PID = 0; PID < NumParts; ++PID) {
     SplitModuleTimer SMT2("modules_creation",
                           "creating modules for each partition");
@@ -1487,6 +1491,13 @@ static void splitAMDGPUModule(
     for (unsigned NodeID : (*Proposal)[PID].set_bits())
       FnsInPart.insert(&SG.getNode(NodeID).getFunction());
 
+    // Don't create empty module, except for PID 0 because
+    if (FnsInPart.empty()) {
+      LLVM_DEBUG(dbgs() << "[split] P" << PID
+                        << " is empty, not creating module\n");
+      continue;
+    }
+
     ValueToValueMapTy VMap;
     CostType PartCost = 0;
     std::unique_ptr<Module> MPart(
@@ -1501,9 +1512,11 @@ static void splitAMDGPUModule(
           }
 
           // Everything else goes in the first partition.
-          return needsConservativeImport(GV) || PID == 0;
+          return needsConservativeImport(GV) || ImportAllGVs;
         }));
 
+    ImportAllGVs = false;
+
     // FIXME: Aliases aren't seen often, and their handling isn't perfect so
     // bugs are possible.
 
diff --git a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
index 708b5a006be60..0f20a73b1928e 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/address-taken-externalize-with-call.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; 3 kernels:
 ;   - A does a direct call to HelperA
@@ -11,14 +10,11 @@
 ; The helper functions will get externalized, so C/A will end up
 ; in the same partition.
 
-; P0 is empty.
-; CHECK0: declare
+; CHECK0: define amdgpu_kernel void @B(ptr %dst)
 
-; CHECK1: define amdgpu_kernel void @B(ptr %dst)
-
-; CHECK2: define hidden void @HelperA()
-; CHECK2: define amdgpu_kernel void @A()
-; CHECK2: define amdgpu_kernel void @C()
+; CHECK1: define hidden void @HelperA()
+; CHECK1: define amdgpu_kernel void @A()
+; CHECK1: define amdgpu_kernel void @C()
 
 define internal void @HelperA() {
   ret void
diff --git a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
index 839688e7feb8b..567275686fb9f 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging-weak_odr.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -amdgpu-module-splitting-max-depth=0
 ; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 --implicit-check-not=define %s
@@ -15,19 +14,18 @@
 ; Also check w/o large kernels processing to verify they are indeed handled
 ; differently.
 
-; P0 is empty
-; CHECK0: declare
+; Only two partitions created for the first command.
 
-; CHECK1: define internal void @HelperC()
-; CHECK1: define weak_odr amdgpu_kernel void @C
+; CHECK0: define internal void @HelperC()
+; CHECK0: define weak_odr amdgpu_kernel void @C
 
-; CHECK2: define internal void @large2()
-; CHECK2: define internal void @large1()
-; CHECK2: define internal void @large0()
-; CHECK2: define internal void @HelperA()
-; CHECK2: define internal void @HelperB()
-; CHECK2: define amdgpu_kernel void @A
-; CHECK2: define weak_odr amdgpu_kernel void @B
+; CHECK1: define internal void @large2()
+; CHECK1: define internal void @large1()
+; CHECK1: define internal void @large0()
+; CHECK1: define internal void @HelperA()
+; CHECK1: define internal void @HelperB()
+; CHECK1: define amdgpu_kernel void @A
+; CHECK1: define weak_odr amdgpu_kernel void @B
 
 ; NOLARGEKERNELS-CHECK0: define internal void @HelperC()
 ; NOLARGEKERNELS-CHECK0: define weak_odr amdgpu_kernel void @C
diff --git a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
index 807fb2e5f33ce..35133d20c4e07 100644
--- a/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
+++ b/llvm/test/tools/llvm-split/AMDGPU/large-kernels-merging.ll
@@ -1,7 +1,6 @@
 ; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
 ; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
 ; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
-; RUN: llvm-dis -o - %t2 | FileCheck --check-prefix=CHECK2 --implicit-check-not=define %s
 
 ; RUN: llvm-split -o %t.nolarge %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-large-threshold=0 -amdgpu-module-splitting-max-depth=0
 ; RUN: llvm-dis -o - %t.nolarge0 | FileCheck --check-prefix=NOLARGEKERNELS-CHECK0 --implicit-check-not=define %s
@@ -15,19 +14,18 @@
 ; Also check w/o large kernels processing to verify they are indeed handled
 ; differently.
 
-; P0 is empty
-; CHECK0: declare
+; Only 2 partitions for the first command.
 
-; CHECK1: define internal void @HelperC()
-; CHECK1: define amdgpu_kernel void @C
+; CHECK0: define internal void @HelperC()
+; CHECK0: define amdgpu_kernel void @C
 
-; CHECK2: define internal void @large2()
-; CHECK2: define internal void @large1()
-; CHECK2: define internal void @large0()
-; CHECK2: define internal void @HelperA()
-; CHECK2: define internal void @HelperB()
-; CHECK2: define amdgpu_kernel void @A
-; CHECK2: define amdgpu_kernel void @B
+; CHECK1: define internal void @large2()
+; CHECK1: define internal void @large1()
+; CHECK1: define internal void @large0()
+; CHECK1: define internal void @HelperA()
+; CHECK1: define internal void @HelperB()
+; CHECK1: define amdgpu_kernel void @A
+; CHECK1: define amdgpu_kernel void @B
 
 ; NOLARGEKERNELS-CHECK0: define internal void @HelperC()
 ; NOLARGEKERNELS-CHECK0: define amdgpu_kernel void @C
diff --git a/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll b/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll
new file mode 100644
index 0000000000000..091010edd6be3
--- /dev/null
+++ b/llvm/test/tools/llvm-split/AMDGPU/preserve-globals.ll
@@ -0,0 +1,21 @@
+; RUN: llvm-split -o %t %s -j 3 -mtriple amdgcn-amd-amdhsa -amdgpu-module-splitting-max-depth=0 -amdgpu-module-splitting-large-threshold=1.2 -amdgpu-module-splitting-merge-threshold=0.5
+; RUN: llvm-dis -o - %t0 | FileCheck --check-prefix=CHECK0 --implicit-check-not=define %s
+; RUN: llvm-dis -o - %t1 | FileCheck --check-prefix=CHECK1 --implicit-check-not=define %s
+
+; Only 2 out of 3 partitions are created, check the external global is preserved in the first partition.
+
+; CHECK0: @foobar = linkonce_odr global i64 52
+; CHECK0: define amdgpu_kernel void @B
+
+; CHECK1-NOT: @foobar = linkonce_odr global i64 52
+; CHECK1: define amdgpu_kernel void @A
+
+@foobar = linkonce_odr global i64 52
+
+define amdgpu_kernel void @A() {
+  ret void
+}
+
+define amdgpu_kernel void @B() {
+  ret void
+}

@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/amdsplitmodule-skip-parts branch from d44f28b to c1a06b2 Compare April 16, 2025 09:49
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which can happen),
we import global with non-local linkage into the first non-empty partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they can deal with less modules
than the number requested, but no. We already return only 1 module in some cases and
it seems to be handled just fine.
@Pierre-vh Pierre-vh force-pushed the users/pierre-vh/amdsplitmodule-skip-parts branch from c1a06b2 to 0f680c8 Compare April 22, 2025 07:48
Copy link
Contributor

@jmmartinez jmmartinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick. Everything else looks good.

Comment on lines 1491 to +1500
for (unsigned NodeID : (*Proposal)[PID].set_bits())
FnsInPart.insert(&SG.getNode(NodeID).getFunction());

// Don't create empty modules.
if (FnsInPart.empty()) {
LLVM_DEBUG(dbgs() << "[split] P" << PID
<< " is empty, not creating module\n");
continue;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (unsigned NodeID : (*Proposal)[PID].set_bits())
FnsInPart.insert(&SG.getNode(NodeID).getFunction());
// Don't create empty modules.
if (FnsInPart.empty()) {
LLVM_DEBUG(dbgs() << "[split] P" << PID
<< " is empty, not creating module\n");
continue;
}
// Don't create empty modules.
if ((*Proposal)[PID].none()) {
LLVM_DEBUG(dbgs() << "[split] P" << PID
<< " is empty, not creating module\n");
continue;
}
for (unsigned NodeID : (*Proposal)[PID].set_bits())
FnsInPart.insert(&SG.getNode(NodeID).getFunction());

@Pierre-vh Pierre-vh merged commit 2168455 into main Apr 25, 2025
11 checks passed
@Pierre-vh Pierre-vh deleted the users/pierre-vh/amdsplitmodule-skip-parts branch April 25, 2025 07:36
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 25, 2025

LLVM Buildbot has detected a new failure on builder lldb-remote-linux-win running on as-builder-10 while building llvm at step 17 "test-check-lldb-api".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/197/builds/4458

Here is the relevant piece of the build log for the reference
Step 17 (test-check-lldb-api) failure: Test just built components: check-lldb-api completed (failure)
******************** TEST 'lldb-api :: commands/watchpoints/hello_watchlocation/TestWatchLocation.py' FAILED ********************
Script:
--
C:/Python312/python.exe C:/buildbot/as-builder-10/lldb-x-aarch64/llvm-project/lldb\test\API\dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/./lib --env LLVM_INCLUDE_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/include --env LLVM_TOOLS_DIR=C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin --arch aarch64 --build-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex --lldb-module-cache-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex/module-cache-lldb\lldb-api --clang-module-cache-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/lldb-test-build.noindex/module-cache-clang\lldb-api --executable C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/lldb.exe --compiler C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/clang.exe --dsymutil C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin/dsymutil.exe --make C:/ninja/make.exe --llvm-tools-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/./bin --lldb-obj-root C:/buildbot/as-builder-10/lldb-x-aarch64/build/tools/lldb --lldb-libs-dir C:/buildbot/as-builder-10/lldb-x-aarch64/build/./lib --platform-url connect://jetson-agx-0086.lab.llvm.org:1234 --platform-working-dir /home/ubuntu/lldb-tests --sysroot c:/buildbot/fs/jetson-agx-ubuntu --env ARCH_CFLAGS=-mcpu=cortex-a78 --platform-name remote-linux --skip-category=lldb-server C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\API\commands\watchpoints\hello_watchlocation -p TestWatchLocation.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 21.0.0git (https://github.com/llvm/llvm-project.git revision 2168455ef4cdbec9df3b63900fe9b316154187cf)
  clang revision 2168455ef4cdbec9df3b63900fe9b316154187cf
  llvm revision 2168455ef4cdbec9df3b63900fe9b316154187cf
Setting up remote platform 'remote-linux'

Connecting to remote platform 'remote-linux' at 'connect://jetson-agx-0086.lab.llvm.org:1234'...

Connected.

Setting remote platform working directory to '/home/ubuntu/lldb-tests'...

Skipping the following test categories: ['lldb-server', 'dsym', 'gmodules', 'debugserver', 'objc', 'lldb-dap']


--
Command Output (stderr):
--
FAIL: LLDB (C:\buildbot\as-builder-10\lldb-x-aarch64\build\bin\clang.exe-aarch64) :: test_hello_watchlocation (TestWatchLocation.HelloWatchLocationTestCase.test_hello_watchlocation)

======================================================================

FAIL: test_hello_watchlocation (TestWatchLocation.HelloWatchLocationTestCase.test_hello_watchlocation)

   Test watching a location with '-s size' option.

----------------------------------------------------------------------

Traceback (most recent call last):

  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\API\commands\watchpoints\hello_watchlocation\TestWatchLocation.py", line 62, in test_hello_watchlocation

    self.expect(

  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 2406, in expect

    self.runCmd(

  File "C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\packages\Python\lldbsuite\test\lldbtest.py", line 1005, in runCmd

    self.assertTrue(self.res.Succeeded(), msg + output)

...

IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which
can happen),
we import global with non-local linkage into the first non-empty
partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they
can deal with less modules
than the number requested, but no. We already return only 1 module in
some cases and
it seems to be handled just fine.

Fixes SWDEV-523146
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which
can happen),
we import global with non-local linkage into the first non-empty
partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they
can deal with less modules
than the number requested, but no. We already return only 1 module in
some cases and
it seems to be handled just fine.

Fixes SWDEV-523146
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which
can happen),
we import global with non-local linkage into the first non-empty
partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they
can deal with less modules
than the number requested, but no. We already return only 1 module in
some cases and
it seems to be handled just fine.

Fixes SWDEV-523146
Ankur-0429 pushed a commit to Ankur-0429/llvm-project that referenced this pull request May 9, 2025
Skip creating a module if no function is going to be imported.
Also includes a change so that if the first partition is empty (which
can happen),
we import global with non-local linkage into the first non-empty
partition, instead
of P0 all the time.

I thought we'd need to change users of the SplitModule callback so they
can deal with less modules
than the number requested, but no. We already return only 1 module in
some cases and
it seems to be handled just fine.

Fixes SWDEV-523146
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants