Implement areInlineCompatible for SystemZ using feature bitset #132976

Merged
merged 1 commit into from
Apr 7, 2025

Conversation

chavandres
Contributor

What?

Implement areInlineCompatible for the SystemZ target using FeatureBitset comparison.

Why?

The default implementation in TargetTransformInfoImpl.h compares attribute strings and allows inlining only when the caller and callee have identical target-cpu and target-features attributes. As a result, we miss optimization opportunities when the callee's features are a subset of the caller's.

How?

Get the FeatureBitset of the caller and of the callee, and check whether the callee's features are a subset of, or equal to, the caller's features. This is similar to the implementations for ARM, PowerPC, and other targets.

Testing?

Test cases cover three situations: the callee's features are a subset of the caller's, the callee's features are not a subset, and both feature sets are equal.
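
To illustrate the motivation, here is a minimal standalone sketch of the string-based check described under "Why?". It does not use LLVM headers: `TargetAttrs`, `stringCompatible`, and `exampleDefaultCheck` are hypothetical names that only model the two attribute strings, and the feature names are borrowed from the test file in this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two function attributes the default check reads.
struct TargetAttrs {
  std::string Cpu;      // models the "target-cpu" attribute
  std::string Features; // models the "target-features" attribute
};

// Models the behavior described above: inlining is allowed only when both
// attribute strings match exactly.
bool stringCompatible(const TargetAttrs &Caller, const TargetAttrs &Callee) {
  return Caller.Cpu == Callee.Cpu && Caller.Features == Callee.Features;
}

// Usage sketch: a caller with a superset of the callee's features is rejected.
void exampleDefaultCheck() {
  TargetAttrs Base{"generic", "+guarded-storage"};
  TargetAttrs Extended{"generic", "+guarded-storage,+enhanced-sort"};

  assert(stringCompatible(Base, Base));      // identical strings: compatible
  assert(!stringCompatible(Extended, Base)); // superset caller: still rejected
}
```

Under this model, the caller with `+guarded-storage,+enhanced-sort` cannot inline a callee that only needs `+guarded-storage`, which is the missed opportunity this PR set out to address.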

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@chavandres chavandres changed the title from "Systemz tti inlinecompat" to "Implement areInlineCompatible for SystemZ using feature bitset" on Mar 25, 2025
@llvmbot
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-backend-systemz

Author: Andres Chavarria (chavandres)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/132976.diff

4 Files Affected:

  • (modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+14)
  • (modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h (+4)
  • (added) llvm/test/Transforms/Inline/SystemZ/inline-target-attr.ll (+50)
  • (added) llvm/test/Transforms/Inline/SystemZ/lit.local.cfg (+2)
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index 06a0a3a631654..bd0fdb414bedf 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -422,6 +422,20 @@ bool SystemZTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
              C2.ScaleCost, C2.SetupCost);
 }
 
+bool SystemZTTIImpl::areInlineCompatible(const Function *Caller,
+                                         const Function *Callee) const {
+  const TargetMachine &TM = getTLI()->getTargetMachine();
+
+  const FeatureBitset &CallerBits =
+      TM.getSubtargetImpl(*Caller)->getFeatureBits();
+  const FeatureBitset &CalleeBits =
+      TM.getSubtargetImpl(*Callee)->getFeatureBits();
+
+  // Check that target features from the callee are subset or
+  // equal to the caller's features.
+  return (CalleeBits == CallerBits) || (CalleeBits < CallerBits);
+}
+
 unsigned SystemZTTIImpl::getNumberOfRegisters(unsigned ClassID) const {
   bool Vector = (ClassID == 1);
   if (!Vector)
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
index 512fcc854d532..45de346cf97f7 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h
@@ -62,6 +62,10 @@ class SystemZTTIImpl : public BasicTTIImplBase<SystemZTTIImpl> {
 
   bool isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                      const TargetTransformInfo::LSRCost &C2);
+  
+  bool areInlineCompatible(const Function *Caller,
+                          const Function *Callee) const;
+  
   /// @}
 
   /// \name Vector TTI Implementations
diff --git a/llvm/test/Transforms/Inline/SystemZ/inline-target-attr.ll b/llvm/test/Transforms/Inline/SystemZ/inline-target-attr.ll
new file mode 100644
index 0000000000000..1c70962dd18ee
--- /dev/null
+++ b/llvm/test/Transforms/Inline/SystemZ/inline-target-attr.ll
@@ -0,0 +1,50 @@
+; RUN: opt < %s -mtriple=s390x-linux-gnu -S -passes=inline | FileCheck %s
+; RUN: opt < %s -mtriple=s390x-linux-gnu -S -passes='cgscc(inline)' | FileCheck %s
+; Check that we only inline when we have compatible target attributes.
+
+define i32 @foo() #0 {
+entry:
+  %call = call i32 (...) @baz()
+  ret i32 %call
+; CHECK-LABEL: foo
+; CHECK: call i32 (...) @baz()
+}
+
+declare i32 @baz(...) #0
+
+define i32 @bar() #1 {
+entry:
+  %call = call i32 @foo()
+  ret i32 %call
+; CHECK-LABEL: bar
+; CHECK: call i32 (...) @baz()
+}
+
+define i32 @qux() #0 {
+entry:
+  %call = call i32 @bar()
+  ret i32 %call
+; CHECK-LABEL: qux
+; CHECK: call i32 @bar()
+}
+
+define i32 @quux() #2 {
+entry:
+  %call = call i32 @bar()
+  ret i32 %call
+; CHECK-LABEL: quux
+; CHECK: call i32 (...) @baz()
+}
+
+define i32 @foobar() #1 {
+entry:
+  %call = call i32 @bar()
+  ret i32 %call
+; CHECK-LABEL: foobar
+; CHECK: call i32 (...) @baz()
+}
+
+
+attributes #0 = { "target-cpu"="generic" "target-features"="+guarded-storage" }
+attributes #1 = { "target-cpu"="generic" "target-features"="+guarded-storage,+enhanced-sort" }
+attributes #2 = { "target-cpu"="generic" "target-features"="+concurrent-functions" }
diff --git a/llvm/test/Transforms/Inline/SystemZ/lit.local.cfg b/llvm/test/Transforms/Inline/SystemZ/lit.local.cfg
new file mode 100644
index 0000000000000..f9dd98a21cc3e
--- /dev/null
+++ b/llvm/test/Transforms/Inline/SystemZ/lit.local.cfg
@@ -0,0 +1,2 @@
+if not "SystemZ" in config.root.targets:
+    config.unsupported = True

@llvmbot
Member

llvmbot commented Mar 25, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Andres Chavarria (chavandres)


@redstar redstar requested review from uweigand and redstar March 25, 2025 19:24

// Check that target features from the callee are subset or
// equal to the caller's features.
return (CalleeBits == CallerBits) || (CalleeBits < CallerBits);
Member

As far as I can tell, '<' on the FeatureBitset type does not implement a subset test. To test for that, you'd need something like

(CalleeBits & CallerBits) == CalleeBits

However, in fact allowing any subsets is semantically incorrect. In particular, the vector feature changes the ABI, so we need to be careful here - we may need something similar to the AVX-512 tests in X86. (Or, maybe, we can at least support inlining as long as both caller and callee agree on the vector feature.)

There might be other features that could be problematic as well, I need to review in more detail.
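
To make the difference between an ordering comparison and a subset test concrete, here is a small standalone sketch using `std::bitset` rather than LLVM's `FeatureBitset`. The `orderedLess` helper is an assumption: it models an ordering that is lexicographic over bit positions, which may not match LLVM's exact definition. The names `Bits`, `isSubset`, and `exampleOrderingVsSubset` are likewise hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumFeatures = 8; // arbitrary size for illustration
using Bits = std::bitset<NumFeatures>;

// Assumed model of an ordering comparison: lexicographic over bit positions,
// decided at the first position where the two sets differ.
bool orderedLess(const Bits &L, const Bits &R) {
  for (std::size_t I = 0; I != NumFeatures; ++I)
    if (L.test(I) != R.test(I))
      return !L.test(I); // L is "less" when it lacks the first differing bit
  return false;          // equal sets are not less than each other
}

// Subset test: every bit set in Callee is also set in Caller.
bool isSubset(const Bits &Callee, const Bits &Caller) {
  return (Callee & Caller) == Callee;
}

// Usage sketch: the two predicates disagree on disjoint feature sets.
void exampleOrderingVsSubset() {
  Bits Caller, Callee;
  Caller.set(0); // caller has only feature 0
  Callee.set(1); // callee has only feature 1

  assert(orderedLess(Callee, Caller)); // the ordering reports "less"...
  assert(!isSubset(Callee, Caller));   // ...but the callee is not a subset

  Bits Super;
  Super.set(0).set(1);
  assert(isSubset(Callee, Super)); // a genuine subset passes the mask test
}
```

Note the parentheses around the `&` expression: in C++, `==` binds more tightly than `&`, so the unparenthesized form would compare first and mask second.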

Contributor Author

Got it. I agree with you on the danger of inlining without agreeing on the vector feature.

So should the implementation change to inline only when the bitsets are the same, unless we add logic that requires agreement on the vector feature (assuming that is the only problematic case)?
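
For discussion purposes, the "agree on the vector feature" idea could look roughly like the standalone sketch below. This is not what was merged, and it assumes the vector feature is the only ABI-affecting one, which the review explicitly leaves open. The feature indices and the names `Bits`, `compatibleWithVectorAgreement`, and `exampleVectorAgreement` are hypothetical, and `std::bitset` stands in for `FeatureBitset`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature indices; real SystemZ feature numbering is not modeled.
constexpr std::size_t Vector = 0;
constexpr std::size_t GuardedStorage = 1;
constexpr std::size_t EnhancedSort = 2;
using Bits = std::bitset<3>;

// Sketch: require agreement on the ABI-changing vector feature first, then
// fall back to a subset test for the remaining features.
bool compatibleWithVectorAgreement(const Bits &Caller, const Bits &Callee) {
  if (Caller.test(Vector) != Callee.test(Vector))
    return false; // differing vector ABI: never inline
  return (Callee & Caller) == Callee;
}

// Usage sketch covering the accept case and both reject cases.
void exampleVectorAgreement() {
  Bits Caller, Callee, NoVector;
  Caller.set(Vector).set(GuardedStorage).set(EnhancedSort);
  Callee.set(Vector).set(GuardedStorage);
  NoVector.set(GuardedStorage);

  assert(compatibleWithVectorAgreement(Caller, Callee));    // subset, same vector bit
  assert(!compatibleWithVectorAgreement(Caller, NoVector)); // subset, vector differs
  assert(!compatibleWithVectorAgreement(Callee, Caller));   // callee needs more
}
```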

Member

Inlining when the bitsets are the same would certainly be correct, and we might want to do that as an interim solution. But is that actually any different from the current (default) behavior?

Contributor Author

In terms of behavior, it will be the same as the default implementation. But given that the default uses string comparison, checking bitset equality is a cleaner solution.

Member

Sure. I guess that would be fine with me, thanks.

Contributor Author

Thank you. I updated the code and test case to inline only when the callee and caller have equal bitsets.
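
As a rough standalone model of the equality-only rule, the sketch below uses `std::bitset` in place of `FeatureBitset`. The three sets mirror attribute groups #0, #1, and #2 from the test file as originally posted; the updated test is not shown on this page, so the expectations here are an assumption derived from the rule itself. The bit positions and the names `Bits`, `equalFeatureSets`, and `exampleEqualityOnly` are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical bit positions for the features named in the test attributes.
constexpr std::size_t GuardedStorage = 0;
constexpr std::size_t EnhancedSort = 1;
constexpr std::size_t ConcurrentFunctions = 2;
using Bits = std::bitset<3>;

// Equality-only rule: inline only when both feature sets are identical.
bool equalFeatureSets(const Bits &Caller, const Bits &Callee) {
  return Caller == Callee;
}

// Usage sketch mirroring the three attribute groups from the test file.
void exampleEqualityOnly() {
  Bits Attr0, Attr1, Attr2;
  Attr0.set(GuardedStorage);                   // like attributes #0
  Attr1.set(GuardedStorage).set(EnhancedSort); // like attributes #1
  Attr2.set(ConcurrentFunctions);              // like attributes #2

  assert(equalFeatureSets(Attr1, Attr1));  // equal sets: compatible
  assert(!equalFeatureSets(Attr1, Attr0)); // callee is a strict subset: rejected
  assert(!equalFeatureSets(Attr0, Attr1)); // callee needs more: rejected
  assert(!equalFeatureSets(Attr2, Attr1)); // disjoint sets: rejected
}
```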

github-actions bot commented Apr 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@uweigand
Member

uweigand commented Apr 7, 2025

Can you please fix the formatting issues clang-format has complained about? Also, maybe expand the comment to say that we're currently only supporting identical feature sets, but that restriction should be relaxed in the future. Otherwise, this looks good.

@chavandres chavandres force-pushed the systemz-tti-inlinecompat branch from 9eb4ea6 to 99f1107 Compare April 7, 2025 18:32
@chavandres
Contributor Author

Fixed the formatting issues and expanded the comment to note that we might support subsets in the future. I rebased the branch onto the latest main and squashed my changes into a single commit.

Member

@uweigand uweigand left a comment

LGTM now, thanks!

@uweigand uweigand merged commit 9b63a92 into llvm:main Apr 7, 2025
11 checks passed
github-actions bot commented Apr 7, 2025

@chavandres Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
