Skip to content

[BOLT] Add --pad-funcs-before=func:n #117924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

peterwaller-arm
Copy link
Contributor

@peterwaller-arm peterwaller-arm commented Nov 27, 2024

This complements --pad-funcs, and by using both simultaneously, enables
moving a specific function through the address space without modifying any code
other than the targeted function (and references to it) by doing
(before+after=constant).

See also: proposed functionality to enable inserting random padding in
https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias
and #117653

Copy link

github-actions bot commented Nov 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@peterwaller-arm peterwaller-arm force-pushed the pad-before-funcs branch 2 times, most recently from 19b8e54 to dbfc1b5 Compare November 27, 2024 21:18
@peterwaller-arm peterwaller-arm marked this pull request as ready for review November 27, 2024 21:22
@llvmbot llvmbot added the BOLT label Nov 27, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-bolt

Author: Peter Waller (peterwaller-arm)

Changes

This complements --pad-funcs, and by using both simultaneously, enables
moving a specific function through the address space without modifying any code
other than the targeted function (and references to it) by doing
(before+after=constant).

See also: proposed functionality to enable inserting random padding in
https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias
and #117653


Full diff: https://github.com/llvm/llvm-project/pull/117924.diff

2 Files Affected:

  • (modified) bolt/lib/Core/BinaryEmitter.cpp (+60)
  • (added) bolt/test/AArch64/pad-before-funcs.s (+28)
diff --git a/bolt/lib/Core/BinaryEmitter.cpp b/bolt/lib/Core/BinaryEmitter.cpp
index f34a94c5779213..be21f1f363af29 100644
--- a/bolt/lib/Core/BinaryEmitter.cpp
+++ b/bolt/lib/Core/BinaryEmitter.cpp
@@ -54,6 +54,12 @@ FunctionPadSpec("pad-funcs",
   cl::Hidden,
   cl::cat(BoltCategory));
 
+static cl::list<std::string> FunctionPadBeforeSpec(
+    "pad-funcs-before", cl::CommaSeparated,
+    cl::desc("list of functions to pad with amount of bytes"),
+    cl::value_desc("func1:pad1,func2:pad2,func3:pad3,..."), cl::Hidden,
+    cl::cat(BoltCategory));
+
 static cl::opt<bool> MarkFuncs(
     "mark-funcs",
     cl::desc("mark function boundaries with break instruction to make "
@@ -94,6 +100,30 @@ size_t padFunction(const BinaryFunction &Function) {
   return 0;
 }
 
+size_t padFunctionBefore(const BinaryFunction &Function) {
+  static std::map<std::string, size_t> FunctionPadding;
+
+  if (FunctionPadding.empty() && !FunctionPadBeforeSpec.empty()) {
+    for (std::string &Spec : FunctionPadBeforeSpec) {
+      size_t N = Spec.find(':');
+      if (N == std::string::npos)
+        continue;
+      std::string Name = Spec.substr(0, N);
+      size_t Padding = std::stoull(Spec.substr(N + 1));
+      FunctionPadding[Name] = Padding;
+    }
+  }
+
+  for (auto &FPI : FunctionPadding) {
+    std::string Name = FPI.first;
+    size_t Padding = FPI.second;
+    if (Function.hasNameRegex(Name))
+      return Padding;
+  }
+
+  return 0;
+}
+
 } // namespace opts
 
 namespace {
@@ -319,6 +349,36 @@ bool BinaryEmitter::emitFunction(BinaryFunction &Function,
     Streamer.emitCodeAlignment(Function.getAlign(), &*BC.STI);
   }
 
+  if (size_t Padding = opts::padFunctionBefore(Function)) {
+    // Handle padFuncsBefore after the above alignment logic but before
+    // symbol addresses are decided; with the intent that the nops are
+    // not executed and the original alignment logic is preserved.
+    if (!BC.HasRelocations) {
+      errs() << "BOLT-ERROR: -pad-before-funcs is not supported in "
+             << "non-relocation mode\n";
+      exit(1);
+    }
+
+    // Preserve Function.getMinAlign().
+    if (!isAligned(Function.getMinAlign(), Padding)) {
+      errs() << "BOLT-ERROR: User-requested " << Padding
+             << " padding bytes before function " << Function
+             << " is not a multiple of the minimum function alignment ("
+             << Function.getMinAlign().value() << ").\n";
+      exit(1);
+    }
+
+    LLVM_DEBUG(dbgs() << "BOLT-DEBUG: padding function " << Function << " with "
+                      << Padding << " bytes\n");
+
+    // Since the padding is not executed, it can be null bytes.
+    Streamer.emitFill(Padding, 0);
+
+    Function.setMaxSize(Function.getMaxSize() + Padding);
+    Function.setSize(Function.getSize() + Padding);
+    Function.setImageSize(Function.getImageSize() + Padding);
+  }
+
   MCContext &Context = Streamer.getContext();
   const MCAsmInfo *MAI = Context.getAsmInfo();
 
diff --git a/bolt/test/AArch64/pad-before-funcs.s b/bolt/test/AArch64/pad-before-funcs.s
new file mode 100644
index 00000000000000..3b4eae8e70fecc
--- /dev/null
+++ b/bolt/test/AArch64/pad-before-funcs.s
@@ -0,0 +1,28 @@
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -Wl,--section-start=.text=0x4000
+# RUN: llvm-bolt %t.exe -o %t.bolt.0 --pad-funcs-before=_start:0
+# RUN: llvm-bolt %t.exe -o %t.bolt.4 --pad-funcs-before=_start:4
+# RUN: llvm-bolt %t.exe -o %t.bolt.8 --pad-funcs-before=_start:8
+
+# RUN: not llvm-bolt %t.exe -o %t.bolt.8 --pad-funcs-before=_start:1 2>&1 | FileCheck --check-prefix=CHECK-BAD-ALIGN %s
+
+# CHECK-BAD-ALIGN: User-requested 1 padding bytes before function _start(*2) is not a multiple of the minimum function alignment (4).
+
+# RUN: llvm-objdump --section=.text --disassemble %t.bolt.0 | FileCheck --check-prefix=CHECK-0 %s
+# RUN: llvm-objdump --section=.text --disassemble %t.bolt.4 | FileCheck --check-prefix=CHECK-4 %s
+# RUN: llvm-objdump --section=.text --disassemble %t.bolt.8 | FileCheck --check-prefix=CHECK-8 %s
+
+# Need a symbol reference so that relocations are emitted.
+.section .data
+test:
+    .word 0x0
+
+.section .text
+.globl _start
+
+# CHECK-0: 0000000000400000 <_start>
+# CHECK-4: 0000000000400004 <_start>
+# CHECK-8: 0000000000400008 <_start>
+_start:
+    adr x0, test
+    ret

@pcc
Copy link
Contributor

pcc commented Nov 28, 2024

Would this be more useful as an RNG controlled feature like the LLD feature?

@peterwaller-arm
Copy link
Contributor Author

I am thinking that this is complementary to the seeded approach. In this case it's a surgical tool which can be used to deliberately modify a binary in a particular way with minimal incidental changes. I would also be interested in adding a seed-based approach at some point.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the comments on the code as it is. I'll post a comment on the RFC since I'd like to understand the possible application of this feature.

@peterwaller-arm peterwaller-arm force-pushed the pad-before-funcs branch 3 times, most recently from 0a56fcb to cd8f7d3 Compare December 5, 2024 14:17
@peterwaller-arm
Copy link
Contributor Author

I see the CI failure. I'll ping when resolved.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. Please address a few nits before committing.

This complements --pad-funcs, and by using both simultaneously, enables
moving a specific function through a binary without modifying any code
other than the targeted function (and references to it) by doing
(before+after=constant).

See also: proposed functionality to enable inserting random padding in
https://discourse.llvm.org/t/rfc-lld-feature-for-controlling-for-code-size-dependent-measurement-bias
@peterwaller-arm peterwaller-arm merged commit 14dcf82 into llvm:main Dec 11, 2024
7 checks passed
@peterwaller-arm peterwaller-arm deleted the pad-before-funcs branch December 11, 2024 09:58
@@ -70,11 +74,12 @@ X86AlignBranchBoundaryHotOnly("x86-align-branch-boundary-hot-only",
cl::init(true),
cl::cat(BoltOptCategory));

size_t padFunction(const BinaryFunction &Function) {
size_t padFunction(const cl::list<std::string> &Spec,
const BinaryFunction &Function) {
static std::map<std::string, size_t> FunctionPadding;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue here with the static map.

If either opts::FunctionPadSpec or opts::FunctionPadBeforeSpec are set, the map is going to be populated with the respective spec in the first invocation of BinaryEmitter::emitFunction. The subsequent invocations will pick up the padding from the map irrespective of whether opts::FunctionPadSpec or opts::FunctionPadBeforeSpec is passed as a parameter.

This breaks one of our internal tests so I'm going to revert this patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting and the revert. Fix is at #121918.

aaupov added a commit that referenced this pull request Jan 6, 2025
14dcf82 introduced a subtle bug with
the static `FunctionPadding` map.

If either `opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` are set,
the map is going to be populated with the respective spec in the first
invocation of `BinaryEmitter::emitFunction`. The subsequent invocations
will pick up the padding from the map irrespective of whether
`opts::FunctionPadSpec` or `opts::FunctionPadBeforeSpec` is passed as a
parameter.

This breaks an internal test, hence reverting the patch.
peterwaller-arm added a commit that referenced this pull request Jan 7, 2025
- **Reapply "[BOLT] Add --pad-funcs-before=func:n (#117924)"**
- **[BOLT] Fix --pad-funcs{,-before} state misinteraction**

When --pad-funcs-before was introduced, it introduced a bug whereby the
first one to get parsed could influence the other.

Ensure that each has its own state and test that they don't interact in
this manner by testing how the `_subsequent` symbol moves when both
arguments are supplied with different padding values.

Fixed by having a function (and static state) for each of before/after.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants