[RFC][Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgumentElimination #109899
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-function-specialization
Author: None (yonghong-song)
Changes
The goal is to add a suffix to the names of functions rewritten by the Argument Promotion and Dead Argument Elimination passes, so that users can tell when a function's signature has changed. One motivation is to help kernel tracing with BPF. For a detailed description of the patch, see [1].
[1] #105742
Patch is 123.85 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109899.diff
80 Files Affected:
diff --git a/compiler-rt/test/cfi/stats.cpp b/compiler-rt/test/cfi/stats.cpp
index ca6b3bf0df4814..9c4900e86129aa 100644
--- a/compiler-rt/test/cfi/stats.cpp
+++ b/compiler-rt/test/cfi/stats.cpp
@@ -26,12 +26,12 @@ extern "C" __attribute__((noinline)) void nvcall(A *a) {
}
extern "C" __attribute__((noinline)) A *dcast(A *a) {
- // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast cfi-derived-cast 24
+ // CHECK: stats.cpp:[[@LINE+1]] {{_?}}dcast.retelim cfi-derived-cast 24
return (A *)(ABase *)a;
}
extern "C" __attribute__((noinline)) A *ucast(A *a) {
- // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast cfi-unrelated-cast 81
+ // CHECK: stats.cpp:[[@LINE+1]] {{_?}}ucast.retelim cfi-unrelated-cast 81
return (A *)(char *)a;
}
diff --git a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
index 1f9b546ed29996..c8b75dd475ae44 100644
--- a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+++ b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -215,6 +215,7 @@ doPromotion(Function *F, FunctionAnalysisManager &FAM,
F->getParent()->getFunctionList().insert(F->getIterator(), NF);
NF->takeName(F);
+ NF->setName(NF->getName() + ".argprom");
// Loop over all the callers of the function, transforming the call sites to
// pass in the loaded pointers.
diff --git a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
index d1548592b1ce26..b912cc66d19db5 100644
--- a/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
+++ b/llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
@@ -889,6 +889,10 @@ bool DeadArgumentEliminationPass::removeDeadStuffFromFunction(Function *F) {
// it again.
F->getParent()->getFunctionList().insert(F->getIterator(), NF);
NF->takeName(F);
+ if (NumArgumentsEliminated)
+ NF->setName(NF->getName() + ".argelim");
+ else
+ NF->setName(NF->getName() + ".retelim");
NF->IsNewDbgInfoFormat = F->IsNewDbgInfoFormat;
// Loop over all the callers of the function, transforming the call sites to
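Note on the two hunks above: the naming rule they introduce can be sketched outside of LLVM as follows. This is only an illustration using `std::string` instead of `llvm::Function`; `Rewrite`, `suffixFor`, and `renamed` are hypothetical names that do not appear in the patch, and the `argsEliminated` parameter merely stands in for the `NumArgumentsEliminated` check in the diff.

```cpp
#include <string>

// Hypothetical tag for which pass rewrote the function signature.
enum class Rewrite { ArgumentPromotion, DeadArgumentElimination };

// Choose the suffix appended after the new function takes the old name.
// `argsEliminated` models the NumArgumentsEliminated check in the diff:
// nonzero selects ".argelim", zero selects ".retelim".
std::string suffixFor(Rewrite kind, unsigned argsEliminated) {
  if (kind == Rewrite::ArgumentPromotion)
    return ".argprom";
  return argsEliminated ? ".argelim" : ".retelim";
}

// Build the new symbol name from the original one.
std::string renamed(const std::string &name, Rewrite kind,
                    unsigned argsEliminated) {
  return name + suffixFor(kind, argsEliminated);
}
```

Under these assumptions, `renamed("test", Rewrite::DeadArgumentElimination, 3)` yields `test.argelim`, which matches the updated CHECK line in `remove_arguments_test.ll`.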
diff --git a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
index 2bc486f541c71f..4f16c02b1473ff 100644
--- a/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
+++ b/llvm/test/Analysis/LazyCallGraph/remove-dead-function-spurious-ref-edge.ll
@@ -9,7 +9,7 @@ define internal void @a() alwaysinline {
}
define internal void @b(ptr) noinline {
-; CHECK-LABEL: @b(
+; CHECK-LABEL: @b.argprom(
; CHECK-NEXT: ret void
;
ret void
@@ -17,7 +17,7 @@ define internal void @b(ptr) noinline {
define internal void @c() noinline {
; CHECK-LABEL: @c(
-; CHECK-NEXT: call void @b()
+; CHECK-NEXT: call void @b.argprom()
; CHECK-NEXT: ret void
;
call void @b(ptr @a)
diff --git a/llvm/test/BugPoint/remove_arguments_test.ll b/llvm/test/BugPoint/remove_arguments_test.ll
index 9e9c51eaafc383..bb93e45e4b46ef 100644
--- a/llvm/test/BugPoint/remove_arguments_test.ll
+++ b/llvm/test/BugPoint/remove_arguments_test.ll
@@ -11,7 +11,7 @@
declare i32 @test2()
-; CHECK: define void @test() {
+; CHECK: define void @test.argelim() {
define i32 @test(i32 %A, ptr %B, float %C) {
call i32 @test2()
ret i32 %1
diff --git a/llvm/test/CodeGen/AArch64/arg_promotion.ll b/llvm/test/CodeGen/AArch64/arg_promotion.ll
index cc37d230c6cbe4..724a7f109f1e29 100644
--- a/llvm/test/CodeGen/AArch64/arg_promotion.ll
+++ b/llvm/test/CodeGen/AArch64/arg_promotion.ll
@@ -38,16 +38,16 @@ define dso_local void @caller_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define dso_local void @caller_4xi32(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT: call fastcc void @callee_4xi32(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT: call fastcc void @callee_4xi32.argprom.argprom(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
- call fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst)
+ call fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst)
ret void
}
-define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_4xi32(
+define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
+; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: ret void
@@ -65,7 +65,7 @@ define dso_local void @caller_i256(ptr noalias %src, ptr noalias %dst) #0 {
; CHECK-LABEL: define dso_local void @caller_i256(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SRC_VAL:%.*]] = load i256, ptr [[SRC:%.*]], align 16
-; CHECK-NEXT: call fastcc void @callee_i256(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT: call fastcc void @callee_i256.argprom(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
@@ -74,7 +74,7 @@ entry:
}
define internal fastcc void @callee_i256(ptr noalias %src, ptr noalias %dst) #0 {
-; CHECK-LABEL: define internal fastcc void @callee_i256(
+; CHECK-LABEL: define internal fastcc void @callee_i256.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store i256 [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: ret void
@@ -159,7 +159,7 @@ define dso_local void @caller_struct4xi32(ptr noalias %src, ptr noalias %dst) #1
; CHECK-NEXT: [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[SRC]], i64 16
; CHECK-NEXT: [[SRC_VAL1:%.*]] = load <4 x i32>, ptr [[TMP0]], align 16
-; CHECK-NEXT: call fastcc void @callee_struct4xi32(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
+; CHECK-NEXT: call fastcc void @callee_struct4xi32.argprom(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
@@ -168,7 +168,7 @@ entry:
}
define internal fastcc void @callee_struct4xi32(ptr noalias %src, ptr noalias %dst) #1 {
-; CHECK-LABEL: define internal fastcc void @callee_struct4xi32(
+; CHECK-LABEL: define internal fastcc void @callee_struct4xi32.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: [[DST2:%.*]] = getelementptr inbounds [[STRUCT_4XI32:%.*]], ptr [[DST]], i64 0, i32 1
diff --git a/llvm/test/CodeGen/AMDGPU/internalize.ll b/llvm/test/CodeGen/AMDGPU/internalize.ll
index 6b2a4d5fc328b4..08b42f93bf5f47 100644
--- a/llvm/test/CodeGen/AMDGPU/internalize.ll
+++ b/llvm/test/CodeGen/AMDGPU/internalize.ll
@@ -10,7 +10,7 @@
; ALL: gvar_used
@gvar_used = addrspace(1) global i32 undef, align 4
-; OPT: define internal fastcc void @func_used_noinline(
+; OPT: define internal fastcc void @func_used_noinline.argelim(
; OPT-NONE: define fastcc void @func_used_noinline(
define fastcc void @func_used_noinline(ptr addrspace(1) %out, i32 %tid) #1 {
entry:
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
index 42819d5421ca0f..8be9727b316d28 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
;; The first call to foo does not allocate cold memory. It should call the
;; original functions, which ultimately call the original allocation decorated
;; with a "notcold" attribute.
-; IR: call {{.*}} @_Z3foov()
+; IR: call {{.*}} @_Z3foov.retelim()
;; The second call to foo allocates cold memory. It should call cloned functions
;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR: call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR: call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR: call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR: call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR: call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR: call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR: call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR: call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR: call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: call {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
diff --git a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
index 663f8525043c2f..4c18cf8226c8bb 100644
--- a/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
+++ b/llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
;; The first call to foo does not allocate cold memory. It should call the
;; original functions, which ultimately call the original allocation decorated
;; with a "notcold" attribute.
-; IR: call {{.*}} @_Z3foov()
+; IR: call {{.*}} @_Z3foov.retelim()
;; The second call to foo allocates cold memory. It should call cloned functions
;; which ultimately call a cloned allocation decorated with a "cold" attribute.
-; IR: call {{.*}} @_Z3foov.memprof.1()
-; IR: define internal {{.*}} @_Z3barv()
+; IR: call {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3barv.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv()
-; IR: call {{.*}} @_Z3barv()
-; IR: define internal {{.*}} @_Z3foov()
-; IR: call {{.*}} @_Z3bazv()
-; IR: define internal {{.*}} @_Z3barv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.retelim()
+; IR: call {{.*}} @_Z3barv.retelim()
+; IR: define internal {{.*}} @_Z3foov.retelim()
+; IR: call {{.*}} @_Z3bazv.retelim()
+; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
-; IR: define internal {{.*}} @_Z3bazv.memprof.1()
-; IR: call {{.*}} @_Z3barv.memprof.1()
-; IR: define internal {{.*}} @_Z3foov.memprof.1()
-; IR: call {{.*}} @_Z3bazv.memprof.1()
+; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IR: call {{.*}} @_Z3barv.memprof.1.retelim()
+; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IR: call {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
diff --git a/llvm/test/ThinLTO/X86/memprof-basic.ll b/llvm/test/ThinLTO/X86/memprof-basic.ll
index 6922dbfd368467..b7aadf8e32a771 100644
--- a/llvm/test/ThinLTO/X86/memprof-basic.ll
+++ b/llvm/test/ThinLTO/X86/memprof-basic.ll
@@ -53,7 +53,7 @@
;; We should have cloned bar, baz, and foo, for the cold memory allocation.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define {{.*}} @main
+; IRNODIST: call {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3bazv.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Z3bazv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
index 65d794e9cba87c..bfc7b02a956c6f 100644
--- a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
+++ b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
@@ -68,7 +68,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -247,6 +247,18 @@ attributes #0 = { noinline optnone}
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z1Dv.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z1Fv.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Bv.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Ev.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
index f1a494d077fefc..4153524bf44706 100644
--- a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
+++ b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
@@ -61,7 +61,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -283,6 +283,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1BPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: define internal {{.*}} @_Z1CPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: define internal {{.*}} @_Z1DPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD:[0-9]+]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
index 07a52f441ca278..ba8811b46175e3 100644
--- a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
+++ b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
@@ -74,7 +74,7 @@
;; from main allocating cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -419,6 +419,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define {{.*}} @main(
+; IRNODIST: call {{.*}} @_Z3foov.argelim()
+; IRNODIST: call {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: define internal {{.*}} @_Z3foov.argelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-inlined.ll b/llvm/test/ThinLTO/X86/memprof-inlined.ll
index 89df345b220423..7111a536a3110a 100644
--- a/llvm/test/ThinLTO/X86/memprof-inlined.ll
+++ b/llvm/test/ThinLTO/X86/memprof-inlined.ll
@@ -63,7 +63,7 @@
;; cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -323,6 +323,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define {{.*}} @main()
+; IRNODIST: call {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
index daa4e1fb757d21..51839033177034 100644
--- a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
+++ b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
@@ -3,7 +3,7 @@
; RUN: cat %t | FileChe...
[truncated]
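Since the stated motivation is tracing, a consumer of symbol names may want to map a suffixed symbol back to its pre-optimization name. Below is a minimal sketch, assuming only the three suffixes introduced by this patch (`.argprom`, `.argelim`, `.retelim`). `stripSignatureSuffixes` is a hypothetical helper, not part of LLVM or of any BPF tool, and it requires C++17 for `std::string_view`. It deliberately leaves other suffixes such as `.memprof.1` in place.

```cpp
#include <array>
#include <string>
#include <string_view>

// Hypothetical helper: repeatedly strips a trailing ".argprom", ".argelim",
// or ".retelim" from a symbol name. A name that consists only of a suffix
// is left unchanged.
std::string stripSignatureSuffixes(std::string_view name) {
  static constexpr std::array<std::string_view, 3> suffixes = {
      ".argprom", ".argelim", ".retelim"};
  bool stripped = true;
  while (stripped) {
    stripped = false;
    for (std::string_view s : suffixes) {
      // Only strip when something remains in front of the suffix.
      if (name.size() > s.size() &&
          name.substr(name.size() - s.size()) == s) {
        name.remove_suffix(s.size());
        stripped = true;
      }
    }
  }
  return std::string(name);
}
```

The loop handles stacked suffixes, so a name like `callee_4xi32.argprom.argprom` from the updated AArch64 test reduces to `callee_4xi32`, while `_Z3foov.memprof.1.retelim` reduces to `_Z3foov.memprof.1`.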
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
index 65d794e9cba87c..bfc7b02a956c6f 100644
--- a/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
+++ b/llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
@@ -68,7 +68,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -247,6 +247,18 @@ attributes #0 = { noinline optnone}
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z1Dv.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z1Fv.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Bv.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Ev.retelim()
+; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z1Dv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
index f1a494d077fefc..4153524bf44706 100644
--- a/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
+++ b/llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
@@ -61,7 +61,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -283,6 +283,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1BPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.argelim(
+; IRNODIST: define internal {{.*}} @_Z1CPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: define internal {{.*}} @_Z1DPPcS0_(
+; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD:[0-9]+]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
+; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
index 07a52f441ca278..ba8811b46175e3 100644
--- a/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
+++ b/llvm/test/ThinLTO/X86/memprof-indirectcall.ll
@@ -74,7 +74,7 @@
;; from main allocating cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -419,6 +419,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define {{.*}} @main(
+; IRNODIST: call {{.*}} @_Z3foov.argelim()
+; IRNODIST: call {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
+; IRNODIST: define internal {{.*}} @_Z3foov.argelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.argelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/ThinLTO/X86/memprof-inlined.ll b/llvm/test/ThinLTO/X86/memprof-inlined.ll
index 89df345b220423..7111a536a3110a 100644
--- a/llvm/test/ThinLTO/X86/memprof-inlined.ll
+++ b/llvm/test/ThinLTO/X86/memprof-inlined.ll
@@ -63,7 +63,7 @@
;; cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED
-; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
+; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
;; Try again but with distributed ThinLTO
@@ -323,6 +323,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }
+; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.retelim()
+; IRNODIST: define {{.*}} @main()
+; IRNODIST: call {{.*}} @_Z3foov.retelim()
+; IRNODIST: call {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
+; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
+; IRNODIST: call {{.*}} @_Z3barv.memprof.1.retelim()
+; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
+; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }
; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
diff --git a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
index daa4e1fb757d21..51839033177034 100644
--- a/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
+++ b/llvm/test/Transforms/ArgumentPromotion/2008-02-01-ReturnAttrs.ll
@@ -3,7 +3,7 @@
; RUN: cat %t | FileChe...
[truncated]
This pull request includes 4 commits. The first 3 commits are from the previous reviewed pull request: @efriedma-quic could you take a look? If everything looks good, could you approve it? Thanks! |
You can just rebase it to get rid of them |
This is going to cause SamplePGO tooling (both in the compiler and out of it) to need updating. Here's a place in the compiler that needs updating, e.g.:
const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix}; |
To avoid affecting profile handling, and avoid a lot of test churn, can you put this under an option (ideally defaulted off)?
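To make the concern above concrete, here is a minimal sketch (not LLVM's actual code) of how a profile tool might canonicalize a symbol name by truncating it at a known suffix marker. The helper name `stripKnownSuffixes` and the marker strings used in the usage note are assumptions for illustration only. The point is that a marker missing from the list, such as `.argelim`, leaves the name untouched, so a lookup keyed on the source-level name would presumably miss.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: cut `name` at the earliest
// occurrence of any marker in `knownSuffixes` and return what precedes it.
std::string stripKnownSuffixes(const std::string &name,
                               const std::vector<std::string> &knownSuffixes) {
  std::size_t cut = name.size();
  for (const std::string &suffix : knownSuffixes) {
    assert(!suffix.empty() && "suffix markers must be non-empty");
    std::size_t pos = name.find(suffix);
    if (pos != std::string::npos && pos < cut)
      cut = pos; // keep the earliest marker so chained suffixes are all dropped
  }
  return name.substr(0, cut);
}
```

With an assumed marker list of `.llvm.`, `.part.` and `.__uniq.`, a name such as `btf_new.argelim` comes back unchanged; it is only reduced to `btf_new` once `.argelim` is added to the list, which is the kind of tooling update the comment refers to.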
@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define {{.*}} @main
For all the memprof tests, it is probably better to just loosen up the original matching a bit (by removing the () and/or adding {{.*}}).
good point. I can do this.
Thanks (I still see the test churn but assume you haven't had a chance to update those yet).
The above 'IRNODIST' thing is already gone in the latest patch. The above code is marked as 'Outdated'.
define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define internal fastcc void @callee_4xi32(
define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
Can you change it to avoid adding cascading suffixes? This gets a little verbose and potentially even harder for e.g. profile tooling that tries to ignore suffixes.
I did this for two reasons. First, gcc has cascading suffixes, e.g. when I compiled llvm with gcc, I got the following:
_ZN5clang19RecursiveASTVisitorIN12_GLOBAL__N_119PluralMisuseChecker13MethodCrawlerEE14TraverseIfStmtEPNS_6IfStmtEPN4llvm15SmallVectorImplINS7_14PointerIntPairIPNS_4StmtELj1EbNS7_21PointerLikeTypeTraitsISB_EENS7_18PointerIntPairInfoISB_Lj1ESD_EEEEEE.part.0.constprop.0.isra.0
Second, cascading the suffixes gives a hint about which signature-changing transformations have been applied, so it is easier for people to find the changed func signature.
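For reference, one way the reviewer's request could be read is "do not append a suffix that is already the last component". The sketch below shows that reading only; `addSuffixOnce` is a hypothetical helper and is not what the patch does.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append `suffix` (e.g. ".argprom") to `name` unless
// `name` already ends with exactly that suffix.
std::string addSuffixOnce(const std::string &name, const std::string &suffix) {
  assert(!suffix.empty() && "suffix must be non-empty");
  bool alreadyThere =
      name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
  return alreadyThere ? name : name + suffix;
}
```

Under this reading, `callee_4xi32.argprom` would stay as it is instead of becoming `callee_4xi32.argprom.argprom`, while a name like `set_track_update.argelim` would still pick up `.argprom`, so the hint about which transformations ran is kept. It does not shorten chains of different suffixes.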
I recommend having an RFC for this. First, names are important in a number of scenarios, currently - @xur-llvm can detail cases where the linux kernel wouldn't build because of name suffixes. Second, I'd like to take a step back and understand alternatives (for which a more detailed description of the scenario, in an RFC, would be a good/necessary idea). For example, and in the absence of more information, I wonder why not leave the names alone, use function-level metadata, and save it into a section in the binary?
Generally users wouldn't need to know that; but they would know from the signature itself? |
Thanks for the pointer. I will take a look. IIUC, besides the '.llvm.' suffix, llvm has some other suffixes as well, e.g., '.' for FULL LTO, '.specialized' (in Transforms/IPO/FunctionSpecialization.cpp). Are they handled properly?
Full LTO already has lots of suffixes; how does profiling handle them?
A lot of discussion already in #105742. Ultimately, what we want is the precise func signature for every func. What you proposed is okay, save func -> signature in a section of the binary. I am wondering how this can be done. |
In kernel tracing, if the func name is not changed, the func signature will be assumed to match the source code. If the compiler silently changes the signature, kernel tracing could get incorrect results. So we either need to change the func name to indicate that the signature has changed, or we need additional information in the binary that says the signature has changed and, ideally, what the new signature is.
The signature won't change if it's externally visible. |
@arsenm Indeed, you are right. We only talk about static functions whose signatures may change. |
I tried an example with bpftool (https://github.com/torvalds/linux/tree/master/tools/bpf/bpftool). I built libbpf/bpftool with the additional flags -gline-tables-only -fdebug-info-for-profiling -funique-internal-linkage-names. I also intentionally modified one static function, 'btf_new', so that it would eventually get the .argelim suffix. I then used the following command to generate the training data.
I checked sample.perfscript.txt; the 'btf_new' symbol is indeed in the training data:
Another example
I did some code inspection and found that llvm-profgen uses the symbol table to find the code and disassemble it. But what is eventually written to the training data is based on dwarf (address range -> func name in dwarf). In dwarf, the func name is
You can see the above linkage name which is the one in the training data. So I think the .argelim suffix should not impact sampling based profile. The same for .argprom suffix:
…rgumentElimination

ArgumentPromotion and DeadArgumentElimination passes could change function signatures, but the function name remains the same as before the transformation. This makes tracing with bpf programs hard, since users tend to rely on the function signature in the source. See discussion [1] for details.

This patch adds a suffix to functions whose signatures are changed. The suffix lets users know that the function signature has changed and that they need to inspect the IR or binary to find the modified signature before tracing those functions. The suffix for ArgumentPromotion is ".argprom" and the suffix for DeadArgumentElimination is ".argelim". The suffix also gives users a hint about what kind of transformation has been done.

With this patch, I built a recent linux kernel with full LTO enabled.

I got 4 functions with only argpromotion like
set_track_update.argelim.argprom
pmd_trans_huge_lock.argprom
...

I got 1058 functions with only deadargelim like
process_bit0.argelim
pci_io_ecs_init.argelim
...

I got 3 functions with both argpromotion and deadargelim
set_track_update.argelim.argprom
zero_pud_populate.argelim.argprom
zero_pmd_populate.argelim.argprom

There are some concerns that the func suffix may impact sample-based profiling. I did some experiments, and they show that this is not the case. Sample profiling gets func names from dwarf, and those func names in dwarf do not have the suffixes added by this patch, so sample profiling works fine with this patch.

[1] llvm#104678
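As a rough illustration of how a tracing tool could consume these suffixes, the sketch below checks a symbol name for ".argprom" and ".argelim" components. The type `SignatureChange` and the functions `hasComponent` and `classifySymbol` are hypothetical names introduced here; they are not part of the patch or of any existing tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Which signature-changing transformations a symbol name advertises.
struct SignatureChange {
  bool argPromoted = false;   // name carries a ".argprom" component
  bool argEliminated = false; // name carries a ".argelim" component
};

// Hypothetical helper: true if ".<tag>" occurs in `symbol` as a whole
// dot-separated component (followed by end of string or another '.').
static bool hasComponent(const std::string &symbol, const std::string &tag) {
  assert(!tag.empty() && "tag must be non-empty");
  const std::string needle = "." + tag;
  std::size_t pos = symbol.find(needle);
  while (pos != std::string::npos) {
    std::size_t end = pos + needle.size();
    if (end == symbol.size() || symbol[end] == '.')
      return true;
    pos = symbol.find(needle, pos + 1);
  }
  return false;
}

// Hypothetical helper: classify a symbol by the suffixes described above.
SignatureChange classifySymbol(const std::string &symbol) {
  SignatureChange change;
  change.argPromoted = hasComponent(symbol, "argprom");
  change.argEliminated = hasComponent(symbol, "argelim");
  return change;
}
```

A tool using something like this could decline to trust the source-level signature for `set_track_update.argelim.argprom`, while treating a plain `btf_new` as unchanged.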
e4082c3
to
c4a3ffd
Compare
@teresajohnson @arsenm I tried sampling-based profiling with one of the bpf applications, and it looks like the added suffixes do not affect sampling-based profiling. See the details above. I also marked the patch as RFC as you suggested. For func suffixes (including multiple suffixes), gcc already sets a precedent. Below are some examples from building clang with gcc:
And there are even some cases having three suffixes:
So if new suffixes in clang won't affect functionality, then it should be okay for clang as well to allow multiple suffixes. Please let me know what you think. |
I am currently OOO so added a couple reviewers familiar with SamplePGO and other profile matching (e.g. memprof) that might be affected. |
Regarding that analysis, I just want to clarify: you are showing that the profiled binary has the new suffixes in its symbol table, but that the dwarf data for the same binary does not have the new suffixes, and that llvm-profgen will construct the profile from the dwarf, so the profile will not contain the suffixes? I am not very familiar with llvm-profgen so defer to @huangjd. It would be good to confirm with a round trip through the feedback path that things work as expected.
Except we don't tend to feed back profiles collected from gcc built binaries to clang for SamplePGO, etc, so we need to ensure it will still work in clang. |
That's a good question, we don't tend to use Full LTO so I don't know in practice |
Sound good to me. I will double check memprof as well. |
There are downstream (internal) usages which rely on function names as they exist in the symbol table. Memprof relies on the dwarf linkage name, so its usage is similar to llvm-profgen. In the past, suffixes after a period in the symbol were meant to be interpreted as clones of the original function; however, this is not well defined. Personally, I'm not in favour of overloading the existing usage by appending suffixes to indicate the optimizations performed. This seems to be a hacky approach, and instead we should adopt a more formal mechanism to communicate such information to external tools. As @mtrofin suggested, it would be good to start an RFC discussion on discourse to gather the different usages from a broader set of users than those on this patch before proceeding. Since not all suffixes are the same, key considerations for the discourse RFC could be -
Some of these were discussed piecemeal in #105742 however a broader discussion could be beneficial. What do you think? |
What seems to be missing in those discussions, and I'd hope to see more spelled out in a RFC, is the user scenario: why does the function name not reflecting argument changes matter, what user scenario breaks? |
Thanks @snehasish I will initiate a RFC discussion in discourse then. |
@mtrofin This has been mentioned in the original pull request (#105742). The main reason is to make it easy for bpf tracing. For a func without name change, the kernel tracing community will assume its signature to be the same as defined in the source code (or dwarf). In gcc, this is honored since if func signature changed there will be a suffix and then tracing won't work. Currently user will need additional effort to try alternative solution for tracing. The following are some BPF tracing examples:
Those arguments are used by BPF so users can write a tracing bpf prog using argument names. This is much more user friendly than using registers. For clang, however, since the func name remains the same, users will assume the same function signature. If the function signature has actually changed, the above tracing (with the declared func signatures) won't work any more and users may get incorrect results. Users may eventually find out why it doesn't work and try more heavy-weight or less efficient alternatives. But if there is an indication that the func signature has changed, users will not spend time even trying the default approach (assuming the signature is the same). |
Wouldn't it be more desirable to disable argument promotion on functions you may want to enable tracing on? Alternatively, if you had the actual signature of the function (post-argpromotion) available, would that enable a user to write a correct probe? |
We cannot disable argument promotion when building the kernel. Typically the kernel is built by some distro (ubuntu, fedora, etc.) or by a kernel team at google, meta, etc. for their in-company use case. Once a kernel is built, it will be deployed to thousands or millions of machines. From the user's perspective, rebuilding the kernel themselves is not an option, since the actual tracing happens on production systems and it is typically not allowed to install your own kernel on a production machine: there could be many services running on the machine and you do not want to disrupt them. In some cases, people need to collect tracing data on many machines (e.g. thousands or more) to gather some metrics. If we had the actual signatures for funcs whose signatures changed, that would be even better. We would need to record this info in the binary so people can use it. Dwarf has some of this information, but not always. |
I have created a discourse discussion. The link is @mtrofin @arsenm @teresajohnson @snehasish It would be great if you can check the above topic and provide some suggestions. Hopefully other llvm people can help as well. Thanks! |
The goal is to add an additional tag/attribute to dwarf so users will know that function signatures have changed. See [1] for motivation. Otherwise, users may assume that a function signature remains the same as in its source, and bpf tracing may get wrong results. With an explicit tag/attribute in dwarf to indicate a func signature change, bpf tracing users will either go into the asm code to find the exact signature or find another function whose signature has not changed for tracing, instead of debugging the otherwise wrong results.

Earlier I had a pull request [2] that attempted to add a suffix to indicate a signature change, as gcc already does. But upstream later suggested using dwarf to encode such signature change info ([1]).

This patch introduces a new tag LLVM_func_args_changed and a new attr LLVM_func_retval_removed. In the DeadArgumentElimination pass, if a function return value is removed, the LLVM_func_retval_removed attr will be added to that func in the dwarf. In the DeadArgumentElimination and ArgumentPromotion passes, if the function signature is changed, the LLVM_func_args_changed tag is added to dwarf. Here, the LLVM_func_args_changed tag is used so that later on we could add more debug info about what changed.

Regarding potential additional info under LLVM_func_args_changed, we might need the following:
1. A new set of formal argument types. The existing types should be available in the related Transforms passes, but building DIType's will need DIBuilder, and there does not appear to be an easy DIBuilder API to do this.
2. A way to relate the old func signature (from source) to the new func signature. For example, original arg index 2 becomes new arg index 1, etc. More complexity will come from argument promotion and from struct arguments whose size is greater than an arch register size.

[1] https://discourse.llvm.org/t/rfc-identify-func-signature-change-in-llvm-compiled-kernel-image/82609
[2] llvm#109899
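The old-to-new argument index relation mentioned in the commit message can be sketched for the simplest case, where arguments are only removed. This is an assumption-laden illustration: `mapArgIndices` is a hypothetical helper, it is not part of the patch, and it does not model argument promotion, where one original argument may turn into several new ones.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the original argument count and the indices of
// arguments that were removed, return for each original index its index in
// the new signature, or -1 if that argument no longer exists.
std::vector<int> mapArgIndices(std::size_t numArgs,
                               const std::vector<std::size_t> &removed) {
  std::vector<bool> isRemoved(numArgs, false);
  for (std::size_t idx : removed) {
    assert(idx < numArgs && "removed index out of range");
    isRemoved[idx] = true;
  }
  std::vector<int> newIndex(numArgs, -1);
  int next = 0;
  for (std::size_t i = 0; i < numArgs; ++i)
    if (!isRemoved[i])
      newIndex[i] = next++; // surviving arguments keep their relative order
  return newIndex;
}
```

For a three-argument function whose argument 1 is removed, original index 2 maps to new index 1, matching the example in the text. A real encoding would also have to cover promoted and split arguments.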
The goal is to add a suffix to functions changed by the Argument Promotion and Dead Argument Elimination passes, so users will know that the function signature has changed. One motivation is to help kernel tracing with bpf technology.
The previous patch is [1]; it was reverted due to some test failures. This patch fixes a test failure on top of [1].
There are some concerns that the func suffix may impact sample-based profiling. I did some experiments, and they show that this is not the case. Sample profiling gets func names from dwarf, and those func names in dwarf do not have the suffixes added by this patch, so sample profiling works fine with this patch.
For a detailed description of the patch, see [1].
[1] #105742