-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[InstrPGO] Support cold function coverage instrumentation #109837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
36a747b
to
436ab3b
Compare
436ab3b
to
07a2cab
Compare
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-pgo Author: Lei Wang (wlei-llvm) ChangesThis patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags:
Overall, the full command is like:
Full diff: https://github.com/llvm/llvm-project/pull/109837.diff 8 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
PosFlag<SetTrue, [], [ClangOption, CC1Option],
"Emit extra debug info to make sample profile more accurate">,
NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">,
+ Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+ HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">,
+ Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+ HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
Args.hasArg(options::OPT_fprofile_instr_generate) ||
Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
Args.hasArg(options::OPT_fcreate_profile) ||
- Args.hasArg(options::OPT_forder_file_instrumentation);
+ Args.hasArg(options::OPT_forder_file_instrumentation) ||
+ Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+ Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
}
bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
}
}
+ if (auto *ColdFuncCoverageArg = Args.getLastArg(
+ options::OPT_fprofile_generate_cold_function_coverage,
+ options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+ SmallString<128> Path(
+ ColdFuncCoverageArg->getOption().matches(
+ options::OPT_fprofile_generate_cold_function_coverage_EQ)
+ ? ColdFuncCoverageArg->getValue()
+ : "");
+ llvm::sys::path::append(Path, "default_%m.profraw");
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back(Args.MakeArgString(
+ Twine("--instrument-sample-cold-function-path=") + Path));
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back("--instrument-cold-function-coverage");
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back("--pgo-function-entry-coverage");
+ }
+
Arg *PGOGenArg = nullptr;
if (PGOGenerateArg) {
assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) {
+ return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
"enable-loop-versioning-licm", cl::init(false), cl::Hidden,
cl::desc("Enable the experimental Loop Versioning LICM pass"));
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+ "instrument-sample-cold-function-path", cl::init(""),
+ cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+ cl::Hidden);
+
extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
// removed.
MPM.addPass(
PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+ if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+ Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+ assert(!InstrumentSampleColdFuncPath.empty() &&
+ "File path is requeired for instrumentation generation");
+ InstrumentColdFunctionCoverage = true;
+ addPreInlinerPasses(MPM, Level, Phase);
+ addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+ /* IsCS */ false, /* AtomicCounterUpdate */ false,
+ InstrumentSampleColdFuncPath, "",
+ IntrusiveRefCntPtr<vfs::FileSystem>());
+ }
}
// Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
cl::desc("Do not instrument functions with the number of critical edges "
" greater than this threshold."));
+cl::opt<bool> InstrumentColdFunctionCoverage(
+ "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+ cl::desc("Enable cold function coverage instrumentation (currently only "
+ "used under sampling "
+ " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+ "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+ cl::desc("When enabling cold function coverage instrumentation, skip "
+ "instrumenting the "
+ "function whose entry count is above the given value"));
+
extern cl::opt<unsigned> MaxNumVTableAnnotations;
namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
return true;
if (F.getInstructionCount() < PGOFunctionSizeThreshold)
return true;
+ if (InstrumentColdFunctionCoverage &&
+ (!F.getEntryCount() ||
+ F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+ return true;
return false;
}
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -S | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+ ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+ ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}
|
@llvm/pr-subscribers-clang Author: Lei Wang (wlei-llvm) ChangesThis patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags:
Overall, the full command is like:
Full diff: https://github.com/llvm/llvm-project/pull/109837.diff 8 Files Affected:
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
PosFlag<SetTrue, [], [ClangOption, CC1Option],
"Emit extra debug info to make sample profile more accurate">,
NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">,
+ Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+ HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">,
+ Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+ HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">;
def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
Group<f_Group>, Visibility<[ClangOption, CLOption]>,
HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
Args.hasArg(options::OPT_fprofile_instr_generate) ||
Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
Args.hasArg(options::OPT_fcreate_profile) ||
- Args.hasArg(options::OPT_forder_file_instrumentation);
+ Args.hasArg(options::OPT_forder_file_instrumentation) ||
+ Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+ Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
}
bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
}
}
+ if (auto *ColdFuncCoverageArg = Args.getLastArg(
+ options::OPT_fprofile_generate_cold_function_coverage,
+ options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+ SmallString<128> Path(
+ ColdFuncCoverageArg->getOption().matches(
+ options::OPT_fprofile_generate_cold_function_coverage_EQ)
+ ? ColdFuncCoverageArg->getValue()
+ : "");
+ llvm::sys::path::append(Path, "default_%m.profraw");
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back(Args.MakeArgString(
+ Twine("--instrument-sample-cold-function-path=") + Path));
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back("--instrument-cold-function-coverage");
+ CmdArgs.push_back("-mllvm");
+ CmdArgs.push_back("--pgo-function-entry-coverage");
+ }
+
Arg *PGOGenArg = nullptr;
if (PGOGenerateArg) {
assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) {
+ return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
"enable-loop-versioning-licm", cl::init(false), cl::Hidden,
cl::desc("Enable the experimental Loop Versioning LICM pass"));
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+ "instrument-sample-cold-function-path", cl::init(""),
+ cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+ cl::Hidden);
+
extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
namespace llvm {
extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
// removed.
MPM.addPass(
PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+ if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+ Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+ assert(!InstrumentSampleColdFuncPath.empty() &&
+ "File path is requeired for instrumentation generation");
+ InstrumentColdFunctionCoverage = true;
+ addPreInlinerPasses(MPM, Level, Phase);
+ addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+ /* IsCS */ false, /* AtomicCounterUpdate */ false,
+ InstrumentSampleColdFuncPath, "",
+ IntrusiveRefCntPtr<vfs::FileSystem>());
+ }
}
// Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
cl::desc("Do not instrument functions with the number of critical edges "
" greater than this threshold."));
+cl::opt<bool> InstrumentColdFunctionCoverage(
+ "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+ cl::desc("Enable cold function coverage instrumentation (currently only "
+ "used under sampling "
+ " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+ "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+ cl::desc("When enabling cold function coverage instrumentation, skip "
+ "instrumenting the "
+ "function whose entry count is above the given value"));
+
extern cl::opt<unsigned> MaxNumVTableAnnotations;
namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
return true;
if (F.getInstructionCount() < PGOFunctionSizeThreshold)
return true;
+ if (InstrumentColdFunctionCoverage &&
+ (!F.getEntryCount() ||
+ F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+ return true;
return false;
}
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -S | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+ ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+ ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}
|
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling", | |||
PosFlag<SetTrue, [], [ClangOption, CC1Option], | |||
"Emit extra debug info to make sample profile more accurate">, | |||
NegFlag<SetFalse>>; | |||
def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to expose this through the clang frontend?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if my PR description is not clear. Note that there is no use for -fprofile-generate
and -fprofile-instr-generate
here, so a driver flag here is to configure the instr file path and make linker to link the compiler_rt.profile object files (just similar to -fprofile-[instr]-generate=
).
The reason for not using -fprofile-[instr]-generate
is because those flags conflict with -fprofile-sample-use
, see PGOOptions, ProfileFile
is a shared file path which means the two flags are actually mutually exclusive.
Another option is to make -fprofile-generate
compatible with -fprofile-samle-use
(like refactoring PGOOptions or adding another flag to configure the file path things), this seems to me they are more complicated than the current one. But I’m open to any suggestions on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage
? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?
(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant, why not just use
clang ... -mllvm -instrument-cold-function-coverage
? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)
I see, thanks for the suggestion! We also need to link runtime lib(compiler_rt.profile.a
)
but yeah, I agree it's possible to pass directly to llvm and linker without through clang. Then the full command line would be like
clang ... -mllvm -instrument-cold-function-coverage -mllvm -instrument-sample-cold-function-path=<file-path> -mllvm --pgo-function-entry-coverage
ld.lld ... --u__llvm_runtime_variable .../lib/x86_64-unknown-linux-gnu/libclang_rt.profile.a
So yes, adding the clang driver flag is kind of to mirror current IRPGO flags, for easy maintenance purpose(given that -fprofile-generate
doesn't work with -fprofile-sample-use
) and also to centralize the configuration for the convenience. IMO, the compiler_rt
is probably the main blocker here, I didn't find an easy way to bundle it with a llvm flag.
Appreciate any further suggestions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would -Wl,-lclang_rt.profile
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also to centralize the configuration for the convenience
+1 I think a frontend flag is useful. It allows us to identify incompatibilities early and provide clear error messages instead of more obscure failures down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would incompatibilities and ability to provide useful error messages be dependent on the flag being in the frontend? Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict, so a message would be diluted to "sampling won't work with this"?
(probably a subject for a different forum tho)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would -Wl,-lclang_rt.profile work?
Got it, I think it should work, except it requires another linker flag: the --u__llvm_runtime_variable
, we can configure it to linker too, but basically those instr PGO flags control addProfileRTLibs (which seems not equivalent to -Wl,-lclang_rt.profile
), I just wanted to mirror those flags so that we don't need to maintain if anything changes to addProfileRTLibs
in the future. (Though I feel this code should be very stable, should not be a big problem, so mainly for the convenience for compiler user to use one flag instead of using/maintaining multiple flags )
Overall, I agree that it's feasible to do it without clang flag. I'm not very familiar with the convention for adding driver flags, so if you think this doesn't justify it, I will drop it from this patch. Thanks for the discussion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict
Yes, if we know that this instrumentation mode should not be mixed with e.g. sanitizers or something else we can enforce these checks early. I don't see a particular downside to adding a frontend flag. The convenience of bundling the 2 lld flags and 3 mllvm flags into a single frontend flag seems like a good enough motivation to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on single driver flag for convenience.
For your use case, can you use ProfileSymbolList to identify very cold functions (executed but unsampled) or are you indeed looking for functions that are never executed? Will this be used to guide developers with diagnostics or more aggressive compiler driven optimizations? |
Why not use the existing |
We are indeed focusing on functions that are never executed. The ProfileSymbolList is already used to identify functions with a “0” entry count, which is the target of the instrumentation in this work. We then aim to further distinguish the dead functions from them.
We haven’t considered to use them for any compiler optimization. The primary goal is to improve the developer experience, just to take the general benefits from removing dead code: improving readability, good codebase maintainability, reducing build time, etc. Another potential use might be to verify the sampling PGO profile coverage/quality, say to check if we missed any functions that are actually hot. |
Thanks for the context!
We do use the
Yeah, as I commented above, unfortunately currently the IRPGO main flag is incompatible with the Sampling PGO flag(also a bit diverged for the pipeline passes), that's why we have to add extra instr passes under sampling PGO pipeline and added a new driver flag. |
Oh, I missed that and your earlier comment. Makes more sense, thanks! |
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -S | FileCheck --check-prefixes=COLD %s | ||
; RUN: opt < %s --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S | FileCheck --check-prefixes=ENTRY-COUNT %s | ||
|
||
; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64 [[#]], i32 1, i32 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to support -instrument-cold-function-coverage
without -pgo-function-entry-coverage
? If not, I think we should add the flag here so we get the llvm.instrprof.cover
intrinsic to more closely align with expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!(it should still work without -pgo-function-entry-coverage
, but agree to test the most common case)
if (InstrumentSampleColdFuncPath.getNumOccurrences() && | ||
Phase != ThinOrFullLTOPhase::ThinLTOPostLink) { | ||
assert(!InstrumentSampleColdFuncPath.empty() && | ||
"File path is requeired for instrumentation generation"); | ||
InstrumentColdFunctionCoverage = true; | ||
addPreInlinerPasses(MPM, Level, Phase); | ||
addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true, | ||
/* IsCS */ false, /* AtomicCounterUpdate */ false, | ||
InstrumentSampleColdFuncPath, "", | ||
IntrusiveRefCntPtr<vfs::FileSystem>()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the LoadSampleProfile
and move it after IsPGOInstrUse/IsCtxProfUse
? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the
LoadSampleProfile
and move it afterIsPGOInstrUse/IsCtxProfUse
? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.
Ah, good point, moved them closer to other IRPGO passes. Thanks for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Now I think this should be independent of SamplePGO. Can we remove "sample" from the flag and variable names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean the sample
in InstrumentSampleColdFuncPath
flag or all other variables(like IsSampleColdFuncCovGen
)? I thought something like:
if (IsSampleColdFuncCovGen || IsXXXColdFuncCovGen) {
addPGOInstrPasses(.., .., InstrumentColdFuncCoveragePath, .. )
}
InstrumentColdFuncCoveragePath
would be a general flag that can be used for all cold function coverage case. but currently IsSampleColdFuncCovGen
only represents for sampling PGO cold func. And If we want to extend it in future, then we can add more bool flag IsXXXColdFuncCovGen...
I also added an assertion:
assert((InstrumentColdFuncCoveragePath.empty() || LoadSampleProfile) &&
"Sampling-based cold functon coverage is not supported without "
"providing sampling PGO profile");
Seems I have to remove that if we want it to be general. (just curious) do you plan to extend it for your case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how there would be any functional difference between using IsSampleColdFuncCovGen
and IsXXXColdFuncCovGen
, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how there would be any functional difference between using
IsSampleColdFuncCovGen
andIsXXXColdFuncCovGen
, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.
I see, sounds good.
My previous thought is: there could be functional difference for using the -instrument-cold-function-coverage
flag with sampling PGO flag(-fprofile-sample-use
) vs without sampling PGO flags. With the sampling PGO flag, if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count), we then would instrument those function. But without sampling PGO flag(also no other PGO options), the entry count is unknown(-1), then we don't instrument them.
So user needs to be aware of the combination use of those flags, I was then trying to prevent the misuse, to disallow the use of --instrument-cold-function-coverage
without (sampling) PGO use options(use the assertion in the version).
But on a second thought, if we want to extend for other PGO options, say if the profile annotation and instrumentation are not in the same pipeline, it's probably hard to check this locally, we can leave it to the build system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count)
I wasn't aware that SamplePGO did this. This is indeed different than IRPGO. Perhaps it makes more sense to use a separate flag to control this case. Maybe something like -instrument-cold-function-coverage-mode={optimistic,conservative}
where optimistic
means we optimistically treat unprofiled functions as cold and conservative
means we conservatively assume unprofiled functions is hot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that's more flexible, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The backend looks good to me! (after resolving my last comments of course)
Thanks for adding this feature!
Twine("--instrument-cold-function-coverage-path=") + Path)); | ||
CmdArgs.push_back("-mllvm"); | ||
CmdArgs.push_back("--instrument-cold-function-coverage"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two flags seem duplicative. Path is guaranteed to be not empty here, so the boolean flag seems unnecessary?
Relatedly, I know this might not be easy. But while it's convenient to accept a new driver flag, I still kept wondering if we can communicate the single driver flag to LLVM using existing and orthogonal flags as much as possible. Like this:
-fprofile-instrument-path= // reuse existing flag to communicate file path
-pgo-function-entry-coverage // reuse existing flag to communicate coverage only
-pgo-instrument-cold-function-only // add new flag to communicate cold function only
I know that the current implementation has assumption when InstrProfileOutput
is not empty, but I still wonder is there a way recognize the use of these three flags together, and special case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this idea, however, this seems indeed not easy(needs a lot of refactoring).
As you said, the InstrProfileOutput
filed is shared with other PGO flag(in our case, it's the fprofile-sample-use=
) , see PGOOptions.ProfileFIle
. One thing we can do is to split this field into two: ProfileUseFile
and ProfileGenFole
, let all profile-<instr>-gererate
go to ProfileGenFile
and let sample-use/instr-use go
to ProfileUseFile
. I think that's doable and not a big change(maybe a separate diff). Then in sample-use case, we can use the ProfileGenFIle
path for cold coverage work. WDYT?
Though it would still not be perfect, the logic here https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L811-L841 is complicated/ would still need more refactoring. I was stuck here when thinking if -fprofile-sample-use
and --fprofile-instrument-path=
are both used, which branch(CodeGenOpts.hasProfileIRInstr()
or !CodeGenOpts.SampleProfileFile.empty()
) should it belongs to.
For this version, I just removed the duplicated flag(--pgo-instrument-cold-function-only
) and added a FIXME comment to explain this.
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling", | |||
PosFlag<SetTrue, [], [ClangOption, CC1Option], | |||
"Emit extra debug info to make sample profile more accurate">, | |||
NegFlag<SetFalse>>; | |||
def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on single driver flag for convenience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good enough to me with the FIXME :) Thanks!
llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h
Outdated
Show resolved
Hide resolved
Hi, this PR is causing a regression on the AIX bot here https://lab.llvm.org/buildbot/#/builders/64/builds/1321/steps/6/logs/FAIL__Clang__pgo-cold-function-coverage_c. Would you be able to take a look? I think it can be resolved by using clang_cc1 in the testcase of marking it unsupported for AIX (see 951919e) |
Sorry for the breakage, looks like we can bypass it by adding |
Thank you for the quick fix! |
@@ -1182,8 +1187,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level, | |||
const bool IsCtxProfUse = | |||
!UseCtxProfile.empty() && Phase == ThinOrFullLTOPhase::ThinLTOPreLink; | |||
|
|||
// Enable cold function coverage instrumentation if | |||
// InstrumentColdFuncOnlyPath is provided. | |||
const bool IsColdFuncOnlyInstrGen = PGOInstrumentColdFunctionOnly = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting this global variable here has introduced a data race detected by TSan when different threads set up different pass pipelines. Was this intentional? Did you mean to use ==
instead of =
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. It's intentional to set the global variable, I wasn't aware it will cause data race. I will try to change it to local one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes TSAN tests failures all over our infrastructure in anything that uses LLVM jit, i.e. Tensorflow/XLA
One of the opens-source examples is https://github.com/google-deepmind/mujoco_menagerie/blob/main/test/model_test.py
Traces look like:
WARNING: ThreadSanitizer: data race (pid=4749)
Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0):
#0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
#1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
#2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
#3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
#4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff)
...
WARNING: ThreadSanitizer: data race (pid=4749)
Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0):
#0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
#1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
#2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b)
#3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55)
#4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff)
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed by #114364
…lvm#109837)" This reverts commit d924a9b.
…14364) In #109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).
…vm#109837)" This reverts commit e517cfc.
…vm#114364) In llvm#109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).
This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes(a bundle of passes in `addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Config the flags in one place, i,e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) makes linker to link `compiler_rt.profile` lib(see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry count to determine cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```
…vm#109837)" This reverts commit e517cfc.
…vm#114364) In llvm#109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).
This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.
More details about the implementation and flags:
--pgo-instrument-cold-function-only
inPGOInstrumentation.cpp
as the main switch to control skipping the instrumentation.addPGOInstrPasses
) under sampling PGO pipeline. This is controlled by--instrument-cold-function-only-path
flag.-fprofile-generate-cold-function-coverage
:--instrument-cold-function-only-path=<...>
and--pgo-function-entry-coverage
. Note that the instrumentation file path is passed through--instrument-sample-cold-function-path
, because we cannot use thePGOOptions.ProfileFile
as it's already used by-fprofile-sample-use=<...>
.compiler_rt.profile
lib(see ToolChain.cpp#L1125-L1131 ).--pgo-cold-instrument-entry-threshold
) to config entry count to determine cold function.Overall, the full command is like: