[InstrPGO] Support cold function coverage instrumentation #109837

wlei-llvm · 2024-09-24T17:28:26Z

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

Added a flag --pgo-instrument-cold-function-only in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-cold-function-only-path flag.
Added a driver flag -fprofile-generate-cold-function-coverage:
- 1. Config the flags in one place, i,e. adding --instrument-cold-function-only-path=<...> and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=<...>.
- 1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
Added a flag(--pgo-cold-instrument-entry-threshold) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...>  code.cc -o code

github-actions · 2024-09-24T17:32:17Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2024-10-01T17:47:03Z

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-pgo

Author: Lei Wang (wlei-llvm)

Changes

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

Added a flag --instrument-cold-function-coverage in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-sample-cold-function-path flag.
Added a driver flag -fprofile-generate-cold-function-coverage:
- 1. Config the flags in one place, i,e. adding --instrument-sample-cold-function-path=<...> and --instrument-cold-function-coverage, and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=<...>.
- 1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
Added a flag(--cold-function-coverage-max-entry-count) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=&lt;...&gt; -fprofile-sample-use=&lt;...&gt;  code.cc -o code

Full diff: https://github.com/llvm/llvm-project/pull/109837.diff

8 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+6)
(modified) clang/lib/Driver/ToolChain.cpp (+3-1)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+18)
(added) clang/test/CodeGen/Inputs/pgo-cold-func.prof (+2)
(added) clang/test/CodeGen/pgo-cold-function-coverage.c (+12)
(modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+18)
(modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+16)
(added) llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll (+24)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Emit extra debug info to make sample profile more accurate">,
   NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+    HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+    HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">; 
 def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
     Group<f_Group>, Visibility<[ClangOption, CLOption]>,
     HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
          Args.hasArg(options::OPT_fprofile_instr_generate) ||
          Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
          Args.hasArg(options::OPT_fcreate_profile) ||
-         Args.hasArg(options::OPT_forder_file_instrumentation);
+         Args.hasArg(options::OPT_forder_file_instrumentation) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
 }
 
 bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
     }
   }
 
+  if (auto *ColdFuncCoverageArg = Args.getLastArg(
+          options::OPT_fprofile_generate_cold_function_coverage,
+          options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+    SmallString<128> Path(
+        ColdFuncCoverageArg->getOption().matches(
+            options::OPT_fprofile_generate_cold_function_coverage_EQ)
+            ? ColdFuncCoverageArg->getValue()
+            : "");
+    llvm::sys::path::append(Path, "default_%m.profraw");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--instrument-sample-cold-function-path=") + Path));
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--instrument-cold-function-coverage");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--pgo-function-entry-coverage");
+  }
+
   Arg *PGOGenArg = nullptr;
   if (PGOGenerateArg) {
     assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage 
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof  -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) { 
+    return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
     "enable-loop-versioning-licm", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental Loop Versioning LICM pass"));
 
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+    "instrument-sample-cold-function-path", cl::init(""),
+    cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+    cl::Hidden);
+
 extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
       // removed.
       MPM.addPass(
           PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+    if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+        Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+      assert(!InstrumentSampleColdFuncPath.empty() &&
+             "File path is requeired for instrumentation generation");
+      InstrumentColdFunctionCoverage = true;
+      addPreInlinerPasses(MPM, Level, Phase);
+      addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+                        /* IsCS */ false, /* AtomicCounterUpdate */ false,
+                        InstrumentSampleColdFuncPath, "",
+                        IntrusiveRefCntPtr<vfs::FileSystem>());
+    }
   }
 
   // Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
     cl::desc("Do not instrument functions with the number of critical edges "
              " greater than this threshold."));
 
+cl::opt<bool> InstrumentColdFunctionCoverage(
+    "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+    cl::desc("Enable cold function coverage instrumentation (currently only "
+             "used under sampling "
+             " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+    "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+    cl::desc("When enabling cold function coverage instrumentation, skip "
+             "instrumenting the "
+             "function whose entry count is above the given value"));
+
 extern cl::opt<unsigned> MaxNumVTableAnnotations;
 
 namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
     return true;
   if (F.getInstructionCount() < PGOFunctionSizeThreshold)
     return true;
+  if (InstrumentColdFunctionCoverage &&
+      (!F.getEntryCount() ||
+       F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+    return true;
   return false;
 }
 
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -S  | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S  | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+  ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+  ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}

llvmbot · 2024-10-01T17:47:04Z

@llvm/pr-subscribers-clang

Author: Lei Wang (wlei-llvm)

Changes

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead.

More details about the implementation and flags:

Added a flag --instrument-cold-function-coverage in PGOInstrumentation.cpp as the main switch to control skipping the instrumentation.
Built the extra instrumentation passes(a bundle of passes in addPGOInstrPasses) under sampling PGO pipeline. This is controlled by --instrument-sample-cold-function-path flag.
Added a driver flag -fprofile-generate-cold-function-coverage:
- 1. Config the flags in one place, i,e. adding --instrument-sample-cold-function-path=<...> and --instrument-cold-function-coverage, and --pgo-function-entry-coverage. Note that the instrumentation file path is passed through --instrument-sample-cold-function-path, because we cannot use the PGOOptions.ProfileFile as it's already used by -fprofile-sample-use=<...>.
- 1. makes linker to link compiler_rt.profile lib(see ToolChain.cpp#L1125-L1131 ).
Added a flag(--cold-function-coverage-max-entry-count) to config entry count to determine cold function.

Overall, the full command is like:

clang++ -O2 -fprofile-generate-cold-function-coverage=&lt;...&gt; -fprofile-sample-use=&lt;...&gt;  code.cc -o code

Full diff: https://github.com/llvm/llvm-project/pull/109837.diff

8 Files Affected:

(modified) clang/include/clang/Driver/Options.td (+6)
(modified) clang/lib/Driver/ToolChain.cpp (+3-1)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+18)
(added) clang/test/CodeGen/Inputs/pgo-cold-func.prof (+2)
(added) clang/test/CodeGen/pgo-cold-function-coverage.c (+12)
(modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+18)
(modified) llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp (+16)
(added) llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll (+24)

diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 002f60350543d9..6bb92427f2d53f 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
   PosFlag<SetTrue, [], [ClangOption, CC1Option],
           "Emit extra debug info to make sample profile more accurate">,
   NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>,
+    HelpText<"Generate instrumented code to cold functions into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
+def fprofile_generate_cold_function_coverage_EQ : Joined<["-"], "fprofile-generate-cold-function-coverage=">, 
+    Group<f_Group>, Visibility<[ClangOption, CLOption]>, MetaVarName<"<directory>">,
+    HelpText<"Generate instrumented code to cold functions into <directory>/default.profraw (overridden by LLVM_PROFILE_FILE env var)">; 
 def fprofile_instr_generate : Flag<["-"], "fprofile-instr-generate">,
     Group<f_Group>, Visibility<[ClangOption, CLOption]>,
     HelpText<"Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var)">;
diff --git a/clang/lib/Driver/ToolChain.cpp b/clang/lib/Driver/ToolChain.cpp
index 16f9b629fc538c..e56db150c6b47e 100644
--- a/clang/lib/Driver/ToolChain.cpp
+++ b/clang/lib/Driver/ToolChain.cpp
@@ -889,7 +889,9 @@ bool ToolChain::needsProfileRT(const ArgList &Args) {
          Args.hasArg(options::OPT_fprofile_instr_generate) ||
          Args.hasArg(options::OPT_fprofile_instr_generate_EQ) ||
          Args.hasArg(options::OPT_fcreate_profile) ||
-         Args.hasArg(options::OPT_forder_file_instrumentation);
+         Args.hasArg(options::OPT_forder_file_instrumentation) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage) ||
+         Args.hasArg(options::OPT_fprofile_generate_cold_function_coverage_EQ);
 }
 
 bool ToolChain::needsGCovInstrumentation(const llvm::opt::ArgList &Args) {
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 0bab48caf1a5e2..00602a08232ba2 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -649,6 +649,24 @@ static void addPGOAndCoverageFlags(const ToolChain &TC, Compilation &C,
     }
   }
 
+  if (auto *ColdFuncCoverageArg = Args.getLastArg(
+          options::OPT_fprofile_generate_cold_function_coverage,
+          options::OPT_fprofile_generate_cold_function_coverage_EQ)) {
+    SmallString<128> Path(
+        ColdFuncCoverageArg->getOption().matches(
+            options::OPT_fprofile_generate_cold_function_coverage_EQ)
+            ? ColdFuncCoverageArg->getValue()
+            : "");
+    llvm::sys::path::append(Path, "default_%m.profraw");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back(Args.MakeArgString(
+        Twine("--instrument-sample-cold-function-path=") + Path));
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--instrument-cold-function-coverage");
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--pgo-function-entry-coverage");
+  }
+
   Arg *PGOGenArg = nullptr;
   if (PGOGenerateArg) {
     assert(!CSPGOGenerateArg);
diff --git a/clang/test/CodeGen/Inputs/pgo-cold-func.prof b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
new file mode 100644
index 00000000000000..e50be02e0a8545
--- /dev/null
+++ b/clang/test/CodeGen/Inputs/pgo-cold-func.prof
@@ -0,0 +1,2 @@
+foo:1:1
+ 1: 1
diff --git a/clang/test/CodeGen/pgo-cold-function-coverage.c b/clang/test/CodeGen/pgo-cold-function-coverage.c
new file mode 100644
index 00000000000000..0d2767c022b9f1
--- /dev/null
+++ b/clang/test/CodeGen/pgo-cold-function-coverage.c
@@ -0,0 +1,12 @@
+// Test -fprofile-generate-cold-function-coverage 
+// RUN: %clang -O2 -fprofile-generate-cold-function-coverage=/xxx/yyy/ -fprofile-sample-accurate -fprofile-sample-use=%S/Inputs/pgo-cold-func.prof  -S -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: @__llvm_profile_filename = {{.*}} c"/xxx/yyy/default_%m.profraw\00"
+
+// CHECK: @__profc_bar
+// CHECK-NOT: @__profc_foo
+
+int bar(int x) { return x;}
+int foo(int x) { 
+    return x;
+}
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 8f151a99b11709..f75abe0c7c7649 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -296,7 +296,13 @@ static cl::opt<bool> UseLoopVersioningLICM(
     "enable-loop-versioning-licm", cl::init(false), cl::Hidden,
     cl::desc("Enable the experimental Loop Versioning LICM pass"));
 
+static cl::opt<std::string> InstrumentSampleColdFuncPath(
+    "instrument-sample-cold-function-path", cl::init(""),
+    cl::desc("File path for instrumenting sampling PGO guided cold functions"),
+    cl::Hidden);
+
 extern cl::opt<std::string> UseCtxProfile;
+extern cl::opt<bool> InstrumentColdFunctionCoverage;
 
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
@@ -1119,6 +1125,18 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
       // removed.
       MPM.addPass(
           PGOIndirectCallPromotion(true /* IsInLTO */, true /* SamplePGO */));
+
+    if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+        Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+      assert(!InstrumentSampleColdFuncPath.empty() &&
+             "File path is requeired for instrumentation generation");
+      InstrumentColdFunctionCoverage = true;
+      addPreInlinerPasses(MPM, Level, Phase);
+      addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+                        /* IsCS */ false, /* AtomicCounterUpdate */ false,
+                        InstrumentSampleColdFuncPath, "",
+                        IntrusiveRefCntPtr<vfs::FileSystem>());
+    }
   }
 
   // Try to perform OpenMP specific optimizations on the module. This is a
diff --git a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
index 10442fa0bb9003..890ad853d86461 100644
--- a/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
+++ b/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp
@@ -319,6 +319,18 @@ static cl::opt<unsigned> PGOFunctionCriticalEdgeThreshold(
     cl::desc("Do not instrument functions with the number of critical edges "
              " greater than this threshold."));
 
+cl::opt<bool> InstrumentColdFunctionCoverage(
+    "instrument-cold-function-coverage", cl::init(false), cl::Hidden,
+    cl::desc("Enable cold function coverage instrumentation (currently only "
+             "used under sampling "
+             " PGO pipeline))"));
+
+static cl::opt<uint64_t> ColdFuncCoverageMaxEntryCount(
+    "cold-function-coverage-max-entry-count", cl::init(0), cl::Hidden,
+    cl::desc("When enabling cold function coverage instrumentation, skip "
+             "instrumenting the "
+             "function whose entry count is above the given value"));
+
 extern cl::opt<unsigned> MaxNumVTableAnnotations;
 
 namespace llvm {
@@ -1891,6 +1903,10 @@ static bool skipPGOGen(const Function &F) {
     return true;
   if (F.getInstructionCount() < PGOFunctionSizeThreshold)
     return true;
+  if (InstrumentColdFunctionCoverage &&
+      (!F.getEntryCount() ||
+       F.getEntryCount()->getCount() > ColdFuncCoverageMaxEntryCount))
+    return true;
   return false;
 }
 
diff --git a/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
new file mode 100644
index 00000000000000..ba1a11f7184356
--- /dev/null
+++ b/llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll
@@ -0,0 +1,24 @@
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -S  | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S  | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; COLD-NOT: __profn_main
+
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)
+; ENTRY-COUNT: call void @llvm.instrprof.increment(ptr @__profn_main, i64 [[#]], i32 1, i32 0)
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @foo() !prof !0 {
+entry:
+  ret void
+}
+
+define i32 @main() !prof !1 {
+entry:
+  ret i32 0
+}
+
+!0 = !{!"function_entry_count", i64 0}
+!1 = !{!"function_entry_count", i64 1}

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

mtrofin · 2024-10-01T18:09:35Z

clang/include/clang/Driver/Options.td

@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
  PosFlag<SetTrue, [], [ClangOption, CC1Option],
          "Emit extra debug info to make sample profile more accurate">,
  NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 


Is it necessary to expose this through the clang frontend?

Sorry if my PR description is not clear. Note that there is no use for -fprofile-generate and -fprofile-instr-generate here, so a driver flag here is to configure the instr file path and make linker to link the compiler_rt.profile object files (just similar to -fprofile-[instr]-generate=).

The reason for not using -fprofile-[instr]-generate is because those flags conflict with -fprofile-sample-use, see PGOOptions, ProfileFile is a shared file path which means the two flags are actually mutually exclusive.

Another option is to make -fprofile-generate compatible with -fprofile-samle-use(like refactoring PGOOptions or adding another flag to configure the file path things), this seems to me they are more complicated than the current one. But I’m open to any suggestions on this.

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

I meant, why not just use clang ... -mllvm -instrument-cold-function-coverage? Is this a clang - only feature (i.e. rust can't use it?) Is it just for symmetry with the current PGO flags?

(This is not a pushback, mainly curious. Also the patch would be quite smaller if you didn't need to pass through the flags from clang to llvm)

I see, thanks for the suggestion! We also need to link runtime lib(compiler_rt.profile.a)
but yeah, I agree it's possible to pass directly to llvm and linker without through clang. Then the full command line would be like

clang ... -mllvm -instrument-cold-function-coverage -mllvm -instrument-sample-cold-function-path=<file-path> -mllvm --pgo-function-entry-coverage ld.lld ... --u__llvm_runtime_variable .../lib/x86_64-unknown-linux-gnu/libclang_rt.profile.a

So yes, adding the clang driver flag is kind of to mirror current IRPGO flags, for easy maintenance purpose(given that -fprofile-generate doesn't work with -fprofile-sample-use) and also to centralize the configuration for the convenience. IMO, the compiler_rt is probably the main blocker here, I didn't find an easy way to bundle it with a llvm flag.

Appreciate any further suggestions!

Would -Wl,-lclang_rt.profile work?

also to centralize the configuration for the convenience

+1 I think a frontend flag is useful. It allows us to identify incompatibilities early and provide clear error messages instead of more obscure failures down the line.

Why would incompatibilities and ability to provide useful error messages be dependent on the flag being in the frontend? Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict, so a message would be diluted to "sampling won't work with this"?

(probably a subject for a different forum tho)

Would -Wl,-lclang_rt.profile work?

Got it, I think it should work, except it requires another linker flag: the --u__llvm_runtime_variable, we can configure it to linker too, but basically those instr PGO flags control addProfileRTLibs (which seems not equivalent to -Wl,-lclang_rt.profile), I just wanted to mirror those flags so that we don't need to maintain if anything changes to addProfileRTLibs in the future. (Though I feel this code should be very stable, should not be a big problem, so mainly for the convenience for compiler user to use one flag instead of using/maintaining multiple flags )
Overall, I agree that it's feasible to do it without clang flag. I'm not very familiar with the convention for adding driver flags, so if you think this doesn't justify it, I will drop it from this patch. Thanks for the discussion!

Is it that the backend can't actually reliably point the finger at the specific flag causing the conflict

Yes, if we know that this instrumentation mode should not be mixed with e.g. sanitizers or something else we can enforce these checks early. I don't see a particular downside to adding a frontend flag. The convenience of bundling the 2 lld flags and 3 mllvm flags into a single frontend flag seems like a good enough motivation to do so.

+1 on single driver flag for convenience.

snehasish · 2024-10-01T19:12:51Z

The major motivation is to detect dead functions for the services that are optimized with sampling PGO.

For your use case, can you use ProfileSymbolList to identify very cold functions (executed but unsampled) or are you indeed looking for functions that are never executed?

Will this be used to guide developers with diagnostics or more aggressive compiler driven optimizations?

ellishg · 2024-10-01T20:48:42Z

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment. It also supports IRPGO and CSIRPGO while it seems that this PR requires a sample PGO input.

wlei-llvm · 2024-10-01T22:19:18Z

For your use case, can you use ProfileSymbolList to identify very cold functions (executed but unsampled) or are you indeed looking for functions that are never executed?

We are indeed focusing on functions that are never executed. The ProfileSymbolList is already used to identify functions with a “0” entry count, which is the target of the instrumentation in this work. We then aim to further distinguish the dead functions from them.

Will this be used to guide developers with diagnostics or more aggressive compiler driven optimizations?

We haven’t considered to use them for any compiler optimization. The primary goal is to improve the developer experience, just to take the general benefits from removing dead code: improving readability, good codebase maintainability, reducing build time, etc. Another potential use might be to verify the sampling PGO profile coverage/quality, say to check if we missed any functions that are actually hot.

wlei-llvm · 2024-10-01T22:37:22Z

Thanks for the context!

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment.

We do use the -pgo-function-entry-coverage in this PR, see here. but furthermore, we skip instrumenting the functions that are covered by sampling PGO profile.

It also supports IRPGO and CSIRPGO while it seems that this PR requires a sample PGO input.

Yeah, as I commented above, unfortunately currently the IRPGO main flag is incompatible with the Sampling PGO flag(also a bit diverged for the pipeline passes), that's why we have to add extra instr passes under sampling PGO pipeline and added a new driver flag.

ellishg · 2024-10-01T23:02:38Z

Why not use the existing -pgo-function-entry-coverage (https://discourse.llvm.org/t/instrprofiling-lightweight-instrumentation/59113/14?u=ellishg) LLVM flag? It takes advantage of the llvm.instrprof.cover intrinsic which has less size and runtime overhead than llvm.instrprof.increment.

We do use the -pgo-function-entry-coverage in this PR, see here. but furthermore, we skip instrumenting the functions that are covered by sampling PGO profile.

Oh, I missed that and your earlier comment. Makes more sense, thanks!

clang/test/CodeGen/pgo-cold-function-coverage.c

clang/lib/Driver/ToolChains/Clang.cpp

ellishg · 2024-10-01T23:04:32Z

llvm/test/Transforms/PGOProfile/instr-gen-cold-function.ll

+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -S  | FileCheck --check-prefixes=COLD %s
+; RUN: opt < %s  --passes=pgo-instr-gen -instrument-cold-function-coverage -cold-function-coverage-max-entry-count=1 -S  | FileCheck --check-prefixes=ENTRY-COUNT %s
+
+; COLD: call void @llvm.instrprof.increment(ptr @__profn_foo, i64  [[#]], i32 1, i32 0)


Do you plan to support -instrument-cold-function-coverage without -pgo-function-entry-coverage? If not, I think we should add the flag here so we get the llvm.instrprof.cover intrinsic to more closely align with expected behavior.

Sounds good!(it should still work without -pgo-function-entry-coverage, but agree to test the most common case)

ellishg · 2024-10-01T23:12:34Z

llvm/lib/Passes/PassBuilderPipelines.cpp

+    if (InstrumentSampleColdFuncPath.getNumOccurrences() &&
+        Phase != ThinOrFullLTOPhase::ThinLTOPostLink) {
+      assert(!InstrumentSampleColdFuncPath.empty() &&
+             "File path is requeired for instrumentation generation");
+      InstrumentColdFunctionCoverage = true;
+      addPreInlinerPasses(MPM, Level, Phase);
+      addPGOInstrPasses(MPM, Level, /* RunProfileGen */ true,
+                        /* IsCS */ false, /* AtomicCounterUpdate */ false,
+                        InstrumentSampleColdFuncPath, "",
+                        IntrusiveRefCntPtr<vfs::FileSystem>());
+    }


After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the LoadSampleProfile and move it after IsPGOInstrUse/IsCtxProfUse? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.

After looking at this a bit closer, I'm not sure why it needs to be tied to closely to SamplePGO. Couldn't we move this out of the LoadSampleProfile and move it after IsPGOInstrUse/IsCtxProfUse? That way we can use IRPGO, CSIRPGO, and SamplePGO profile counts to block instrumentation hot functions.

Ah, good point, moved them closer to other IRPGO passes. Thanks for the suggestion!

Thanks! Now I think this should be independent of SamplePGO. Can we remove "sample" from the flag and variable names?

do you mean the sample in InstrumentSampleColdFuncPathflag or all other variables(like IsSampleColdFuncCovGen)? I thought something like:

if (IsSampleColdFuncCovGen || IsXXXColdFuncCovGen) { addPGOInstrPasses(.., .., InstrumentColdFuncCoveragePath, .. ) }

InstrumentColdFuncCoveragePath would be a general flag that can be used for all cold function coverage case. but currently IsSampleColdFuncCovGen only represents for sampling PGO cold func. And If we want to extend it in future, then we can add more bool flag IsXXXColdFuncCovGen...

I also added an assertion:

assert((InstrumentColdFuncCoveragePath.empty() || LoadSampleProfile) && "Sampling-based cold functon coverage is not supported without " "providing sampling PGO profile");

Seems I have to remove that if we want it to be general. (just curious) do you plan to extend it for your case?

I don't see how there would be any functional difference between using IsSampleColdFuncCovGen and IsXXXColdFuncCovGen, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.

I don't see how there would be any functional difference between using IsSampleColdFuncCovGen and IsXXXColdFuncCovGen, but I could be missing something. If we can have a single flag for all cases, I think that would be cleaner and I can also try to use it for my case.

I see, sounds good.

My previous thought is: there could be functional difference for using the -instrument-cold-function-coverage flag with sampling PGO flag(-fprofile-sample-use) vs without sampling PGO flags. With the sampling PGO flag, if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count), we then would instrument those function. But without sampling PGO flag(also no other PGO options), the entry count is unknown(-1), then we don't instrument them.

So user needs to be aware of the combination use of those flags, I was then trying to prevent the misuse, to disallow the use of --instrument-cold-function-coverage without (sampling) PGO use options(use the assertion in the version).

But on a second thought, if we want to extend for other PGO options, say if the profile annotation and instrumentation are not in the same pipeline, it's probably hard to check this locally, we can leave it to the build system.

if a function symbol shows in the binary but not in the profile, we can treat it as cold function(set 0 entry count)

I wasn't aware that SamplePGO did this. This is indeed different than IRPGO. Perhaps it makes more sense to use a separate flag to control this case. Maybe something like -instrument-cold-function-coverage-mode={optimistic,conservative} where optimistic means we optimistically treat unprofiled functions as cold and conservative means we conservatively assume unprofiled functions is hot.

Good point, that's more flexible, thanks!

…OInstrPasses

ellishg

The backend looks good to me! (after resolving my last comments of course)
Thanks for adding this feature!

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

clang/include/clang/Driver/Options.td

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp

WenleiHe · 2024-10-17T01:00:02Z

clang/lib/Driver/ToolChains/Clang.cpp

+        Twine("--instrument-cold-function-coverage-path=") + Path));
+    CmdArgs.push_back("-mllvm");
+    CmdArgs.push_back("--instrument-cold-function-coverage");


These two flags seem duplicative. Path is guaranteed to be not empty here, so the boolean flag seems unnecessary?

Relatedly, I know this might not be easy. But while it's convenient to accept a new driver flag, I still kept wondering if we can communicate the single driver flag to LLVM using existing and orthogonal flags as much as possible. Like this:

-fprofile-instrument-path= // reuse existing flag to communicate file path -pgo-function-entry-coverage // reuse existing flag to communicate coverage only -pgo-instrument-cold-function-only // add new flag to communicate cold function only

I know that the current implementation has assumption when InstrProfileOutput is not empty, but I still wonder is there a way recognize the use of these three flags together, and special case for that.

I really like this idea, however, this seems indeed not easy(needs a lot of refactoring).

As you said, the InstrProfileOutput filed is shared with other PGO flag(in our case, it's the fprofile-sample-use=) , see PGOOptions.ProfileFIle. One thing we can do is to split this field into two: ProfileUseFile and ProfileGenFole, let all profile-<instr>-gererate go to ProfileGenFile and let sample-use/instr-use go to ProfileUseFile. I think that's doable and not a big change(maybe a separate diff). Then in sample-use case, we can use the ProfileGenFIle path for cold coverage work. WDYT?

Though it would still not be perfect, the logic here https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L811-L841 is complicated/ would still need more refactoring. I was stuck here when thinking if -fprofile-sample-use and --fprofile-instrument-path= are both used, which branch(CodeGenOpts.hasProfileIRInstr() or !CodeGenOpts.SampleProfileFile.empty()) should it belongs to.

For this version, I just removed the duplicated flag(--pgo-instrument-cold-function-only) and added a FIXME comment to explain this.

WenleiHe · 2024-10-17T01:00:47Z

clang/include/clang/Driver/Options.td

@@ -1784,6 +1784,12 @@ defm debug_info_for_profiling : BoolFOption<"debug-info-for-profiling",
  PosFlag<SetTrue, [], [ClangOption, CC1Option],
          "Emit extra debug info to make sample profile more accurate">,
  NegFlag<SetFalse>>;
+def fprofile_generate_cold_function_coverage : Flag<["-"], "fprofile-generate-cold-function-coverage">, 


+1 on single driver flag for convenience.

WenleiHe

Looks good enough to me with the FIXME :) Thanks!

llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h

abhina-sree · 2024-10-29T19:40:00Z

Hi, this PR is causing a regression on the AIX bot here https://lab.llvm.org/buildbot/#/builders/64/builds/1321/steps/6/logs/FAIL__Clang__pgo-cold-function-coverage_c. Would you be able to take a look? I think it can be resolved by using clang_cc1 in the testcase of marking it unsupported for AIX (see 951919e)

wlei-llvm · 2024-10-29T20:27:02Z

Hi, this PR is causing a regression on the AIX bot here https://lab.llvm.org/buildbot/#/builders/64/builds/1321/steps/6/logs/FAIL__Clang__pgo-cold-function-coverage_c. Would you be able to take a look? I think it can be resolved by using clang_cc1 in the testcase of marking it unsupported for AIX (see 951919e)

Sorry for the breakage, looks like we can bypass it by adding --target= like f92551b

abhina-sree · 2024-10-30T12:32:50Z

Hi, this PR is causing a regression on the AIX bot here https://lab.llvm.org/buildbot/#/builders/64/builds/1321/steps/6/logs/FAIL__Clang__pgo-cold-function-coverage_c. Would you be able to take a look? I think it can be resolved by using clang_cc1 in the testcase of marking it unsupported for AIX (see 951919e)

Sorry for the breakage, looks like we can bypass it by adding --target= like f92551b

Thank you for the quick fix!

rnk · 2024-10-30T19:56:58Z

llvm/lib/Passes/PassBuilderPipelines.cpp

@@ -1182,8 +1187,13 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
  const bool IsCtxProfUse =
      !UseCtxProfile.empty() && Phase == ThinOrFullLTOPhase::ThinLTOPreLink;

+  // Enable cold function coverage instrumentation if
+  // InstrumentColdFuncOnlyPath is provided.
+  const bool IsColdFuncOnlyInstrGen = PGOInstrumentColdFunctionOnly =


Setting this global variable here has introduced a data race detected by TSan when different threads set up different pass pipelines. Was this intentional? Did you mean to use == instead of =?

Thanks for pointing out. It's intentional to set the global variable, I wasn't aware it will cause data race. I will try to change it to local one.

This causes TSAN tests failures all over our infrastructure in anything that uses LLVM jit, i.e. Tensorflow/XLA

One of the opens-source examples is https://github.com/google-deepmind/mujoco_menagerie/blob/main/test/model_test.py

Traces look like:

WARNING: ThreadSanitizer: data race (pid=4749) Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0): #0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55) #1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) #2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) #3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55) #4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff) ... WARNING: ThreadSanitizer: data race (pid=4749) Write of size 1 at 0x7f83f3bd09f8 by thread T96 (mutexes: write M0): #0 setValue<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1401](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1401&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) (BuildId: a0d2dfe7395d03fede80b0393e3acc55) #1 operator=<bool> [third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h:1492](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/include/llvm/Support/CommandLine.h?l=1492&ws=dmitryc/38224&snapshot=3):11 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) #2 llvm::PassBuilder::buildModuleSimplificationPipeline(llvm::OptimizationLevel, llvm::ThinOrFullLTOPhase) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1192](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1192&ws=dmitryc/38224&snapshot=3):69 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2eb43b) #3 llvm::PassBuilder::buildPerModuleDefaultPipeline(llvm::OptimizationLevel, bool) [third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp:1620](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/Passes/PassBuilderPipelines.cpp?l=1620&ws=dmitryc/38224&snapshot=3):15 (libthird_Uparty_Sllvm_Sllvm-project_Sllvm_SlibPasses.so+0x2f1ef4) (BuildId: a0d2dfe7395d03fede80b0393e3acc55) #4 xla::cpu::CompilerFunctor::operator()(llvm::Module&) [third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc:181](https://cs.corp.google.com/piper///depot/google3/third_party/tensorflow/compiler/xla/service/cpu/compiler_functor.cc?l=181&ws=dmitryc/38224&snapshot=3):19 (libthird_Uparty_Stensorflow_Scompiler_Sxla_Sservice_Scpu_Slibcompiler_Ufunctor.so+0xafde) (BuildId: 036848c7284f8ba35b41d1e7f52e58ff) ....

I'm sorry but had to revert it in 06e28ed and d924a9b

Should be fixed by #114364

…09837)" This reverts commit e517cfc.

…lvm#109837)" This reverts commit d924a9b.

…14364) In #109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).

…vm#109837)" This reverts commit e517cfc.

…vm#114364) In llvm#109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).

This patch adds support for cold function coverage instrumentation based on sampling PGO counts. The major motivation is to detect dead functions for the services that are optimized with sampling PGO. If a function is covered by sampling profile count (e.g., those with an entry count > 0), we choose to skip instrumenting those functions, which significantly reduces the instrumentation overhead. More details about the implementation and flags: - Added a flag `--pgo-instrument-cold-function-only` in `PGOInstrumentation.cpp` as the main switch to control skipping the instrumentation. - Built the extra instrumentation passes(a bundle of passes in `addPGOInstrPasses`) under sampling PGO pipeline. This is controlled by `--instrument-cold-function-only-path` flag. - Added a driver flag `-fprofile-generate-cold-function-coverage`: - 1) Config the flags in one place, i,e. adding `--instrument-cold-function-only-path=<...>` and `--pgo-function-entry-coverage`. Note that the instrumentation file path is passed through `--instrument-sample-cold-function-path`, because we cannot use the `PGOOptions.ProfileFile` as it's already used by `-fprofile-sample-use=<...>`. - 2) makes linker to link `compiler_rt.profile` lib(see [ToolChain.cpp#L1125-L1131](https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChain.cpp#L1125-L1131) ). - Added a flag(`--pgo-cold-instrument-entry-threshold`) to config entry count to determine cold function. Overall, the full command is like: ``` clang++ -O2 -fprofile-generate-cold-function-coverage=<...> -fprofile-sample-use=<...> code.cc -o code ```

…vm#109837)" This reverts commit e517cfc.

…vm#114364) In llvm#109837, it sets a global variable(`PGOInstrumentColdFunctionOnly`) in PassBuilderPipelines.cpp which introduced a data race detected by TSan. To fix this, I decouple the flag setting, the flags are now set separately(`instrument-cold-function-only-path` is required to be used with `--pgo-instrument-cold-function-only`).

wlei-llvm force-pushed the sample-cold branch 3 times, most recently from 36a747b to 436ab3b Compare September 27, 2024 20:29

wlei-llvm mentioned this pull request Sep 27, 2024

[InstrPGO] Instrument sampling profile based cold function v2 #110330

Closed

[InstrPGO] Instrument sampling profile based cold function

07a2cab

wlei-llvm force-pushed the sample-cold branch from 436ab3b to 07a2cab Compare September 30, 2024 17:11

wlei-llvm requested review from mtrofin, ellishg, WenleiHe, david-xl, snehasish, xur-llvm and MatzeB October 1, 2024 17:40

wlei-llvm marked this pull request as ready for review October 1, 2024 17:46

llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' PGO Profile Guided Optimizations llvm:transforms labels Oct 1, 2024

mtrofin reviewed Oct 1, 2024

View reviewed changes

fix lint in cl desc msg

b3054f0

ellishg reviewed Oct 1, 2024

View reviewed changes

address comments, ad fix test, add driver test, refactoring the addPG…

5a2848c

…OInstrPasses

wlei-llvm added 2 commits October 8, 2024 10:07

make it general, remove sample from vaiable name

f07457f

add a flag to control unprofiled function

66f837a

ellishg approved these changes Oct 9, 2024

View reviewed changes

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp Outdated Show resolved Hide resolved

llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp Outdated Show resolved Hide resolved

address comments

b1f0105

WenleiHe reviewed Oct 17, 2024

View reviewed changes

addressing comments

3c4e85b

WenleiHe approved these changes Oct 24, 2024

View reviewed changes

llvm/include/llvm/Transforms/Instrumentation/PGOInstrumentation.h Outdated Show resolved Hide resolved

wlei-llvm added 2 commits October 27, 2024 10:53

remove InstrColdFuncCovMode

d54fbe0

Merge branch 'main' into sample-cold

5e0d106

wlei-llvm changed the title ~~[InstrPGO] Instrument sampling profile based cold function~~ [InstrPGO] Support cold function coverage instrumentation Oct 28, 2024

wlei-llvm merged commit e517cfc into llvm:main Oct 28, 2024
8 checks passed

wlei-llvm deleted the sample-cold branch October 29, 2024 20:32

rnk reviewed Oct 30, 2024

View reviewed changes

wlei-llvm mentioned this pull request Oct 31, 2024

[InstrPGO] Avoid using global variable to fix potential data race #114364

Merged

steelannelida pushed a commit that referenced this pull request Oct 31, 2024

Revert "[InstrPGO] Support cold function coverage instrumentation (#1…

d924a9b

…09837)" This reverts commit e517cfc.

wlei-llvm added a commit to wlei-llvm/llvm-project that referenced this pull request Oct 31, 2024

Reapply "[InstrPGO] Support cold function coverage instrumentation (l…

a7b444b

…lvm#109837)" This reverts commit d924a9b.

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024

Revert "[InstrPGO] Support cold function coverage instrumentation (ll…

de7a5e9

…vm#109837)" This reverts commit e517cfc.

NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024

Revert "[InstrPGO] Support cold function coverage instrumentation (ll…

ebcd4e4

…vm#109837)" This reverts commit e517cfc.

ellishg mentioned this pull request Jan 8, 2025

Deprecate order file instrumentation #121514

Merged

[InstrPGO] Support cold function coverage instrumentation #109837

[InstrPGO] Support cold function coverage instrumentation #109837

Uh oh!

Conversation

wlei-llvm commented Sep 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Sep 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 1, 2024

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wlei-llvm Oct 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mtrofin Oct 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

snehasish commented Oct 1, 2024

Uh oh!

ellishg commented Oct 1, 2024

Uh oh!

wlei-llvm commented Oct 1, 2024

Uh oh!

wlei-llvm commented Oct 1, 2024

Uh oh!

ellishg commented Oct 1, 2024

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ellishg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

wlei-llvm commented Sep 24, 2024 •

edited

Loading

github-actions bot commented Sep 24, 2024 •

edited

Loading

llvmbot commented Oct 1, 2024 •

edited

Loading

wlei-llvm Oct 1, 2024 •

edited

Loading

mtrofin Oct 2, 2024 •

edited

Loading