Skip to content

[ThinLTO][TypeProf] Import local-linkage global var for mod1:func_foo-> mod2:local-var edge #100448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 25, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Jul 24, 2024

VTable value profiling can create reference edges from mod1:func_foo to mod2:local-vtable. Indirect call profiling can create reference edges from mod1:func_foo to mod2:local_func_bar.

Given a ref chain mod1:func_foo -> mod2:local-var,local-var doesn't get imported by default.

Compiler checks / requires the module of 'local-var' is the same as the function that referenced it(mod1:func_foo). This is to prevent mis-compilation when both mod1 and mod2 has local-var of the same name, and cpp files are compiled without full path.

This patch allows the import when one of the following conditions happen:

  1. Introduce an option import-assume-local-unique. When the compiler user can guarantee that all files are compiled with full paths, they can set this option.
  2. When there is one instance of value summary.

Test:

  • A/B testing this option alone gives -0.16% statistically consistent cpu cycle reduction on one search workload (no throughput increase)
  • Testing it together with existing more-efficient ICP bumps the throughput increase by a margin (0.05%~0.1%)
  • No regressions observed.

@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review July 24, 2024 18:48
@llvmbot llvmbot added compiler-rt PGO Profile Guided Optimizations LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Jul 24, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-lto

Author: Mingming Liu (minglotus-6)

Changes

Given a ref chain mod1:func_foo -> mod2:func_bar -> mod2:local-var, when func_bar gets imported into mod1, local-var doesn't get imported by default. Compiler checks / requires the module of 'local-var' is the same as the function that starts the import (mod1:func_foo). This is to prevent mis-compilation when both mod1 and mod2 has local-var of the same name, and cpp files are compiled without full path (//path//to/file.cpp).

This patch introduces an option for compiler user. If compiler user can guarantee that all files are compiled with full paths, they can set this option on to get mod2:local-var imported into mod1 for the case above.

Test:

  • A/B testing this option alone gives -0.16% statistically consistent cpu cycle reduction on one search workload (no throughput increase)
  • Testing it together with existing more-efficient ICP bumps the throughput increase by a margin (0.05%~0.1%)

Full diff: https://github.com/llvm/llvm-project/pull/100448.diff

2 Files Affected:

  • (modified) compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp (+22-8)
  • (modified) llvm/lib/Transforms/IPO/FunctionImport.cpp (+14)
diff --git a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
index 411640fc5176d..4ef7b2da2f572 100644
--- a/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
+++ b/compiler-rt/test/profile/Linux/instrprof-vtable-value-prof.cpp
@@ -1,3 +1,4 @@
+
 // REQUIRES: lld, lld-available
 
 // Building the instrumented binary will fail because lld doesn't support
@@ -5,7 +6,7 @@
 // ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
 // UNSUPPORTED: ppc && host-byteorder-big-endian
 
-// RUN: rm -rf %t && mkdir %t && cd %t
+// RUN: rm -rf %t && split-file %s %t && cd %t
 
 // RUN: %clangxx_pgogen -fuse-ld=lld -O2 -fprofile-generate=. -mllvm -enable-vtable-value-profiling %s -o test
 // RUN: env LLVM_PROFILE_FILE=test.profraw ./test
@@ -141,18 +142,19 @@
 // RUN:    -g -flto=thin -fwhole-program-vtables -fno-split-lto-unit -O2 \
 // RUN:    -mllvm -enable-vtable-value-profiling -Wl,-mllvm,-enable-vtable-value-profiling \
 // RUN:    -mllvm -enable-vtable-profile-use \
+// RUN:    -Wl,-plugin-opt,-import-assume-unique-local \
 // RUN:    -Wl,-mllvm,-enable-vtable-profile-use -Rpass=pgo-icall-prom \
 // RUN:    -Wl,-mllvm,-print-after=pgo-icall-prom \
 // RUN:    -Wl,-mllvm,-filter-print-funcs=main %s 2>&1 \
 // RUN:    | FileCheck %s --check-prefixes=REMARK,IR --implicit-check-not="!VP"
 
 // For the indirect call site `ptr->func`
-// REMARK: instrprof-vtable-value-prof.cpp:226:19: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, sink 1 instruction(s) and compare 1 vtable(s): {_ZTVN12_GLOBAL__N_18Derived2E}
-// REMARK: instrprof-vtable-value-prof.cpp:226:19: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, sink 1 instruction(s) and compare 1 vtable(s): {_ZTV8Derived1}
+// REMARK: instrprof-vtable-value-prof.cpp:240:19: Promote indirect call to _ZN12_GLOBAL__N_18Derived24funcEii with count 150 out of 200, sink 1 instruction(s) and compare 1 vtable(s): {_ZTVN12_GLOBAL__N_18Derived2E}
+// REMARK: instrprof-vtable-value-prof.cpp:240:19: Promote indirect call to _ZN8Derived14funcEii with count 50 out of 50, sink 1 instruction(s) and compare 1 vtable(s): {_ZTV8Derived1}
 //
 // For the indirect call site `delete ptr`
-// REMARK: instrprof-vtable-value-prof.cpp:228:5: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, sink 2 instruction(s) and compare 1 vtable(s): {_ZTVN12_GLOBAL__N_18Derived2E}
-// REMARK: instrprof-vtable-value-prof.cpp:228:5: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, sink 2 instruction(s) and compare 1 vtable(s): {_ZTV8Derived1}
+// REMARK: instrprof-vtable-value-prof.cpp:242:5: Promote indirect call to _ZN12_GLOBAL__N_18Derived2D0Ev with count 750 out of 1000, sink 2 instruction(s) and compare 1 vtable(s): {_ZTVN12_GLOBAL__N_18Derived2E}
+// REMARK: instrprof-vtable-value-prof.cpp:242:5: Promote indirect call to _ZN8Derived1D0Ev with count 250 out of 250, sink 2 instruction(s) and compare 1 vtable(s): {_ZTV8Derived1}
 
 // The IR matchers for indirect callsite `ptr->func`.
 // IR-LABEL: @main
@@ -185,6 +187,7 @@
 // IR: [[MERGE0]]:
 // IR:    [[RES2:%.*]] = phi i32 [ [[RES1]], %[[MERGE1]] ], [ [[RESBB1]], %[[BB1]] ]
 
+//--- lib.h
 #include <stdio.h>
 #include <stdlib.h>
 class Base {
@@ -193,21 +196,31 @@ class Base {
 
   virtual ~Base() {};
 };
+
 class Derived1 : public Base {
 public:
-  int func(int a, int b) override { return a * b; }
+  int func(int a, int b) override;
 
   ~Derived1() {}
 };
 namespace {
 class Derived2 : public Base {
 public:
-  int func(int a, int b) override { return a * (a - b); }
+  int func(int a, int b) override;
 
   ~Derived2() {}
 };
 } // namespace
-__attribute__((noinline)) Base *createType(int a) {
+
+__attribute__((noinline)) Base *createType(int a);
+
+//-- lib.cpp
+
+int Derived1::func(int a, int b) { return a * b; }
+
+int Derived2::func(int a, int b) { return a * (a - b); }
+
+Base *createType(int a) {
   Base *base = nullptr;
   if (a % 4 == 0)
     base = new Derived1();
@@ -216,6 +229,7 @@ __attribute__((noinline)) Base *createType(int a) {
   return base;
 }
 
+//--- main.cpp
 int main(int argc, char **argv) {
   int sum = 0;
   for (int i = 0; i < 1000; i++) {
diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index 038785114a0cf..839c09fa9d586 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -174,6 +174,18 @@ static cl::opt<std::string> WorkloadDefinitions(
              "}"),
     cl::Hidden);
 
+static cl::opt<bool> ImportAssumeUniqueLocal(
+    "import-assume-unique-local", cl::init(false),
+    cl::desc(
+        "By default, a local-linkage global variable won't be imported in the "
+        "edge mod1:func -> mod2:func_bar -> mod2:local-var since compiler "
+        "cannot assume mod2 is compiled with full path or local-var has a "
+        "unique GUID. "
+        "Set this option to true will help cross-module import of such "
+        "variables. But it is only safe if the compiler user specify the full "
+        "module path."),
+    cl::Hidden);
+
 namespace llvm {
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
 }
@@ -367,6 +379,8 @@ class GlobalsImporter final {
       // was not distinguishing path.
       auto LocalNotInModule =
           [&](const GlobalValueSummary *RefSummary) -> bool {
+        if (ImportAssumeUniqueLocal)
+          return false;
         return GlobalValue::isLocalLinkage(RefSummary->linkage()) &&
                RefSummary->modulePath() != Summary.modulePath();
       };

static cl::opt<bool> ImportAssumeUniqueLocal(
"import-assume-unique-local", cl::init(false),
cl::desc(
"By default, a local-linkage global variable won't be imported in the "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can the local var be promoted to global to enable importing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown in the updated test, with this option on, when the local-var gets imported, it will have a global name <vtable>.llvm.<mod-hash>.

@teresajohnson
Copy link
Contributor

Mentioned offline to Mingming, but this code is looking at the module of the immediate referencing function, so that would be mod2. Mingming will dig a bit to see why this isn't working.

@mingmingl-llvm
Copy link
Contributor Author

Mentioned offline to Mingming, but this code is looking at the module of the immediate referencing function, so that would be mod2. Mingming will dig a bit to see why this isn't working.

I think you are correct that local-var would get imported with the ref chain mod1:func_foo -> mod2:func_bar -> mod2:local-var.

Here is why import of local-var doesn't happen for the test case without this option

  1. local-vtable for Derived2 get annotated on main function in main.cpp with -enable-vtable-value-profiling on.
  2. createType is noinline so it doesn't get imported from lib.cpp to main.cpp
  3. main wants to import Derived2 and won't do it without this option.

The diff in [1] gives this debugging log

./bin/clang --driver-mode=g++ -m64 -fuse-ld=lld -ldl -m64 -fprofile-use=test.profdata -Wl,--lto-whole-program-visibility -mllvm -disable-icp=true -Wl,-mllvm,-disable-icp=false -fuse-ld=lld -g -flto=thin -fwhole-program-vtables -fno-split-lto-unit -O2 -mllvm -enable-vtable-value-profiling -Wl,-mllvm,-enable-vtable-value-profiling -mllvm -enable-vtable-profile-use -Wl,-mllvm,-enable-vtable-profile-use -Rpass=pgo-icall-prom -Wl,-mllvm,-print-after=pgo-icall-prom -Wl,-mllvm,-filter-print-funcs=main lib.cpp main.cpp

Ref VI is 10484448250040508243 (_ZTVN12_GLOBAL__N_18Derived2E)
/tmp/lib-ec0ef8.o
/tmp/main-c0fd89.o
	Local 10484448250040508243 (_ZTVN12_GLOBAL__N_18Derived2E) is in module /tmp/lib-ec0ef8.obut NOT in module /tmp/main-c0fd89.o

[1]

--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -385,8 +385,15 @@ class GlobalsImporter final {
         errs() << "Ref VI is " << VI << "\n";
         errs() << RefSummary->modulePath() << "\n";
         errs() << Summary.modulePath() << "\n";
-        return GlobalValue::isLocalLinkage(RefSummary->linkage()) &&
-               RefSummary->modulePath() != Summary.modulePath();
+
+        const bool LocalNotInMod =
+            GlobalValue::isLocalLinkage(RefSummary->linkage()) &&
+            RefSummary->modulePath() != Summary.modulePath();
+        if (LocalNotInMod)
+          errs() << "\tLocal " << VI << " is in module "
+                 << RefSummary->modulePath() << "but NOT in module "
+                 << Summary.modulePath() << "\n";
+        return LocalNotInMod;
       };
 
       for (const auto &RefSummary : VI.getSummaryList()) {

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO][TypeProf] Allow importing of local-linkage global variables in mod1:func_foo->mod2:func_bar -> mod2:local-var chain [ThinLTO][TypeProf] Allow importing of local-linkage global variables in mod1:func_foo-> mod2:local-var chain Jul 24, 2024
@mingmingl-llvm
Copy link
Contributor Author

Note the local-linkage indirect callee get imported without turning on the option in this patch. Teresa pointed me to

CalleeSummaryList.size() > 1 &&

I'll update the PR to make the indirect references (vtables or functions) handling consistent.

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO][TypeProf] Allow importing of local-linkage global variables in mod1:func_foo-> mod2:local-var chain [ThinLTO][TypeProf] Allow importing of local-linkage global variables in mod1:func_foo-> mod2:local-var chain under an option Jul 24, 2024
@mingmingl-llvm mingmingl-llvm requested a review from david-xl July 24, 2024 21:23
@mingmingl-llvm
Copy link
Contributor Author

Note the local-linkage indirect callee get imported without turning on the option in this patch. Teresa pointed me to

CalleeSummaryList.size() > 1 &&

I'll update the PR to make the indirect references (vtables or functions) handling consistent.

This is implemented. PTAL, thanks!

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a couple of small suggestions below. Before you submit please update the patch description to fix the example and description about it needing to be the "same as the function that starts the import" and to instead describe the value profile case.

@@ -196,6 +207,23 @@ static std::unique_ptr<Module> loadFile(const std::string &FileName,
return Result;
}

static bool shouldSkipLocalInAnotherModule(const GlobalVarSummary *RefSummary,
size_t NumRefs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the number of defs, not refs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

size_t NumRefs,
StringRef ImporterModule) {
// We can import a local from another module if all inputs are compiled
// with full paths or when there is one entry in the list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"when there is one definition"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO][TypeProf] Allow importing of local-linkage global variables in mod1:func_foo-> mod2:local-var chain under an option [ThinLTO][TypeProf] Import local-linkage global var for mod1:func_foo-> mod2:local-var edge under an option Jul 25, 2024
@mingmingl-llvm
Copy link
Contributor Author

lgtm with a couple of small suggestions below. Before you submit please update the patch description to fix the example and description about it needing to be the "same as the function that starts the import" and to instead describe the value profile case.

I updated the description to mention value profiles creates the reference edges, and that vars with one definition gets imported

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO][TypeProf] Import local-linkage global var for mod1:func_foo-> mod2:local-var edge under an option [ThinLTO][TypeProf] Import local-linkage global var for mod1:func_foo-> mod2:local-var edge Jul 25, 2024
@mingmingl-llvm mingmingl-llvm merged commit ac1a1e5 into llvm:main Jul 25, 2024
4 of 6 checks passed
@dyung
Copy link
Collaborator

dyung commented Jul 25, 2024

Hi @minglotus-6, your change seems to be causing around 11 test failures on at least the following bots:

Can you take a look and revert if you need time to investigate?

@mingmingl-llvm mingmingl-llvm deleted the vtableref branch July 25, 2024 04:12
@mingmingl-llvm
Copy link
Contributor Author

Hi @minglotus-6, your change seems to be causing around 11 test failures on at least the following bots:

Can you take a look and revert if you need time to investigate?

Hi! Sorry for the breakage. #100508 should fix the test failure (running ninja check-llvm now)

yuxuanchen1997 pushed a commit that referenced this pull request Jul 25, 2024
…-> mod2:local-var edge (#100448)

Summary:
VTable value profiling can create reference edges from `mod1:func_foo`
to `mod2:local-vtable`. Indirect call profiling can create reference
edges from `mod1:func_foo` to `mod2:local_func_bar`.

Given a ref chain `mod1:func_foo -> mod2:local-var`,`local-var` doesn't
get imported by default.

Compiler checks / requires the module of 'local-var' is the same as the
function that referenced it(`mod1:func_foo`). This is to prevent
mis-compilation when both `mod1` and `mod2` has `local-var` of the same
name, and cpp files are compiled without full path.

This patch allows the import when one of the following conditions
happen:
1) Introduce an option `import-assume-local-unique`. When the compiler
user can guarantee that all files are compiled with full paths, they can
set this option.
2) When there is one instance of value summary.

Test:
* A/B testing this option alone gives -0.16% statistically consistent
cpu cycle reduction on one search workload (no throughput increase)
* Testing it together with existing more-efficient ICP bumps the
throughput increase by a margin (0.05%~0.1%)
* No regressions observed.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250563
"edge mod1:func -> mod2:local-var (from value profiles) since compiler "
"cannot assume mod2 is compiled with full path which gives local-var a "
"program-wide unique GUID. Set this option to true will help cross "
"module import of such variables. This is only safe if the compiler "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to learn more about this space. Can you elaborate on "This is only safe if the compiler user specify the full module path." How does user specify it? I looked at the test and it's not clear to me from it.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does user specify it?

In a monolithic repository (https://research.google/pubs/why-google-stores-billions-of-lines-of-code-in-a-single-repository/) of source code, user specifies the full path in the compile options. In practice, bazel (https://bazel.build/) configures the build rules and guarantees full path is used. Users invoke bazel to compile an executable.

Using the folder structure below as an example [1], clang -O2 <other-options> /tmp/src1/a.cpp /tmp/src1/b.cpp /tmp/src2/a.cpp /tmp/src2/b.cpp /tmp/src2/c.cpp gives the full path, and cd /tmp/src1; clang -O2 a.cpp b.cpp <other arguments to specify paths in src2> doesn't.

I looked at the test and it's not clear to me from it.

The purpose of the test is to illustrate and provide coverage for profile-gen and its use, so I saved the extra work to use full path there.

[1]

/tmp/
|── src1/
│   ├── a.cpp
│   ├── b.cpp
├── src2/
│   ├── a.cpp
│   ├── b.cpp
│   └──c.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test program itself is correct as there are only two files and they don't have conflict names.

I'm open to update it to use the full path if that is preferred or better.

Copy link
Contributor Author

@mingmingl-llvm mingmingl-llvm Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I double checked whether compiler correctness can rely on the fact that full file paths are used. My takeaway is that bazel doesn't fail the executable compilation if local paths are used (if any in a unconventional way) so I don't plan to turn on import-assume-unique-local.

Yet still NumDefs == 1 condition (https://github.com/llvm/llvm-project/pull/100448/files#diff-e7cb370fee46f0f773f2b5429dfab36b75126d3909ae98ee87ff3d0e3f75c6e9R215) should hold true for many internal-linkage vtables as long as full paths are indeed used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thx for explanation.

mingmingl-llvm added a commit that referenced this pull request Aug 9, 2024
While manual compiles can specify full file paths and build automation
tools use full, unique paths in practice, it's not clear whether it's a
general good practice to enforce full paths (fail a build if relative
paths are used).

`NumDefs == 1` condition [1] should hold true for many internal-linkage
vtables as long as full paths are indeed used to salvage the marginal
performance when local-linkage vtables are imported due to indirect
reference.
#100448 (comment)
has more details.

[1]
https://github.com/llvm/llvm-project/pull/100448/files#diff-e7cb370fee46f0f773f2b5429dfab36b75126d3909ae98ee87ff3d0e3f75c6e9R215
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler-rt llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO) PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants