Skip to content

[ThinLTO]Record import type in GlobalValueSummary::GVFlags #87597

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 11, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Apr 4, 2024

The motivating use case is to support import the function declaration across modules to construct call graph edges for indirect calls (mentioned in this rfc) when importing the function definition costs too much compile time (e.g., the function is too large has no noinline attribute).

  1. Currently, when the compiled IR module doesn't have a function definition but its postlink combined summary contains the function summary or a global alias summary with this function as aliasee, the function definition will be imported from source module by IRMover. The implementation is in FunctionImporter::importFunctions.
  2. In order for FunctionImporter to import a declaration of a function, both function summary and alias summary need to carry the def / decl state. Specifically, all existing summary fields doesn't differ across import modules, but the def / decl state of is decided by <ImportModule, Function>.

This change encodes the def/decl state in GlobalValueSummary::GVFlags.

In the subsequent changes

  1. The indexing step (computeImportForModule) will compute the set of definitions and the set of declarations for each module, and passing on the information to bitcode writer.
  2. Bitcode writer will look up the def/decl state and sets the state when it writes out the flag value. This is demonstrated in [ThinLTO][Bitcode] Generate import type in bitcode #87600
  3. Function importer will read the def/decl state when reading the combined summary to figure out two sets of global values, and IRMover will be updated to import the declaration (aka linkGlobalValuePrototype) into the destination module.

…eSummary::GVFlags

- This change doesn't set the bit in the summaries, just make the code changes.
@mingmingl-llvm mingmingl-llvm marked this pull request as ready for review April 4, 2024 20:04
@llvmbot
Copy link
Member

llvmbot commented Apr 4, 2024

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-lto
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-ir

Author: Mingming Liu (minglotus-6)

Changes
  • This change doesn't set the bit in the summaries, just make the code changes. Follow-up changes will update the ThinLTO indexing step to compute the import type for functions (and function aliases) and set the bit in bitcode writer.

    • The use case is to support import the function prototype to construct call graph edges for indirect calls (elaborated in this rfc) when importing the function definition costs too much compile time (e.g., the function is too large, or it is noinline).
  • The next change is [ThinLTO][Bitcode] Generate import type in bitcode #87600


Patch is 26.35 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87597.diff

11 Files Affected:

  • (modified) llvm/include/llvm/AsmParser/LLToken.h (+1)
  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+11-3)
  • (modified) llvm/include/llvm/IR/ModuleSummaryIndexYAML.h (+4-3)
  • (modified) llvm/lib/Analysis/ModuleSummaryAnalysis.cpp (+8-4)
  • (modified) llvm/lib/AsmParser/LLLexer.cpp (+1)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+12-3)
  • (modified) llvm/lib/Bitcode/Reader/BitcodeReader.cpp (+2-1)
  • (modified) llvm/lib/Bitcode/Writer/BitcodeWriter.cpp (+4-1)
  • (modified) llvm/lib/IR/AsmWriter.cpp (+1)
  • (modified) llvm/lib/IR/ModuleSummaryIndex.cpp (+2)
  • (modified) llvm/test/Assembler/thinlto-summary.ll (+23-23)
diff --git a/llvm/include/llvm/AsmParser/LLToken.h b/llvm/include/llvm/AsmParser/LLToken.h
index 65ccb1b81b3a87..21229683f2496b 100644
--- a/llvm/include/llvm/AsmParser/LLToken.h
+++ b/llvm/include/llvm/AsmParser/LLToken.h
@@ -370,6 +370,7 @@ enum Kind {
   kw_live,
   kw_dsoLocal,
   kw_canAutoHide,
+  kw_importAsDec,
   kw_function,
   kw_insts,
   kw_funcFlags,
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index d5ff15063671d8..7eba77540fa7ea 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -472,14 +472,20 @@ class GlobalValueSummary {
     /// means the symbol was externally visible.
     unsigned CanAutoHide : 1;
 
+    /// This field is written by the ThinLTO indexing step to postlink
+    /// per-module summary. It indicates whether a global value (function or
+    /// function alias, etc) should be imported as a definition or declaration.
+    unsigned ImportAsDec : 1;
+
     /// Convenience Constructors
     explicit GVFlags(GlobalValue::LinkageTypes Linkage,
                      GlobalValue::VisibilityTypes Visibility,
                      bool NotEligibleToImport, bool Live, bool IsLocal,
-                     bool CanAutoHide)
+                     bool CanAutoHide, bool ImportAsDec)
         : Linkage(Linkage), Visibility(Visibility),
           NotEligibleToImport(NotEligibleToImport), Live(Live),
-          DSOLocal(IsLocal), CanAutoHide(CanAutoHide) {}
+          DSOLocal(IsLocal), CanAutoHide(CanAutoHide),
+          ImportAsDec(ImportAsDec) {}
   };
 
 private:
@@ -564,6 +570,8 @@ class GlobalValueSummary {
 
   bool canAutoHide() const { return Flags.CanAutoHide; }
 
+  bool shouldImportAsDec() const { return Flags.ImportAsDec; }
+
   GlobalValue::VisibilityTypes getVisibility() const {
     return (GlobalValue::VisibilityTypes)Flags.Visibility;
   }
@@ -813,7 +821,7 @@ class FunctionSummary : public GlobalValueSummary {
             GlobalValue::LinkageTypes::AvailableExternallyLinkage,
             GlobalValue::DefaultVisibility,
             /*NotEligibleToImport=*/true, /*Live=*/true, /*IsLocal=*/false,
-            /*CanAutoHide=*/false),
+            /*CanAutoHide=*/false, /*ImportAsDec=*/false),
         /*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0,
         std::vector<ValueInfo>(), std::move(Edges),
         std::vector<GlobalValue::GUID>(),
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h b/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
index 33e57e5f2102fd..21dddcf0c03dcd 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndexYAML.h
@@ -137,7 +137,7 @@ template <> struct MappingTraits<TypeIdSummary> {
 
 struct FunctionSummaryYaml {
   unsigned Linkage, Visibility;
-  bool NotEligibleToImport, Live, IsLocal, CanAutoHide;
+  bool NotEligibleToImport, Live, IsLocal, CanAutoHide, ImportAsDec;
   std::vector<uint64_t> Refs;
   std::vector<uint64_t> TypeTests;
   std::vector<FunctionSummary::VFuncId> TypeTestAssumeVCalls,
@@ -227,7 +227,7 @@ template <> struct CustomMappingTraits<GlobalValueSummaryMapTy> {
               static_cast<GlobalValue::LinkageTypes>(FSum.Linkage),
               static_cast<GlobalValue::VisibilityTypes>(FSum.Visibility),
               FSum.NotEligibleToImport, FSum.Live, FSum.IsLocal,
-              FSum.CanAutoHide),
+              FSum.CanAutoHide, FSum.ImportAsDec),
           /*NumInsts=*/0, FunctionSummary::FFlags{}, /*EntryCount=*/0, Refs,
           ArrayRef<FunctionSummary::EdgeTy>{}, std::move(FSum.TypeTests),
           std::move(FSum.TypeTestAssumeVCalls),
@@ -251,7 +251,8 @@ template <> struct CustomMappingTraits<GlobalValueSummaryMapTy> {
               static_cast<bool>(FSum->flags().NotEligibleToImport),
               static_cast<bool>(FSum->flags().Live),
               static_cast<bool>(FSum->flags().DSOLocal),
-              static_cast<bool>(FSum->flags().CanAutoHide), Refs,
+              static_cast<bool>(FSum->flags().CanAutoHide),
+              static_cast<bool>(FSum->flags().ImportAsDec), Refs,
               FSum->type_tests(), FSum->type_test_assume_vcalls(),
               FSum->type_checked_load_vcalls(),
               FSum->type_test_assume_const_vcalls(),
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
index 3ad0bab827a512..eb81de81e476dd 100644
--- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
+++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp
@@ -635,7 +635,8 @@ static void computeFunctionSummary(
                               HasIndirBranchToBlockAddress || HasIFuncCall;
   GlobalValueSummary::GVFlags Flags(
       F.getLinkage(), F.getVisibility(), NotEligibleForImport,
-      /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable());
+      /* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable(),
+      /*ImportAsDec=*/false);
   FunctionSummary::FFlags FunFlags{
       F.doesNotAccessMemory(), F.onlyReadsMemory() && !F.doesNotAccessMemory(),
       F.hasFnAttribute(Attribute::NoRecurse), F.returnDoesNotAlias(),
@@ -761,7 +762,8 @@ static void computeVariableSummary(ModuleSummaryIndex &Index,
   bool NonRenamableLocal = isNonRenamableLocal(V);
   GlobalValueSummary::GVFlags Flags(
       V.getLinkage(), V.getVisibility(), NonRenamableLocal,
-      /* Live = */ false, V.isDSOLocal(), V.canBeOmittedFromSymbolTable());
+      /* Live = */ false, V.isDSOLocal(), V.canBeOmittedFromSymbolTable(),
+      /* ImportAsDec= */ false);
 
   VTableFuncList VTableFuncs;
   // If splitting is not enabled, then we compute the summary information
@@ -807,7 +809,8 @@ static void computeAliasSummary(ModuleSummaryIndex &Index, const GlobalAlias &A,
   bool NonRenamableLocal = isNonRenamableLocal(A);
   GlobalValueSummary::GVFlags Flags(
       A.getLinkage(), A.getVisibility(), NonRenamableLocal,
-      /* Live = */ false, A.isDSOLocal(), A.canBeOmittedFromSymbolTable());
+      /* Live = */ false, A.isDSOLocal(), A.canBeOmittedFromSymbolTable(),
+      /* ImportAsDec= */ false);
   auto AS = std::make_unique<AliasSummary>(Flags);
   auto AliaseeVI = Index.getValueInfo(Aliasee->getGUID());
   assert(AliaseeVI && "Alias expects aliasee summary to be available");
@@ -887,7 +890,8 @@ ModuleSummaryIndex llvm::buildModuleSummaryIndex(
               GlobalValue::InternalLinkage, GlobalValue::DefaultVisibility,
               /* NotEligibleToImport = */ true,
               /* Live = */ true,
-              /* Local */ GV->isDSOLocal(), GV->canBeOmittedFromSymbolTable());
+              /* Local */ GV->isDSOLocal(), GV->canBeOmittedFromSymbolTable(),
+              /* ImportAsDec= */ false);
           CantBePromoted.insert(GV->getGUID());
           // Create the appropriate summary type.
           if (Function *F = dyn_cast<Function>(GV)) {
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 2301a27731eaff..28be5a7a9bc54f 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -737,6 +737,7 @@ lltok::Kind LLLexer::LexIdentifier() {
   KEYWORD(live);
   KEYWORD(dsoLocal);
   KEYWORD(canAutoHide);
+  KEYWORD(importAsDec);
   KEYWORD(function);
   KEYWORD(insts);
   KEYWORD(funcFlags);
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index fe49e52ae4283c..d437637f468cfd 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -9215,7 +9215,8 @@ bool LLParser::parseFunctionSummary(std::string Name, GlobalValue::GUID GUID,
   GlobalValueSummary::GVFlags GVFlags = GlobalValueSummary::GVFlags(
       GlobalValue::ExternalLinkage, GlobalValue::DefaultVisibility,
       /*NotEligibleToImport=*/false,
-      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false);
+      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false,
+      /*ImportAsDec=*/false);
   unsigned InstCount;
   std::vector<FunctionSummary::EdgeTy> Calls;
   FunctionSummary::TypeIdInfo TypeIdInfo;
@@ -9302,7 +9303,8 @@ bool LLParser::parseVariableSummary(std::string Name, GlobalValue::GUID GUID,
   GlobalValueSummary::GVFlags GVFlags = GlobalValueSummary::GVFlags(
       GlobalValue::ExternalLinkage, GlobalValue::DefaultVisibility,
       /*NotEligibleToImport=*/false,
-      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false);
+      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false,
+      /*ImportAsDec*/ false);
   GlobalVarSummary::GVarFlags GVarFlags(/*ReadOnly*/ false,
                                         /* WriteOnly */ false,
                                         /* Constant */ false,
@@ -9360,7 +9362,8 @@ bool LLParser::parseAliasSummary(std::string Name, GlobalValue::GUID GUID,
   GlobalValueSummary::GVFlags GVFlags = GlobalValueSummary::GVFlags(
       GlobalValue::ExternalLinkage, GlobalValue::DefaultVisibility,
       /*NotEligibleToImport=*/false,
-      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false);
+      /*Live=*/false, /*IsLocal=*/false, /*CanAutoHide=*/false,
+      /*ImportAsDec=*/false);
   if (parseToken(lltok::colon, "expected ':' here") ||
       parseToken(lltok::lparen, "expected '(' here") ||
       parseModuleReference(ModulePath) ||
@@ -10146,6 +10149,12 @@ bool LLParser::parseGVFlags(GlobalValueSummary::GVFlags &GVFlags) {
         return true;
       GVFlags.CanAutoHide = Flag;
       break;
+    case lltok::kw_importAsDec:
+      Lex.Lex();
+      if (parseToken(lltok::colon, "expected ':'") || parseFlag(Flag))
+        return true;
+      GVFlags.ImportAsDec = Flag;
+      break;
     default:
       return error(Lex.getLoc(), "expected gv flag type");
     }
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index aa6c9c95ca2403..408e6f55cdecd2 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -1133,6 +1133,7 @@ static GlobalValueSummary::GVFlags getDecodedGVSummaryFlags(uint64_t RawFlags,
   // to getDecodedLinkage() will need to be taken into account here as above.
   auto Linkage = GlobalValue::LinkageTypes(RawFlags & 0xF); // 4 bits
   auto Visibility = GlobalValue::VisibilityTypes((RawFlags >> 8) & 3); // 2 bits
+  bool ImportAsDec = ((RawFlags >> 10) & 1);
   RawFlags = RawFlags >> 4;
   bool NotEligibleToImport = (RawFlags & 0x1) || Version < 3;
   // The Live flag wasn't introduced until version 3. For dead stripping
@@ -1143,7 +1144,7 @@ static GlobalValueSummary::GVFlags getDecodedGVSummaryFlags(uint64_t RawFlags,
   bool AutoHide = (RawFlags & 0x8);
 
   return GlobalValueSummary::GVFlags(Linkage, Visibility, NotEligibleToImport,
-                                     Live, Local, AutoHide);
+                                     Live, Local, AutoHide, ImportAsDec);
 }
 
 // Decode the flags for GlobalVariable in the summary
diff --git a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
index dd554e422516f6..5007d498e5a220 100644
--- a/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
+++ b/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -1202,7 +1202,8 @@ static uint64_t getEncodedFFlags(FunctionSummary::FFlags Flags) {
 
 // Decode the flags for GlobalValue in the summary. See getDecodedGVSummaryFlags
 // in BitcodeReader.cpp.
-static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
+static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags,
+                                         bool ImportAsDec = false) {
   uint64_t RawFlags = 0;
 
   RawFlags |= Flags.NotEligibleToImport; // bool
@@ -1217,6 +1218,8 @@ static uint64_t getEncodedGVSummaryFlags(GlobalValueSummary::GVFlags Flags) {
 
   RawFlags |= (Flags.Visibility << 8); // 2 bits
 
+  RawFlags |= ((ImportAsDec) & (1 << 10)); // 1 bit
+
   return RawFlags;
 }
 
diff --git a/llvm/lib/IR/AsmWriter.cpp b/llvm/lib/IR/AsmWriter.cpp
index b778a14158ef2b..11dbd8a6484bb4 100644
--- a/llvm/lib/IR/AsmWriter.cpp
+++ b/llvm/lib/IR/AsmWriter.cpp
@@ -3545,6 +3545,7 @@ void AssemblyWriter::printSummary(const GlobalValueSummary &Summary) {
   Out << ", live: " << GVFlags.Live;
   Out << ", dsoLocal: " << GVFlags.DSOLocal;
   Out << ", canAutoHide: " << GVFlags.CanAutoHide;
+  Out << ", importAsDec: " << GVFlags.ImportAsDec;
   Out << ")";
 
   if (Summary.getSummaryKind() == GlobalValueSummary::AliasKind)
diff --git a/llvm/lib/IR/ModuleSummaryIndex.cpp b/llvm/lib/IR/ModuleSummaryIndex.cpp
index 198c730418c724..b09ab668d49763 100644
--- a/llvm/lib/IR/ModuleSummaryIndex.cpp
+++ b/llvm/lib/IR/ModuleSummaryIndex.cpp
@@ -644,6 +644,8 @@ void ModuleSummaryIndex::exportToDot(
         A.addComment("dsoLocal");
       if (Flags.CanAutoHide)
         A.addComment("canAutoHide");
+      if (Flags.ImportAsDec)
+        A.addComment("importAsDec");
       if (GUIDPreservedSymbols.count(SummaryIt.first))
         A.addComment("preserved");
 
diff --git a/llvm/test/Assembler/thinlto-summary.ll b/llvm/test/Assembler/thinlto-summary.ll
index 9eb3c6669780d6..d7a1abe58789f8 100644
--- a/llvm/test/Assembler/thinlto-summary.ll
+++ b/llvm/test/Assembler/thinlto-summary.ll
@@ -22,7 +22,7 @@
 ^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: available_externally, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
 ^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: linkonce, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
 ^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: linkonce_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
-^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 1), insts: 1)))
+^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 1, importAsDec: 0), insts: 1)))
 ^10 = gv: (guid: 9, summaries: (function: (module: ^0, flags: (linkage: weak, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
 ^11 = gv: (guid: 10, summaries: (variable: (module: ^0, flags: (linkage: common, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), varFlags: (readonly: 0))))
 ; Test appending globel variable with reference (tests backward reference on
@@ -69,30 +69,30 @@
 ; Make sure we get back from llvm-dis essentially what we put in via llvm-as.
 ; CHECK: ^0 = module: (path: "thinlto-summary1.o", hash: (1369602428, 2747878711, 259090915, 2507395659, 1141468049))
 ; CHECK: ^1 = module: (path: "thinlto-summary2.o", hash: (2998369023, 4283347029, 1195487472, 2757298015, 1852134156))
-; CHECK: ^2 = gv: (guid: 1, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 10, calls: ((callee: ^15, hotness: hot), (callee: ^17, hotness: cold), (callee: ^16, hotness: none, tail: 1)), refs: (^11, readonly ^13, writeonly ^14))))
+; CHECK: ^2 = gv: (guid: 1, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0, importAsDec: 0), insts: 10, calls: ((callee: ^15, hotness: hot), (callee: ^17, hotness: cold), (callee: ^16, hotness: none, tail: 1)), refs: (^11, readonly ^13, writeonly ^14))))
 ;; relbf is not emitted since this is a combined summary, and that is only
 ;; emitted for per-module summaries.
-; CHECK: ^3 = gv: (guid: 2, summaries: (function: (module: ^1, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 10, calls: ((callee: ^15, tail: 1)))))
-; CHECK: ^4 = gv: (guid: 3, summaries: (function: (module: ^0, flags: (linkage: internal, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), insts: 1)))
-; CHECK: ^5 = gv: (guid: 4, summaries: (alias: (module: ^0, flags: (linkage: private, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), aliasee: ^14)))
-; CHECK: ^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: available_externally, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
-; CHECK: ^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: linkonce, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
-; CHECK: ^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: linkonce_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
-; CHECK: ^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 1), insts: 1)))
-; CHECK: ^10 = gv: (guid: 9, summaries: (function: (module: ^0, flags: (linkage: weak, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1)))
-; CHECK: ^11 = gv: (guid: 10, summaries: (variable: (module: ^0, flags: (linkage: common, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), varFlags: (readonly: 0, writeonly: 0, constant: 0))))
-; CHECK: ^12 = gv: (guid: 11, summaries: (variable: (module: ^0, flags: (linkage: appending, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), varFlags: (readonly: 0, writeonly: 0, constant: 0), refs: (^4))))
-; CHECK: ^13 = gv: (guid: 12, summaries: (variable: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), varFlags: (readonly: 1, writeonly: 0, constant: 0))))
-; CHECK: ^14 = gv: (guid: 13, summaries: (variable: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), varFlags: (readonly: 0, writeonly: 0, constant: 0))))
-; CHECK: ^15 = gv: (guid: 14, summaries: (function: (module: ^1, flags: (linkage: external, visibility: default, notEligibleToImport: 1, live: 1, dsoLocal: 0, canAutoHide: 0), insts: 1, funcFlags: (readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0, noInline: 1, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 0))))
-; CHECK: ^16 = gv: (guid: 15, summaries: (function: (module: ^1, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1, funcFlags: (readNone: 1, readOnly: 0, noRecurse: 1, returnDoesNotAlias: 0, noInline: 0, alwaysInline: 1, noUnwind: 1, mayThrow: 1, hasUnknownCall: 1, mustBeUnreachable: 0))))
-; CHECK: ^17 = gv: (guid: 16, summaries: (function: (module: ^1, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 1, funcFlags: (readNone: 0, readOnly: 1, noRecurse: 0, returnDoesNotAlias: 1, noInline: 0, alwaysInline: 0, noUnwind: 0, mayThrow: 0, hasUnknownCall: 0, mustBeUnreachable: 1), calls: ((callee: ^15)))))
-; CHECK: ^18 = gv: (guid: 17, summaries: (alias: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 1, canAutoHide: 0), aliasee: ^14)))
-; CHECK: ^19 = gv: (guid: 18, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 4, typeIdInfo: (typeTests: (^24, ^26)))))
-; CHECK: ^20 = gv: (guid: 19, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 8, typeIdInfo: (typeTestAssumeVCalls: (vFuncId: (^27, offset: 16))))))
-; CHECK: ^21 = gv: (guid: 20, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 5, typeIdInfo: (typeCheckedLoadVCalls: (vFuncId: (^25, offset: 16))))))
-; CHECK: ^22 = gv: (guid: 21, summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 15, typeIdInfo: (typeTestAssumeConstVCalls: ((vFuncId: (^27, offset: 16), args: (42)), (vFuncId: (^27, offset: 24)))))))
-; CHECK: ^23 = gv: (gu...
[truncated]

@mingmingl-llvm
Copy link
Contributor Author

@teresajohnson I'm only updating one test case (llvm/test/Assembler/thinlto-summary.ll) to show how test cases are affected now. I'm planning to update the rest of test cases as review goes on (e.g., the name of field might change during review).

Copy link
Contributor

@jvoung jvoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small comments

@@ -737,6 +737,7 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(live);
KEYWORD(dsoLocal);
KEYWORD(canAutoHide);
KEYWORD(importAsDec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too strong preference on the spelling but could be something like "importAs: dec" vs "importAs: def" (in case there is another variation later like "summary"). Otherwise we can consider that later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC your point is the new field in struct GVFlags should carry 3 states in the field, Definition, Declaration and Summary since there is a known use case for import summaries.

I can see two different ways to do that

  1. Rename importAsDec to importType and make it two bits.
  2. Keep the current way to represent def vs decl. For the import summary use case, add another one-bit field.

I like 1 better since join two 1-bit fields to represent 3 states is not very natural. If this sounds good, I could make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my feeling is 1 would be more natural too. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me - what is the summary import case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jvoung Could you elaborate the use case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the summary import case, I'm mostly looking at direct call cases, so the importing module does have a declaration already, but the declaration doesn't have the attributes that the source module has (inferred from the definition and possibly transitively from callees definitions), so the summary helps update the decl e.g., in thinLTOFinalizeInModule

One could also mark it as a declaration import, but we'd want it to overwrite instead of no-op.

Other pros / cons for import summary vs import decl in earlier comment but happy to elaborate more or if more data is needed. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand the mechanics of how a "summary" import works before we reserve a flag bit for that case. Couldn't we just mark it as a declaration import which would essentially be a no-op if a declaration/def already exists in the module?

Previously I thought reserving one bit is necessary for postlink to tell one of three scenarios from {definition, declaration, summary}. However now after f2f discussion I think one-bit (just marking one of {definition, declaration}) should work.

To summarize our discussion, one-bit approach works for summary import in a selective manner.

  1. The GVFlags could reserve one-bit for ImportType, and making it just declaration or definition.
  2. In the indexing step, 'ComputeImportForModule' creates an entry for declaration (when definition import is too costly, etc) if call-graph needs it, or when interesting attributes exists.
  3. The postlink optimizer sees IR and combined summary. If definition or declaration is already in the module (which is the case except for indirect callee, since IR verifier requests direct-callee should at least have a declaration), FunctionImporter could just read interesting attributes from the summary and set it on the function (def or decl). For the set of summaries without a function (the case for indirect callee for better call-graph sort), IRMover will be updated in a subsequent patch to import the declaration.
    • I prototyped the IRMover change. The regression test shows large indirect callee declaration is imported along with attributes.

I'm going to update this PR to use one-bit approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, one-bit serves the purpose to include the global-value-summary entry in the postlink combined summary, and indexing step can selectively set the bit since it has visibility of the combined summary.

If the one-bit indicates definition, postlink will import function definition along with attributes if function doesn't exist. If the one-bit indicates declaration, postlink can annotate attributes according to summaries if declaration already exists, and use IRMover to import declarations that don't exist yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, yes that makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to use one-bit state and the enum class comment. PTAL, thanks!

Re test coverage

  • ThinLTO/X86/dot-dumpler.ll covers ModuleSummaryIndex::exportToDot
  • Transforms/WholeProgramDevirt/import-indir.ll covers the update of
    FunctionSummaryTraits, and test both {def, decl}
  • Assembler/thinlto-summary tests both {def, decl}
  • The rest of test cases are backfill'ed with def and doesn't test decl.

@@ -737,6 +737,7 @@ lltok::Kind LLLexer::LexIdentifier() {
KEYWORD(live);
KEYWORD(dsoLocal);
KEYWORD(canAutoHide);
KEYWORD(importAsDec);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me - what is the summary import case?

@teresajohnson
Copy link
Contributor

Can you add a high level description of the change and the motivation to the PR summary?

@mingmingl-llvm
Copy link
Contributor Author

Can you add a high level description of the change and the motivation to the PR summary?

done.

@mingmingl-llvm mingmingl-llvm changed the title [ThinLTO]Record import type (declaration or definition) in GlobalValueSummary::GVFlags [ThinLTO]Record import type in GlobalValueSummary::GVFlags Apr 9, 2024
Re test coverage
- ThinLTO/X86/dot-dumpler.ll covers ModuleSummaryIndex::exportToDot
- Transforms/WholeProgramDevirt/import-indir.ll covers the update of
  FunctionSummaryTraits, and test both {def, decl}
- Assembler/thinlto-summary tests both {def, decl}
- The rest of test cases are backfill'ed with def and doesn't test decl.
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:transforms labels Apr 9, 2024
Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor fixes / suggestions below.

Definition = 0,

// When its definition doesn't exist in the destination module and not
// imported (e.g., function is large to be inlined), the global value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"too large"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -564,6 +581,12 @@ class GlobalValueSummary {

bool canAutoHide() const { return Flags.CanAutoHide; }

bool shouldImportAsDec() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest s/Dec/Decl/ (Decl is a bit more typical as an abbreviation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense, done.

GlobalValueSummary::ImportKind Res;
switch (Kind) {
default:
Res = GlobalValueSummary::Definition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case should be an error I think? It corresponds to importType that is not known.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, by changing parseOptionalImportType to a member function of LLParser class (from a static free function) so it could return tokError.

^6 = gv: (guid: 5, summaries: (function: (module: ^0, flags: (linkage: available_externally, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^7 = gv: (guid: 6, summaries: (function: (module: ^0, flags: (linkage: linkonce, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^8 = gv: (guid: 7, summaries: (function: (module: ^0, flags: (linkage: linkonce_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0), insts: 1)))
^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 1), insts: 1)))
^9 = gv: (guid: 8, summaries: (function: (module: ^0, flags: (linkage: weak_odr, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0, importType: declaration), insts: 1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was the only canAutoHide = true test case here. Can you add a new record that is a declaration type to test its round tripping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah canAutoHide:0 -> canAutoHide:1 is not intentional. But I agree it's better to add a new record.

Done by taking ^24 and incrementing ^ID for the rest.

Copy link
Contributor

@jvoung jvoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit, other LG thanks!

@@ -635,7 +635,8 @@ static void computeFunctionSummary(
HasIndirBranchToBlockAddress || HasIFuncCall;
GlobalValueSummary::GVFlags Flags(
F.getLinkage(), F.getVisibility(), NotEligibleForImport,
/* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable());
/* Live = */ false, F.isDSOLocal(), F.canBeOmittedFromSymbolTable(),
/*ImportType=*/GlobalValueSummary::ImportKind::Definition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think you removed from the comment from other cases in this file already, but now that it's an enum and not "false" the comment seems less important

same in "makeDummyFunctionSummary" in the ModuleSummaryIndex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Fixed all callsites of GVFlags constructor.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 10, 2024
@mingmingl-llvm
Copy link
Contributor Author

mingmingl-llvm commented Apr 10, 2024

Resolved review feedback, and 'backfilled' three affected tests in clang/test/CodeGen. Now ninja check-llvm check-clang check-compiler-rt passed locally.

@llvmbot llvmbot added the lld label Apr 11, 2024
@mingmingl-llvm mingmingl-llvm merged commit dda7333 into main Apr 11, 2024
5 of 6 checks passed
@mingmingl-llvm mingmingl-llvm deleted the users/minglotus-6/spr/summary branch April 11, 2024 02:46
@goldsteinn
Copy link
Contributor

I think this might be breaking:
FAIL: LLVM :: tools/gold/X86/thinlto.ll

@mingmingl-llvm
Copy link
Contributor Author

mingmingl-llvm commented Apr 11, 2024

I think this might be breaking:
FAIL: LLVM :: tools/gold/X86/thinlto.ll

Thanks for reporting and sorry for this. I'll send out a fix to llvm/test/tools/gold/ tests soon.

@mingmingl-llvm
Copy link
Contributor Author

I think this might be breaking: FAIL: LLVM :: tools/gold/X86/thinlto.ll

fixed in 20ed5b1

mingmingl-llvm added a commit that referenced this pull request May 20, 2024
…hinLTO under a default-off new option (#88024)

The goal is to populate `declaration` import status if a new flag`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases (better call-graph sort and cross-module auto-init)
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.

[1] https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] #87597 (comment)
mingmingl-llvm added a commit that referenced this pull request May 20, 2024
…ibuted ThinLTO under a default-off new option" (#92718)

The original PR is reviewed in
#88024, and this PR adds one
line (b9f04d1)
to fix test

Limit to one thread for in-process ThinLTO to test `LLVM_DEBUG` log.
- This should fix build bot failure like
https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and
https://lab.llvm.org/buildbot/#/builders/9/builds/43876
- I could repro the failure and see interleaved log messages by using
`-thinlto-threads=all`

**Original Commit Message:**

The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
mingmingl-llvm added a commit that referenced this pull request Jun 20, 2024
…ibuted ThinLTO under a default-off new option" (#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of #92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ibuted ThinLTO under a default-off new option" (llvm#95482)

Make `FunctionsToImportTy` an `unordered_map` rather than `DenseMap`.
Credit goes to jvoung@ for the 'DenseMap -> unordered_map' change. This
is a reland of llvm#92718

* `DenseMap` allocates space for a large number of key/value pairs and
wastes space when the number of elements are small.
* While init bucket size is zero [1], it quickly allocates buckets for 64 elements [2]
when the number of elements is small (for example, 3 or 4 elements). The programmer
manual [3] also mentions it could waste space.
* Experiments show `FunctionsToImportTy.size()` is smaller than 4 for
multiple binaries with high indexing ram usage. `unordered_map` grows
factor is at most 2 in llvm libc [4] for insert operations.
 
With this change, `ComputeCrossModuleImport` ram increase is smaller
than 0.5G on a couple of binaries with high indexing ram usage. A wider
range of (pre-release) tests pass.

[1] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L431-L432 
[2] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/llvm/include/llvm/ADT/DenseMap.h#L849
[3] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-densemap-h
[4] https://github.com/llvm/llvm-project/blob/ad79a14c9e5ec4a369eed4adf567c22cc029863f/libcxx/include/__hash_table#L1525-L1526

**Original commit message** 
The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](llvm#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, llvm#87600
will do this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category lld:ELF lld llvm:analysis llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants