Skip to content

[lld-macho] Fix category merging category map non-determinism #91159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 6, 2024

Conversation

alx32
Copy link
Contributor

@alx32 alx32 commented May 6, 2024

Currently in ObjcCategoryMerger::doMerge and generateCatListForNonErasedCategories we use maps of pointers which leads to non-determinism. Switch instead to using MapVector which preserves determinism.

@alx32 alx32 requested review from thevinster and kyulee-com May 6, 2024 02:32
@alx32 alx32 force-pushed the 08_nodeterminism_catorder branch from 2ac1d78 to 8add44d Compare May 6, 2024 12:23
@alx32 alx32 requested a review from ellishg May 6, 2024 12:24
@alx32 alx32 marked this pull request as ready for review May 6, 2024 12:32
@llvmbot
Copy link
Member

llvmbot commented May 6, 2024

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: None (alx32)

Changes

Currently in ObjcCategoryMerger::doMerge we create merged categories via for (auto &entry : categoryMap) [...]. The issue is that categoryMap is a DenseMap of pointers to symbols. Since pointers in memory change from run to run, the iteration through the map will also be arbitrary. This will cause categories to be emitted in random order. This violates linker determinism.

To fix this, we instead sort categories by the same order that they showed up in the inputSections. This way determinism is preserved.


Full diff: https://github.com/llvm/llvm-project/pull/91159.diff

1 Files Affected:

  • (modified) lld/MachO/ObjC.cpp (+33-12)
diff --git a/lld/MachO/ObjC.cpp b/lld/MachO/ObjC.cpp
index 4760fffebe3b30..a48baec9064f12 100644
--- a/lld/MachO/ObjC.cpp
+++ b/lld/MachO/ObjC.cpp
@@ -341,10 +341,13 @@ class ObjcCategoryMerger {
     ConcatInputSection *catListIsec;
     ConcatInputSection *catBodyIsec;
     uint32_t offCatListIsec = 0;
+    uint32_t inputIndex = 0;
 
     bool wasMerged = false;
   };
 
+  typedef std::vector<InfoInputCategory> CategoryGroup;
+
   // To write new (merged) categories or classes, we will try make limited
   // assumptions about the alignment and the sections the various class/category
   // info are stored in and . So we'll just reuse the same sections and
@@ -416,8 +419,7 @@ class ObjcCategoryMerger {
 
 private:
   void collectAndValidateCategoriesData();
-  void
-  mergeCategoriesIntoSingleCategory(std::vector<InfoInputCategory> &categories);
+  void mergeCategoriesIntoSingleCategory(CategoryGroup &categories);
 
   void eraseISec(ConcatInputSection *isec);
   void eraseMergedCategories();
@@ -476,8 +478,8 @@ class ObjcCategoryMerger {
 
   InfoCategoryWriter infoCategoryWriter;
   std::vector<ConcatInputSection *> &allInputSections;
-  // Map of base class Symbol to list of InfoInputCategory's for it
-  DenseMap<const Symbol *, std::vector<InfoInputCategory>> categoryMap;
+  // Info for all input categories, grouped by base class
+  std::vector<CategoryGroup> categoryGroups;
 
   // Normally, the binary data comes from the input files, but since we're
   // generating binary data ourselves, we use the below array to store it in.
@@ -1043,6 +1045,12 @@ void ObjcCategoryMerger::createSymbolReference(Defined *refFrom,
 }
 
 void ObjcCategoryMerger::collectAndValidateCategoriesData() {
+  // Make category merging deterministic by using a counter for found categories
+  uint32_t inputCategoryIndex = 0;
+  // Map of base class Symbol to list of InfoInputCategory's for it. We use this
+  // for fast lookup of categories to their base class. Later this info will be
+  // moved into 'categoryGroups' member.
+  DenseMap<const Symbol *, std::vector<InfoInputCategory>> categoryMap;
   for (InputSection *sec : allInputSections) {
     if (sec->getName() != section_names::objcCatList)
       continue;
@@ -1072,12 +1080,25 @@ void ObjcCategoryMerger::collectAndValidateCategoriesData() {
           tryGetSymbolAtIsecOffset(catBodyIsec, catLayout.klassOffset);
       assert(classSym && "Category does not have a valid base class");
 
-      InfoInputCategory catInputInfo{catListCisec, catBodyIsec, off};
+      InfoInputCategory catInputInfo{catListCisec, catBodyIsec, off,
+                                     inputCategoryIndex++};
       categoryMap[classSym].push_back(catInputInfo);
 
       collectCategoryWriterInfoFromCategory(catInputInfo);
     }
   }
+
+  // Move categoryMap into categoryGroups and sort by the first category's
+  // inputIndex. This way we can be sure that category merging will be
+  // deterministic across linker runs.
+  categoryGroups.reserve(categoryMap.size());
+  for (auto &mapEntry : categoryMap)
+    categoryGroups.push_back(mapEntry.second);
+
+  std::sort(categoryGroups.begin(), categoryGroups.end(),
+            [](const CategoryGroup &a, const CategoryGroup &b) {
+              return a[0].inputIndex < b[0].inputIndex;
+            });
 }
 
 // In the input we have multiple __objc_catlist InputSection, each of which may
@@ -1153,8 +1174,8 @@ void ObjcCategoryMerger::eraseMergedCategories() {
   // Map of InputSection to a set of offsets of the categories that were merged
   std::map<ConcatInputSection *, std::set<uint64_t>> catListToErasedOffsets;
 
-  for (auto &mapEntry : categoryMap) {
-    for (InfoInputCategory &catInfo : mapEntry.second) {
+  for (auto &catGroup : categoryGroups) {
+    for (InfoInputCategory &catInfo : catGroup) {
       if (catInfo.wasMerged) {
         eraseISec(catInfo.catListIsec);
         catListToErasedOffsets[catInfo.catListIsec].insert(
@@ -1169,8 +1190,8 @@ void ObjcCategoryMerger::eraseMergedCategories() {
   generateCatListForNonErasedCategories(catListToErasedOffsets);
 
   // Erase the old method lists & names of the categories that were merged
-  for (auto &mapEntry : categoryMap) {
-    for (InfoInputCategory &catInfo : mapEntry.second) {
+  for (auto &catgroup : categoryGroups) {
+    for (InfoInputCategory &catInfo : catgroup) {
       if (!catInfo.wasMerged)
         continue;
 
@@ -1193,10 +1214,10 @@ void ObjcCategoryMerger::eraseMergedCategories() {
 void ObjcCategoryMerger::doMerge() {
   collectAndValidateCategoriesData();
 
-  for (auto &entry : categoryMap)
-    if (entry.second.size() > 1)
+  for (auto &catGroup : categoryGroups)
+    if (catGroup.size() > 1)
       // Merge all categories into a new, single category
-      mergeCategoriesIntoSingleCategory(entry.second);
+      mergeCategoriesIntoSingleCategory(catGroup);
 
   // Erase all categories that were merged
   eraseMergedCategories();

Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about using llvm::MapVector (instead of llvm::DenseMap) which should preserve the input/insertion order to the map, which you seem want here?

@alx32 alx32 force-pushed the 08_nodeterminism_catorder branch from 5c49bd0 to 09f6d07 Compare May 6, 2024 22:32
@alx32 alx32 merged commit 6e5ed35 into llvm:main May 6, 2024
3 of 4 checks passed
@alx32 alx32 deleted the 08_nodeterminism_catorder branch July 11, 2024 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants