Skip to content

Add support for using foreign type units in .debug_names. #87740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 24, 2024

Conversation

clayborg
Copy link
Collaborator

@clayborg clayborg commented Apr 5, 2024

This patch adds support for the new foreign type unit support in .debug_names. Features include:

  • don't manually index foreign TUs if we have info for them
  • only use the type unit entries that match the .dwo files when we have a .dwp file
  • fix crashers that happen due to PeekDIEName() using wrong offsets

@clayborg clayborg added the lldb label Apr 5, 2024
@clayborg clayborg self-assigned this Apr 5, 2024
@clayborg clayborg requested a review from JDevlieghere as a code owner April 5, 2024 03:36
@llvmbot
Copy link
Member

llvmbot commented Apr 5, 2024

@llvm/pr-subscribers-debuginfo

@llvm/pr-subscribers-lldb

Author: Greg Clayton (clayborg)

Changes

This patch adds support for the new foreign type unit support in .debug_names. Features include:

  • don't manually index foreign TUs if we have info for them
  • only use the type unit entries that match the .dwo files when we have a .dwp file
  • fix crashers that happen due to PeekDIEName() using wrong offsets

Patch is 22.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/87740.diff

11 Files Affected:

  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp (+15-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h (+2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp (+67-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.h (+4-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp (+4-2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h (+5-2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+39-26)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+9)
  • (added) lldb/test/Shell/SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp (+91)
  • (modified) llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h (+11)
  • (modified) llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp (+13)
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
index 44febcfac3b000..0e111c8ec47f45 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.cpp
@@ -223,7 +223,17 @@ DWARFUnit *DWARFDebugInfo::GetUnitAtOffset(DIERef::Section section,
 }
 
 DWARFUnit *DWARFDebugInfo::GetUnit(const DIERef &die_ref) {
-  return GetUnitContainingDIEOffset(die_ref.section(), die_ref.die_offset());
+  // Make sure we get the correct SymbolFileDWARF from the DIERef before
+  // asking for information from a debug info object. We might start with the
+  // DWARFDebugInfo for the main executable in a split DWARF and the DIERef
+  // might be pointing to a specific .dwo file or to the .dwp file. So this
+  // makes sure we get the right SymbolFileDWARF instance before finding the
+  // DWARFUnit that contains the offset. If we just use this object to do the
+  // search, we might be using the wrong .debug_info section from the wrong
+  // file with an offset meant for a different section.
+  SymbolFileDWARF *dwarf = m_dwarf.GetDIERefSymbolFile(die_ref);
+  return dwarf->DebugInfo().GetUnitContainingDIEOffset(die_ref.section(),
+                                                       die_ref.die_offset());
 }
 
 DWARFUnit *
@@ -236,6 +246,10 @@ DWARFDebugInfo::GetUnitContainingDIEOffset(DIERef::Section section,
   return result;
 }
 
+const std::shared_ptr<SymbolFileDWARFDwo> DWARFDebugInfo::GetDwpSymbolFile() {
+  return m_dwarf.GetDwpSymbolFile();
+}
+
 DWARFTypeUnit *DWARFDebugInfo::GetTypeUnitForHash(uint64_t hash) {
   auto pos = llvm::lower_bound(m_type_hash_to_unit_index,
                                std::make_pair(hash, 0u), llvm::less_first());
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
index c1f0cb0203fb76..b7f99ce6282b1f 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFDebugInfo.h
@@ -58,6 +58,8 @@ class DWARFDebugInfo {
 
   const DWARFDebugAranges &GetCompileUnitAranges();
 
+  const std::shared_ptr<SymbolFileDWARFDwo> GetDwpSymbolFile();
+
 protected:
   typedef std::vector<DWARFUnitSP> UnitColl;
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
index 4da0d56fdcacb4..ba35b3b872e6c6 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.cpp
@@ -34,6 +34,18 @@ DebugNamesDWARFIndex::Create(Module &module, DWARFDataExtractor debug_names,
       module, std::move(index_up), debug_names, debug_str, dwarf));
 }
 
+
+llvm::DenseSet<uint64_t>
+DebugNamesDWARFIndex::GetTypeUnitSigs(const DebugNames &debug_names) {
+  llvm::DenseSet<uint64_t> result;
+  for (const DebugNames::NameIndex &ni : debug_names) {
+    const uint32_t num_tus = ni.getForeignTUCount();
+    for (uint32_t tu = 0; tu < num_tus; ++tu)
+      result.insert(ni.getForeignTUSignature(tu));
+  }
+  return result;
+}
+
 llvm::DenseSet<dw_offset_t>
 DebugNamesDWARFIndex::GetUnits(const DebugNames &debug_names) {
   llvm::DenseSet<dw_offset_t> result;
@@ -48,6 +60,15 @@ DebugNamesDWARFIndex::GetUnits(const DebugNames &debug_names) {
   return result;
 }
 
+DWARFTypeUnit *
+DebugNamesDWARFIndex::GetForeignTypeUnit(const DebugNames::Entry &entry) const {
+  std::optional<uint64_t> type_sig = entry.getForeignTUTypeSignature();
+  if (type_sig)
+    if (auto dwp_sp = m_debug_info.GetDwpSymbolFile())
+      return dwp_sp->DebugInfo().GetTypeUnitForHash(*type_sig);
+  return nullptr;
+}
+
 std::optional<DIERef>
 DebugNamesDWARFIndex::ToDIERef(const DebugNames::Entry &entry) const {
   // Look for a DWARF unit offset (CU offset or local TU offset) as they are
@@ -55,8 +76,15 @@ DebugNamesDWARFIndex::ToDIERef(const DebugNames::Entry &entry) const {
   std::optional<uint64_t> unit_offset = entry.getCUOffset();
   if (!unit_offset) {
     unit_offset = entry.getLocalTUOffset();
-    if (!unit_offset)
+    if (!unit_offset) {
+      if (DWARFTypeUnit *tu = GetForeignTypeUnit(entry)) {
+        if (std::optional<uint64_t> die_offset = entry.getDIEUnitOffset())
+          return DIERef(tu->GetSymbolFileDWARF().GetFileIndex(),
+                        DIERef::Section::DebugInfo,
+                        tu->GetOffset() + *die_offset);
+      }
       return std::nullopt;
+    }
   }
 
   DWARFUnit *cu =
@@ -273,6 +301,44 @@ void DebugNamesDWARFIndex::GetFullyQualifiedType(
     if (!isType(entry.tag()))
       continue;
 
+
+    DWARFTypeUnit *foreign_tu = GetForeignTypeUnit(entry);
+    if (foreign_tu) {
+      // If this entry represents a foreign type unit, we need to verify that
+      // the type unit that ended up in the final .dwp file is the right type
+      // unit. Type units have signatures which are the same across multiple
+      // .dwo files, but only one of those type units will end up in the .dwp
+      // file. The contents of type units for the same type can be different
+      // in different .dwo file, which means the DIE offsets might not be the
+      // same between two different type units. So we need to determine if this
+      // accelerator table matches the type unit in the .dwp file. If it doesn't
+      // match, then we need to ignore this accelerator table entry as the type
+      // unit that is in the .dwp file will have its own index.
+      const llvm::DWARFDebugNames::NameIndex *name_index = entry.getNameIndex();
+      if (name_index == nullptr)
+        continue;
+      // In order to determine if the type unit that ended up in a .dwp file
+      // is valid, we need to grab the type unit and check the attribute on the
+      // type unit matches the .dwo file. For this to happen we rely on each
+      // .dwo file having its own .debug_names table with a single compile unit
+      // and multiple type units. This is the only way we can tell if a type
+      // unit came from a specific .dwo file.
+      if (name_index->getCUCount() == 1) {
+        dw_offset_t cu_offset = name_index->getCUOffset(0);
+        DWARFUnit *cu = m_debug_info.GetUnitAtOffset(DIERef::DebugInfo,
+                                                     cu_offset);
+        if (cu) {
+          DWARFBaseDIE cu_die = cu->GetUnitDIEOnly();
+          DWARFBaseDIE tu_die = foreign_tu->GetUnitDIEOnly();
+          llvm::StringRef cu_dwo_name =
+              cu_die.GetAttributeValueAsString(DW_AT_dwo_name, nullptr);
+          llvm::StringRef tu_dwo_name =
+              tu_die.GetAttributeValueAsString(DW_AT_dwo_name, nullptr);
+          if (cu_dwo_name != tu_dwo_name)
+            continue; // Ignore this entry, the CU DWO doesn't match the TU DWO
+        }
+      }
+    }
     // Grab at most one extra parent, subsequent parents are not necessary to
     // test equality.
     std::optional<llvm::SmallVector<Entry, 4>> parent_chain =
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.h b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.h
index b54dd1162d20ab..6b48ce4eea1875 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DebugNamesDWARFIndex.h
@@ -71,7 +71,8 @@ class DebugNamesDWARFIndex : public DWARFIndex {
       : DWARFIndex(module), m_debug_info(dwarf.DebugInfo()),
         m_debug_names_data(debug_names_data), m_debug_str_data(debug_str_data),
         m_debug_names_up(std::move(debug_names_up)),
-        m_fallback(module, dwarf, GetUnits(*m_debug_names_up)) {}
+        m_fallback(module, dwarf, GetUnits(*m_debug_names_up),
+                   GetTypeUnitSigs(*m_debug_names_up)) {}
 
   DWARFDebugInfo &m_debug_info;
 
@@ -84,6 +85,7 @@ class DebugNamesDWARFIndex : public DWARFIndex {
   std::unique_ptr<DebugNames> m_debug_names_up;
   ManualDWARFIndex m_fallback;
 
+  DWARFTypeUnit *GetForeignTypeUnit(const DebugNames::Entry &entry) const;
   std::optional<DIERef> ToDIERef(const DebugNames::Entry &entry) const;
   bool ProcessEntry(const DebugNames::Entry &entry,
                     llvm::function_ref<bool(DWARFDIE die)> callback);
@@ -97,6 +99,7 @@ class DebugNamesDWARFIndex : public DWARFIndex {
                                   llvm::StringRef name);
 
   static llvm::DenseSet<dw_offset_t> GetUnits(const DebugNames &debug_names);
+  static llvm::DenseSet<uint64_t> GetTypeUnitSigs(const DebugNames &debug_names);
 };
 
 } // namespace dwarf
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
index 92275600f99caa..103e157d3cac59 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
@@ -60,8 +60,10 @@ void ManualDWARFIndex::Index() {
   }
   if (dwp_info && dwp_info->ContainsTypeUnits()) {
     for (size_t U = 0; U < dwp_info->GetNumUnits(); ++U) {
-      if (auto *tu = llvm::dyn_cast<DWARFTypeUnit>(dwp_info->GetUnitAtIndex(U)))
-        units_to_index.push_back(tu);
+      if (auto *tu = llvm::dyn_cast<DWARFTypeUnit>(dwp_info->GetUnitAtIndex(U))) {
+        if (m_type_sigs_to_avoid.count(tu->GetTypeHash()) == 0)
+          units_to_index.push_back(tu);
+      }
     }
   }
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h
index 0126e587e52d85..3f0bb39dfc20c7 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.h
@@ -21,9 +21,11 @@ class SymbolFileDWARFDwo;
 class ManualDWARFIndex : public DWARFIndex {
 public:
   ManualDWARFIndex(Module &module, SymbolFileDWARF &dwarf,
-                   llvm::DenseSet<dw_offset_t> units_to_avoid = {})
+                   llvm::DenseSet<dw_offset_t> units_to_avoid = {},
+                   llvm::DenseSet<uint64_t> type_sigs_to_avoid = {})
       : DWARFIndex(module), m_dwarf(&dwarf),
-        m_units_to_avoid(std::move(units_to_avoid)) {}
+        m_units_to_avoid(std::move(units_to_avoid)),
+        m_type_sigs_to_avoid(type_sigs_to_avoid) {}
 
   void Preload() override { Index(); }
 
@@ -170,6 +172,7 @@ class ManualDWARFIndex : public DWARFIndex {
   SymbolFileDWARF *m_dwarf;
   /// Which dwarf units should we skip while building the index.
   llvm::DenseSet<dw_offset_t> m_units_to_avoid;
+  llvm::DenseSet<uint64_t> m_type_sigs_to_avoid;
 
   IndexSet m_set;
   bool m_indexed = false;
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 1164bc62682a9a..b828b56cc36606 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -693,7 +693,6 @@ llvm::DWARFDebugAbbrev *SymbolFileDWARF::DebugAbbrev() {
   if (debug_abbrev_data.GetByteSize() == 0)
     return nullptr;
 
-  ElapsedTime elapsed(m_parse_time);
   auto abbr =
       std::make_unique<llvm::DWARFDebugAbbrev>(debug_abbrev_data.GetAsLLVM());
   llvm::Error error = abbr->parse();
@@ -1726,14 +1725,7 @@ lldb::ModuleSP SymbolFileDWARF::GetExternalModule(ConstString name) {
   return pos->second;
 }
 
-DWARFDIE
-SymbolFileDWARF::GetDIE(const DIERef &die_ref) {
-  // This method can be called without going through the symbol vendor so we
-  // need to lock the module.
-  std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
-
-  SymbolFileDWARF *symbol_file = nullptr;
-
+SymbolFileDWARF *SymbolFileDWARF::GetDIERefSymbolFile(const DIERef &die_ref) {
   // Anytime we get a "lldb::user_id_t" from an lldb_private::SymbolFile API we
   // must make sure we use the correct DWARF file when resolving things. On
   // MacOSX, when using SymbolFileDWARFDebugMap, we will use multiple
@@ -1741,29 +1733,51 @@ SymbolFileDWARF::GetDIE(const DIERef &die_ref) {
   // references to other DWARF objects and we must be ready to receive a
   // "lldb::user_id_t" that specifies a DIE from another SymbolFileDWARF
   // instance.
+
   std::optional<uint32_t> file_index = die_ref.file_index();
+
+  // If the file index matches, then we have the right SymbolFileDWARF already.
+  // This will work for both .dwo file and DWARF in .o files for mac. Also if
+  // both the file indexes are invalid, then we have a match.
+  if (GetFileIndex() == file_index)
+    return this;
+
+  // If we are currently in a .dwo file and our file index doesn't match we need
+  // to let the base symbol file handle this.
+  SymbolFileDWARFDwo *dwo = llvm::dyn_cast_or_null<SymbolFileDWARFDwo>(this);
+  if (dwo)
+    return dwo->GetBaseSymbolFile().GetDIERefSymbolFile(die_ref);
+
   if (file_index) {
-    if (SymbolFileDWARFDebugMap *debug_map = GetDebugMapSymfile()) {
-      symbol_file = debug_map->GetSymbolFileByOSOIndex(*file_index); // OSO case
-      if (symbol_file)
-        return symbol_file->DebugInfo().GetDIE(die_ref);
-      return DWARFDIE();
-    }
+    SymbolFileDWARFDebugMap *debug_map = GetDebugMapSymfile();
+    if (debug_map) {
+        // We have a SymbolFileDWARFDebugMap, so let it find the right file
+      return debug_map->GetSymbolFileByOSOIndex(*file_index);
+    } else {
+      // Handle the .dwp file case correctly
+      if (*file_index == DIERef::k_file_index_mask)
+        return GetDwpSymbolFile().get(); // DWP case
 
-    if (*file_index == DIERef::k_file_index_mask)
-      symbol_file = GetDwpSymbolFile().get(); // DWP case
-    else
-      symbol_file = this->DebugInfo()
-                        .GetUnitAtIndex(*die_ref.file_index())
+      // Handle the .dwo file case correctly
+      return DebugInfo().GetUnitAtIndex(*die_ref.file_index())
                         ->GetDwoSymbolFile(); // DWO case
-  } else if (die_ref.die_offset() == DW_INVALID_OFFSET) {
-    return DWARFDIE();
+    }
   }
+  return this;
+}
 
-  if (symbol_file)
-    return symbol_file->GetDIE(die_ref);
+DWARFDIE
+SymbolFileDWARF::GetDIE(const DIERef &die_ref) {
+  if (die_ref.die_offset() == DW_INVALID_OFFSET)
+    return DWARFDIE();
 
-  return DebugInfo().GetDIE(die_ref);
+  // This method can be called without going through the symbol vendor so we
+  // need to lock the module.
+  std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
+  SymbolFileDWARF *symbol_file = GetDIERefSymbolFile(die_ref);
+  if (symbol_file)
+    return symbol_file->DebugInfo().GetDIE(die_ref);
+  return DWARFDIE();
 }
 
 /// Return the DW_AT_(GNU_)dwo_id.
@@ -2717,7 +2731,6 @@ void SymbolFileDWARF::FindTypes(const TypeQuery &query, TypeResults &results) {
       die_context = die.GetDeclContext();
     else
       die_context = die.GetTypeLookupContext();
-    assert(!die_context.empty());
     if (!query.ContextMatches(die_context))
       return true; // Keep iterating over index types, context mismatch.
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 2f8f80f8765cb8..a3e34278ec9422 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
@@ -241,6 +241,15 @@ class SymbolFileDWARF : public SymbolFileCommon {
     return m_external_type_modules;
   }
 
+  /// Given a DIERef, find the correct SymbolFileDWARF.
+  ///
+  /// A DIERef contains a file index that can uniquely identify a N_OSO file for
+  /// DWARF in .o files on mac, or a .dwo or .dwp file index for split DWARF.
+  /// Calling this function will find the correct symbol file to use so that
+  /// further lookups can be done on the correct symbol file so that the DIE
+  /// offset makes sense in the DIERef.
+  SymbolFileDWARF *GetDIERefSymbolFile(const DIERef &die_ref);
+
   virtual DWARFDIE GetDIE(const DIERef &die_ref);
 
   DWARFDIE GetDIE(lldb::user_id_t uid);
diff --git a/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp
new file mode 100644
index 00000000000000..36620591667901
--- /dev/null
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp
@@ -0,0 +1,91 @@
+// REQUIRES: lld
+
+// This test will make a type that will be compiled differently into two
+// different .dwo files in a type unit with the same type hash, but with
+// differing contents. I have discovered that the hash for the type unit is
+// simply based off of the typename and doesn't seem to differ when the contents
+// differ, so that will help us test foreign type units in the .debug_names
+// section of the main executable. When a DWP file is made, only one type unit
+// will be kept and the type unit that is kept has the .dwo file name that it
+// came from. When LLDB loads the foreign type units, it needs to verify that
+// any entries from foreign type units come from the right .dwo file. We test
+// this since the contents of type units are not always the same even though
+// they have the same type hash. We don't want invalid accelerator table entries
+// to come from one .dwo file and be used on a type unit from another since this
+// could cause invalid lookups to happen. LLDB knows how to track down which
+// .dwo file a type unit comes from by looking at the DW_AT_dwo_name attribute
+// in the DW_TAG_type_unit.
+
+// Now test with DWARF5
+// RUN: %clang -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf \
+// RUN:   -fdebug-types-section -gpubnames -c %s -o %t.main.o
+// RUN: %clang -target x86_64-pc-linux -gdwarf-5 -gsplit-dwarf -DVARIANT \
+// RUN:   -fdebug-types-section -gpubnames -c %s -o %t.foo.o
+// RUN: ld.lld %t.main.o %t.foo.o -o %t
+
+// First we check when we make the .dwp file with %t.main.dwo first so it will
+// pick the type unit from %t.main.dwo. Verify we find only the types from
+// %t.main.dwo's type unit.
+// RUN: llvm-dwp %t.main.dwo %t.foo.dwo -o %t.dwp
+// RUN: %lldb \
+// RUN:   -o "type lookup IntegerType" \
+// RUN:   -o "type lookup FloatType" \
+// RUN:   -o "type lookup IntegerType" \
+// RUN:   -b %t | FileCheck %s
+// CHECK: (lldb) type lookup IntegerType
+// CHECK-NEXT: int
+// CHECK-NEXT: (lldb) type lookup FloatType
+// CHECK-NEXT: double
+// CHECK-NEXT: (lldb) type lookup IntegerType
+// CHECK-NEXT: int
+
+// Next we check when we make the .dwp file with %t.foo.dwo first so it will
+// pick the type unit from %t.main.dwo. Verify we find only the types from
+// %t.main.dwo's type unit.
+// RUN: llvm-dwp %t.foo.dwo %t.main.dwo -o %t.dwp
+// RUN: %lldb \
+// RUN:   -o "type lookup IntegerType" \
+// RUN:   -o "type lookup FloatType" \
+// RUN:   -o "type lookup IntegerType" \
+// RUN:   -b %t | FileCheck %s --check-prefix=VARIANT
+
+// VARIANT: (lldb) type lookup IntegerType
+// VARIANT-NEXT: unsigned int
+// VARIANT-NEXT: (lldb) type lookup FloatType
+// VARIANT-NEXT: float
+// VARIANT-NEXT: (lldb) type lookup IntegerType
+// VARIANT-NEXT: unsigned int
+
+
+// We need to do this so we end with a type unit in each .dwo file and that has
+// the same signature but different contents. When we make the .dwp file, then
+// one of the type units will end up in the .dwp file and we will have
+// .debug_names accelerator tables for both type units and we need to ignore
+// the type units .debug_names entries that don't match the .dwo file whose
+// copy of the type unit ends up in the final .dwp file. To do this, LLDB will
+// look at the type unit and take the DWO name attribute and make sure it
+// matches, and if it doesn't, it will ignore the accelerator table entry.
+struct CustomType {
+  // We switch the order of "FloatType" and "IntegerType" so that if we do
+  // end up reading the wrong accelerator table entry, that we would end up
+  // getting an invalid offset and not find anything, or the offset would have
+  // matched and we would find the wrong thing.
+#ifdef VARIANT
+  typedef float FloatType;
+  typedef unsigned IntegerType;
+#else
+  typedef int IntegerType;
+  typedef double FloatType;
+#endif
+  IntegerType x;
+  FloatType y;
+};
+
+#ifdef VARIANT
+int foo() {
+#else
+int main() {
+#endif
+  CustomType c = {1, 2.0};
+  return 0;
+}
diff --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
index f1d4fc72d5a727..2fe33ca6d29c30 100644
--- a/llvm/include/llvm/DebugInfo/DWARF/DW...
[truncated]

Copy link

github-actions bot commented Apr 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -34,6 +34,18 @@ DebugNamesDWARFIndex::Create(Module &module, DWARFDataExtractor debug_names,
module, std::move(index_up), debug_names, debug_str, dwarf));
}


llvm::DenseSet<uint64_t>
DebugNamesDWARFIndex::GetTypeUnitSigs(const DebugNames &debug_names) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Type out the full word GetTypeUnitSignatures for consistency. DebugNames::NameIndex::getForeignTUSignature does this.

DWARFTypeUnit *
DebugNamesDWARFIndex::GetForeignTypeUnit(const DebugNames::Entry &entry) const {
std::optional<uint64_t> type_sig = entry.getForeignTUTypeSignature();
if (type_sig)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You can merge these lines:

if (std::optional<uint64_t> type_sig = entry.getForeignTUTypeSignature())

@@ -273,6 +301,44 @@ void DebugNamesDWARFIndex::GetFullyQualifiedType(
if (!isType(entry.tag()))
continue;


DWARFTypeUnit *foreign_tu = GetForeignTypeUnit(entry);
if (foreign_tu) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Merge these two lines

if (DWARFTypeUnit *foreign_tu = GetForeignTypeUnit(entry)) {

dw_offset_t cu_offset = name_index->getCUOffset(0);
DWARFUnit *cu = m_debug_info.GetUnitAtOffset(DIERef::DebugInfo,
cu_offset);
if (cu) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Merge these two lines

cu_die.GetAttributeValueAsString(DW_AT_dwo_name, nullptr);
llvm::StringRef tu_dwo_name =
tu_die.GetAttributeValueAsString(DW_AT_dwo_name, nullptr);
if (cu_dwo_name != tu_dwo_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for cu_dwo_name and tu_dwo_name to both be empty/invalid?

@@ -693,7 +693,6 @@ llvm::DWARFDebugAbbrev *SymbolFileDWARF::DebugAbbrev() {
if (debug_abbrev_data.GetByteSize() == 0)
return nullptr;

ElapsedTime elapsed(m_parse_time);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this ElapsedTime?

// If we are currently in a .dwo file and our file index doesn't match we need
// to let the base symbol file handle this.
SymbolFileDWARFDwo *dwo = llvm::dyn_cast_or_null<SymbolFileDWARFDwo>(this);
if (dwo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Merge these two lines

return DWARFDIE();
}
SymbolFileDWARFDebugMap *debug_map = GetDebugMapSymfile();
if (debug_map) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Merge these

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, and also remove the else after return.

@@ -2717,7 +2731,6 @@ void SymbolFileDWARF::FindTypes(const TypeQuery &query, TypeResults &results) {
die_context = die.GetDeclContext();
else
die_context = die.GetTypeLookupContext();
assert(!die_context.empty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the assertion? Not saying we should keep it, but why doesn't this hold?

// RUN: %lldb \
// RUN: -o "type lookup IntegerType" \
// RUN: -o "type lookup FloatType" \
// RUN: -o "type lookup IntegerType" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why look up IntegerType twice?

@@ -58,6 +58,8 @@ class DWARFDebugInfo {

const DWARFDebugAranges &GetCompileUnitAranges();

const std::shared_ptr<SymbolFileDWARFDwo> GetDwpSymbolFile();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove const from by-value return. (it messes with move semantics and some other things) - or was this meant to return by reference? - yeah, I guess this latter, the underlying m_dwarf.GetDwpSymbolFile() seems to return by const reference, so this function probably should too?

DebugNamesDWARFIndex::GetForeignTypeUnit(const DebugNames::Entry &entry) const {
std::optional<uint64_t> type_sig = entry.getForeignTUTypeSignature();
if (type_sig)
if (auto dwp_sp = m_debug_info.GetDwpSymbolFile())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this auto should probably be const ref, to avoid inc/dec on the ref counted smart pointer's ref count

Comment on lines 322 to 325
// type unit matches the .dwo file. For this to happen we rely on each
// .dwo file having its own .debug_names table with a single compile unit
// and multiple type units. This is the only way we can tell if a type
// unit came from a specific .dwo file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this wouldn't work for a merged .debug_names table? Could you leave a FIXME/do you plan to fix this?
Oh, it also wouldn't work for any kind of LTO which could have multiple CUs in a single object file/dwo file.

(FYI @cmtice )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our conclusion from previous discussions was to put DW_IDX_compile_unit, in addition to the DW_IDX_type_unit on the entries in the .debug_names table - and add the CU column to the .debug_tu_index in the dwp file, to match these things up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our conclusion from previous discussions was to put DW_IDX_compile_unit, in addition to the DW_IDX_type_unit on the entries in the .debug_names table - and add the CU column to the .debug_tu_index in the dwp file, to match these things up?

We went with adding comp_dir/name to type unit unit die:
https://discourse.llvm.org/t/debuginfo-dwarfv5-lld-debug-names-with-fdebug-type-sections/73445/29?u=ayermolo
A bit of a shortcut so we don't have to deal with non-standar dwp.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this wouldn't work for a merged .debug_names table? Could you leave a FIXME/do you plan to fix this? Oh, it also wouldn't work for any kind of LTO which could have multiple CUs in a single object file/dwo file.

(FYI @cmtice )

Does "multiple cus" in .o/dwo actually happen?
From spec perspective I don't understand how this will work. Skeleton CU points to .o/dwo which then has multiple cus? What does it mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, indeed - no, currently there's no well defined way to do LTO (LTO is basically the only time you get multiple CUs in a single .o straight out of the compiler) and Split DWARF (there's that long-standing issue that Cary and I should discuss how that should work & propose something to the DWARF committee, but we haven't as-yet).

I guess even if we did make that work, all the skeleton CUs in the dwo file would have the same DW_AT_dwo_name

Comment on lines +1777 to +1772
SymbolFileDWARF *symbol_file = GetDIERefSymbolFile(die_ref);
if (symbol_file)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar feedback others have given elsewhere, roll the variable declaration into the if condition

Comment on lines 5 to 7
// differing contents. I have discovered that the hash for the type unit is
// simply based off of the typename and doesn't seem to differ when the contents
// differ, so that will help us test foreign type units in the .debug_names
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the personal statement, could maybe just be explicit:

"Clang's type unit signature is based only on the mangled name of the type, regardless of the contents of the type"

(I can tell you that's how it works - that's how I implemented it - it is a violation of the DWARF spec (the spec says to use a content hash - though that content hash is of the semantic contents, not the syntactic content (eg: if the TU contained a type with a single int member, the spec-compliant hash would be the same whether it was the type DIE followed by the int DIE, or the other way around) - so it would still be the same despite some variations in layout, so the index entries still wouldn't be portable to all matched-hash definitions))

Comment on lines +68 to +99
struct CustomType {
// We switch the order of "FloatType" and "IntegerType" so that if we do
// end up reading the wrong accelerator table entry, that we would end up
// getting an invalid offset and not find anything, or the offset would have
// matched and we would find the wrong thing.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we have a difference between these type variants that's simpler - like one version of the struct with a single int member, and one version of the type with no members? Then do lookup of the CustomType and make sure the type we get back is the one with the int member? (or not, depending on the test)

@ayermolo
Copy link
Contributor

ayermolo commented Apr 16, 2024

@clayborg
Figured I reply here to your comment:
#88092 (comment)
This was regarding merging .debug_names in linker (although types are not implemented yet there), but FYI BOLT output is similar.
All the CUs are in one module.
Entry for type unit will have two indexes.
One for CU to which it belongs and another for type unit. Which can either be local or foreign.
example:
bolt/test/X86/dwarf5-df-types-debug-names.test

  Compilation Unit offsets [
    CU[0]: 0x00000000
    CU[1]: 0x00000029
  ]
  Foreign Type Unit signatures [
    ForeignTU[0]: 0x49dc260088be7e56
    ForeignTU[1]: 0x104ec427d2ebea6f
    ForeignTU[2]: 0xca1e65a66d92b970
    ForeignTU[3]: 0x104ec427d2ebea6f
  ]
  
  ...
  Bucket 1 [
    Name 1 {
      Hash: 0x7C96E4DB
      String: 0x00000027 "Foo2"
      Entry @ 0x117 {
        Abbrev: 0x1
        Tag: DW_TAG_structure_type
        DW_IDX_type_unit: 0x00
        DW_IDX_compile_unit: 0x00
        DW_IDX_die_offset: 0x00000021
        DW_IDX_parent: <parent not indexed>
      }
    }

@clayborg
Copy link
Collaborator Author

clayborg commented Jun 7, 2024

All feedback has been addressed and this now supports loading accelerator table entries from .dwo files for foreign type units as well.

std::optional<uint64_t>
DWARFDebugNames::Entry::getForeignTUSkeletonCUOffset() const {
// Must have a DW_IDX_type_unit and it must be a foreign type unit.
if (!getForeignTUTypeSignature())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think according to spec if there is only one TU DW_IDX_type_unit is not needed. Just like if there is only one CU.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can you end up with a .debug_names table that contains only one type unit? If this can happen, I can modify getForeignTUTypeSignature() to do the right thing. But file would need to have a single foreign or local TU only, and no CUs at all?

Copy link
Collaborator Author

@clayborg clayborg Jun 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec states:

6.1.1.4.2 List of CUs

The list of CUs immediately follows the header. Each entry in the list is an offset of the corresponding compilation unit in the .debug_info section. In the DWARF-32 format, a section offset is 4 bytes, while in the DWARF-64 format, a section offset is 8 bytes.

The total number of entries in the list is given by comp_unit_count. There must be at least one CU.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry wasn't clear. I wasn't saying it will have only TU. I was saying it can have one TU (with a CU), at which point the DW_IDX_type_unit is not necessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still necessary otherwise we will think it is just the one and only DW_IDX_comp_unit, not the one any only DW_IDX_type_unit...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I miss understood "Must have a DW_IDX_type_unit". I thought you meant that an entry for foreign type unit must have DW_IDX_type_unit attribute.

Comment on lines 76 to 82
std::optional<uint64_t> unit_offset = entry.getForeignTUSkeletonCUOffset();
if (!unit_offset)
return nullptr; // Return NULL, this is a type unit, but couldn't find it.
DWARFUnit *cu =
m_debug_info.GetUnitAtOffset(DIERef::Section::DebugInfo, *unit_offset);
if (!cu)
return nullptr; // Return NULL, this is a type unit, but couldn't find it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part could be moved into the common (dwp and non-dwp) code.

Comment on lines 107 to 110
std::optional<uint64_t> cu_offset = entry.getForeignTUSkeletonCUOffset();
if (cu_offset) {
DWARFUnit *cu = m_debug_info.GetUnitAtOffset(DIERef::DebugInfo, *cu_offset);
if (cu) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and merged with this.

/// doesn't match the originating skeleton compile unit's entry
/// Returns std::nullopt if this entry is not a foreign type unit entry.
std::optional<DWARFTypeUnit *>
IsForeignTypeUnit(const DebugNames::Entry &entry) const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I would expect a function called IsSomething to return bool. Maybe just call it GetForeignTypeUnit ?

Comment on lines 1763 to 1765
SymbolFileDWARFDwo *dwo = llvm::dyn_cast_or_null<SymbolFileDWARFDwo>(this);
if (dwo)
return dwo->GetBaseSymbolFile().GetDIERefSymbolFile(die_ref);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better (and more consistent) way to implement this would be to make GetDIERefSymbolFile virtual, and then have the Dwo implementation do return GetBaseSymbolFile().GetDIERefSymbolFile(die_ref).

@clayborg
Copy link
Collaborator Author

I think I have addressed all issues. Does anyone have time to review?

Copy link
Collaborator

@labath labath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the patch is very clean, but I'm bothered by the getForeignTUSkeletonCUOffset function, how it opens up with a mostly redundant (the callers checks this already) call to getForeignTUTypeSignature, and then proceeds with a reimplementation of getCUOffset (sans the type unit check). I think we could design a better set of APIs for this, but I'm not sure what those is, because I'm unclear of the meaning of various combinations of DW_IDX unit entries.

What the new function (I think) essentially does is "give me the CU associated with this entry even if the entry does not describe a DIE in this CU" (because DW_IDX_die_offset is relative to a type unit)". I think this API would make sense, even without needing to talk about type units. However, is it actually implementable? For single-CU indexes, that's fine, because we can kind of assume all entries are related to that CU. But what about multi-CU indexes? The only way to determine the CU there is to have an explicit attribute. Are we saying that each entry in a multi-CU index must have a DW_IDX_compile_unit (in addition to a DW_IDX_type_unit)?

If the answer is yes, then this function can be implemented, but then I think the current implementation of getCUOffset (which I interpret as "give me the CU offset IF this entry is relative to that CU") doesn't make sense -- because it treats entries with an explicit DW_IDX_compile_unit different from an implicit/missing DW_IDX_compile_unit. And in this world, we assume that DW_IDX_type_unit takes precedence over DW_IDX_compile_unit -- if both are present then DW_IDX_die_offset is relative to the former. And I'm not sure if we would actually want to take up space by putting both values into the index. Looking at existing producers would be interesting, but I'm not sure if there are any (lld does not merge type unit indexes right now, possibly for this very reason). Maybe one could produce something with LTO?

OTOH, if the answer is no, then the function isn't implementable in the generic case. That doesn't mean you can't implement this feature -- which is useful for guarding against ODR violations (exacerbated by llvm's nonconforming type signature computation algorithm...). However, I think it could be implemented in a much simpler way. For example, lldb could just check if it's looking at a single-CU index, and get the CU offset that way. No extra llvm APIs would be needed.

Comment on lines 1761 to 1751
// If we are currently in a .dwo file and our file index doesn't match we need
// to let the base symbol file handle this.
SymbolFileDWARFDwo *dwo = llvm::dyn_cast_or_null<SymbolFileDWARFDwo>(this);
if (dwo)
return dwo->GetDIERefSymbolFile(die_ref);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary now that SymbolFileDWARFDwo overrides this function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still holds (please remove the dyn_cast and the associated code)

return DWARFDIE();
}
SymbolFileDWARFDebugMap *debug_map = GetDebugMapSymfile();
if (debug_map) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, and also remove the else after return.

// type unit comes from by looking at the DW_AT_dwo_name attribute in the
// DW_TAG_type_unit.

// Now test with DWARF5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line doesn't make sense here. I guess it was copied from a test with multiple flavours.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too.

Comment on lines 96 to 103
// We need to do this so we end with a type unit in each .dwo file and that has
// the same signature but different contents. When we make the .dwp file, then
// one of the type units will end up in the .dwp file and we will have
// .debug_names accelerator tables for both type units and we need to ignore
// the type units .debug_names entries that don't match the .dwo file whose
// copy of the type unit ends up in the final .dwp file. To do this, LLDB will
// look at the type unit and take the DWO name attribute and make sure it
// matches, and if it doesn't, it will ignore the accelerator table entry.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels very repetitive. Can you merge this comment with the one at the top. (Optionally, you can also move the source code closer to the top of the file, that one will already know what to expect when one gets to the expectations, if one reads file top to bottom)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this.

// Returns the the CU offset for a foreign TU.
//
// Entries that represent foreign type units can have both a
// DW_IDX_compile_unit and a DW_IDX_type_unit. In this case the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading about DW_IDX_compile_unit in a supposedly-generic interface feels a bit out of place. Do these even need to be defined on the interface? AFAICT, the only callers are in DebugNamesDWARFIndex, which already know they are dealing with a debug_names table, and the debug_names entry class already has a bunch of non-virtual methods (hasParentInformation and friends) for debug_names-specific functionality.

@ayermolo
Copy link
Contributor

ayermolo commented Jun 14, 2024

Most of the patch is very clean, but I'm bothered by the getForeignTUSkeletonCUOffset function, how it opens up with a mostly redundant (the callers checks this already) call to getForeignTUTypeSignature, and then proceeds with a reimplementation of getCUOffset (sans the type unit check). I think we could design a better set of APIs for this, but I'm not sure what those is, because I'm unclear of the meaning of various combinations of DW_IDX unit entries.

What the new function (I think) essentially does is "give me the CU associated with this entry even if the entry does not describe a DIE in this CU" (because DW_IDX_die_offset is relative to a type unit)". I think this API would make sense, even without needing to talk about type units. However, is it actually implementable? For single-CU indexes, that's fine, because we can kind of assume all entries are related to that CU. But what about multi-CU indexes? The only way to determine the CU there is to have an explicit attribute. Are we saying that each entry in a multi-CU index must have a DW_IDX_compile_unit (in addition to a DW_IDX_type_unit)?

If the answer is yes, then this function can be implemented, but then I think the current implementation of getCUOffset (which I interpret as "give me the CU offset IF this entry is relative to that CU") doesn't make sense -- because it treats entries with an explicit DW_IDX_compile_unit different from an implicit/missing DW_IDX_compile_unit. And in this world, we assume that DW_IDX_type_unit takes precedence over DW_IDX_compile_unit -- if both are present then DW_IDX_die_offset is relative to the former. And I'm not sure if we would actually want to take up space by putting both values into the index. Looking at existing producers would be interesting, but I'm not sure if there are any (lld does not merge type unit indexes right now, possibly for this very reason). Maybe one could produce something with LTO?

OTOH, if the answer is no, then the function isn't implementable in the generic case. That doesn't mean you can't implement this feature -- which is useful for guarding against ODR violations (exacerbated by llvm's nonconforming type signature computation algorithm...). However, I think it could be implemented in a much simpler way. For example, lldb could just check if it's looking at a single-CU index, and get the CU offset that way. No extra llvm APIs would be needed.

BOLT does as I mentioned in:
#87740 (comment)

It will put all the CUs/Tus into one one module, thus for foreign TUs both DW_IDX_compile_unit and DW_IDX_type_unit is needed.

For LLD I believe this will be in follow up patch. Right now it does it only for CUs.

@labath
Copy link
Collaborator

labath commented Jun 14, 2024

BOLT does as I mentioned in: #87740 (comment)

I'm sorry, I missed that part. I'm bit of late to the party here.

It will put all the CUs/Tus into one one module, thus for foreign TUs both DW_IDX_compile_unit and DW_IDX_type_unit is needed.

Right, the CU is necessary to find the DWO file containing the type unit. I forgot about that. That resolves some of my questions. However, I still find the current implementation of getCUOffset strange, namely that:

  • if the entry contains a DW_IDX_compile_unit (perhaps because we have a multi-cu index), it returns that CU, regardless of anything else
  • if it does not contain the attribute (and we have a single-CU index), it returns that CU only if the index also does not contain a DW_IDX_type_unit (even though the standard says "In a per-CU index, index entries without this attribute implicitly refer to the single CU")

This doesn't seem particularly useful. I think it should either always return CU offset (regardless of whether it's implicit or explicit) or only return it if the DW_IDX_type_unit is not present. And of these two, I think the first one is better, because the caller can always explicitly check for the presence of the type attribute, and it avoids the need to introduce another function (which is needed because we now have a caller who wants the CU offset even though the entry refers to a type unit).

From what I can see, that type unit check was added relatively recently (#72952), and it was to support this piece of code in lldb:

+  // Look for a DWARF unit offset (CU offset or local TU offset) as they are
+  // both offsets into the .debug_info section.
+  std::optional<uint64_t> unit_offset = entry.getCUOffset();
+  if (!unit_offset) {
+    unit_offset = entry.getLocalTUOffset();
+    if (!unit_offset)
+      return std::nullopt;
+  }

If we just reversed the order of those checks in lldb, then it wouldn't matter that this function return a CU offset even if the index entry does not refer to a DIE in that CU.

Or am I (still) missing something?

@clayborg
Copy link
Collaborator Author

Most of the patch is very clean, but I'm bothered by the getForeignTUSkeletonCUOffset function, how it opens up with a mostly redundant (the callers checks this already) call to getForeignTUTypeSignature, and then proceeds with a reimplementation of getCUOffset (sans the type unit check). I think we could design a better set of APIs for this, but I'm not sure what those is, because I'm unclear of the meaning of various combinations of DW_IDX unit entries.

If we have an entry with a DW_IDX_type_unit, it can be a normal type unit, or a foreign type unit, that depends on the indexes value. If it is less than the number of local type units, then it is a local type unit and there should be no DW_IDX_comp_unit in the entry, else it is a foreign type unit from a .dwo or .dwp file. If we have a .dwp file, we need the DW_IDX_comp_unit to be able to tell if the foreign type unit made it into the .dwp file, or if this is an entry that represents the a different type unit and we should ignore it. If we have no .dwp file, then all entries for foreign type units are valid and we don't need to check the compile unit it came from.

So if you just ask an entry for its DW_IDX_comp_unit, you will get an answer for an entry from a normal compile unit, or for a foreign type unit with a compile unit that will be used to discriminate for the .dwp file case.

So this function is named getForeignTUSkeletonCUOffset() to help say "this might have compile unit entry, but we only want it if this is a foreign type unit. I took me a while to understand the spec when it came to foreign type units...

What the new function (I think) essentially does is "give me the CU associated with this entry even if the entry does not describe a DIE in this CU" (because DW_IDX_die_offset is relative to a type unit)". I think this API would make sense, even without needing to talk about type units. However, is it actually implementable? For single-CU indexes, that's fine, because we can kind of assume all entries are related to that CU. But what about multi-CU indexes? The only way to determine the CU there is to have an explicit attribute. Are we saying that each entry in a multi-CU index must have a DW_IDX_compile_unit (in addition to a DW_IDX_type_unit)?

Yes, but only if the DW_IDX_type_unit represents a foreign type unit, and we will only need the DW_IDX_comp_unit if we have a .dwp file. But since we don't know if the user will use a .dwp file, we always need to put the data for the originating compile uint in as a DW_IDX_comp_unit.

I would rather have had a DW_IDX_skeleton_unit to differentiate between a DW_IDX_comp_unit and requiring the user to know that they can't just find the compile unit, but they must find the non-skeleton compile unit, before adding the DW_IDX_die_offset. But we don't have that in the spec...

If the answer is yes, then this function can be implemented, but then I think the current implementation of getCUOffset (which I interpret as "give me the CU offset IF this entry is relative to that CU") doesn't make sense -- because it treats entries with an explicit DW_IDX_compile_unit different from an implicit/missing DW_IDX_compile_unit.

It doesn't treat them differently. if you ask an entry for its DW_IDX_compile_unit and it doesn't have one, we can fall back to grabbing a CU from the CU table, but only if the CU table has only 1 entry in it.

And in this world, we assume that DW_IDX_type_unit takes precedence over DW_IDX_compile_unit -- if both are present then DW_IDX_die_offset is relative to the former. And I'm not sure if we would actually want to take up space by putting both values into the index. Looking at existing producers would be interesting, but I'm not sure if there are any (lld does not merge type unit indexes right now, possibly for this very reason). Maybe one could produce something with LTO?

LTO and bolt and soon lld will produce these single tables with multiple compile units.

And we MUST have the DW_IDX_comp_unit for foreign type units because type units can generate differing content and one copy of the type unit will make it into the .dwp and we need to know from which .dwo file it came and only use the entries from the matching .dwo file because the offsets can be meaningless for type units with the same signature but that came from different .dwo files.

OTOH, if the answer is no, then the function isn't implementable in the generic case. That doesn't mean you can't implement this feature -- which is useful for guarding against ODR violations (exacerbated by llvm's nonconforming type signature computation algorithm...). However, I think it could be implemented in a much simpler way. For example, lldb could just check if it's looking at a single-CU index, and get the CU offset that way. No extra llvm APIs would be needed.

All of these APIs get the single CU if there is only one for both the case where you only have a DW_IDX_comp_unit in an entry and for the case where you have a DW_IDX_type_unit and/or not a DW_IDX_comp_unit in the same entry.

So not sure what you are asking for here, but this is the best way I can represent the messy state of the .debug_names tables. Monolithic tables with multiple CUs are here now from BOLT, and will soon be from lld, so we need to be able to handle these.

@labath
Copy link
Collaborator

labath commented Jun 17, 2024

My previous message reflected my confusion about the usefulness about the DW_IDX_cu+DW_IDX_tu combo. Let me try to start over.

The thing I don't like about this patch is the duplication between getForeignTUSkeletonCUOffset (the new function) and getCUOffset (the existing function). Since the duplication is motivated by undesirable behavior (result) of the existing function in some cases, this got me thinking if there's a way to define a single function in a way that will be usable in all scenarios (and still have a consistent easy-to-understand definition).

To see if that's possible, let's consider the possible cases (I'm putting cases with DW_IDX_tu first, as those are the interesting ones here):

  1. A single-CU index, DW_IDX_cu not present, DW_IDX_tu present
  2. A multi-CU index, DW_IDX_cu and DW_IDX_tu present

These two definitely are the most common (of those including type units) ones, but we can also have:

  1. A single-CU index, DW_IDX_cu and DW_IDX_tu present
  2. A multi-CU index, DW_IDX_cu not present, DW_IDX_tu present

For completeness, we can also have the cases without type units. I will include only include one here, as I think the others are obvious:

  1. A single-CU index, DW_IDX_cu and DW_IDX_tu not present

Now that we have that, what are the possible getCUOffset? I can think of three:

a) "return the value of DW_IDX_cu. Period." If an explicit value is not present return nothing. I don't think anyone wants this, but I'm mentioning it for completeness. With this definition, the function should return the cu offset only in the second and third cases (NYYNN)
b) "return the DW_IDX_cu value, or the single CU from the index." Let's call this the "raw" version. With this definition, I think this function should return the cu offset in all cases except the fourth (that one is impossible to resolve): YYYNY
c) "return the DW_IDX_cu value, only if the entry describes a DIE within that CU." Let's call this the "semantic" version. I believe a function with this definition should not return a CU offset only in the last case: NNNNY

Now, the problem I see is that the current implementation of getCUOffset is neither of these options. If an entry has an (explicit) DW_IDX_cu, it will always return it (even if it also contains DW_IDX_tu). However, an implicit CU will only be returned if DW_IDX_tu is not present. I.e, this function does a NYNNY.

Therefore, my proposal is to redefine the function to do (b), which essentially amounts to deleting these two lines:

  if (lookup(dwarf::DW_IDX_type_unit).has_value())
    return std::nullopt;

Besides providing a consistent definition of the function (YMMV, if you can provide an explanation for the current behavior, I'd like to hear it), it would also enable you to reuse it in getForeignTUSkeletonCUOffset, as that is exactly the behavior you need there (although, at that point, I don't think getForeignTUSkeletonCUOffset needs to exist as a separate function, as it will be simple enough to inline into lldb).

Changing the definition of course requires going through the call sites and making sure they are okay with that definition. I counted a total of three, one in llvm and two in lldb. The call in llvm (llvm-dwarfdump.cpp) is broken for type units, both before and after this change. Maybe the same goes for one of the calls in lldb (DebugNamesDWARFIndex::GetGlobalVariables -- I'm not sure if you can have global (static class member) variables in a type unit). The last one is definitely broken (and the test suite attests that, but fixing it is trivial: see c5c2a9d. I'll also note that this code was also broken for the case (2) -- a multi CU-index with the accelerator entry referring to a DIE in a (local) type unit, and that that patch fixes that.

@cmtice
Copy link
Contributor

cmtice commented Jun 17, 2024

Just a reminder: Once we complete the work of adding type units to the merged debug_names index, foreign type unit entries will need (and have) DW_IDX_compile_unit values, even in the case without .dwp files, and will need to be handled properly. So that should be kept in mind while implementing this.

@clayborg
Copy link
Collaborator Author

My previous message reflected my confusion about the usefulness about the DW_IDX_cu+DW_IDX_tu combo. Let me try to start over.

The thing I don't like about this patch is the duplication between getForeignTUSkeletonCUOffset (the new function) and getCUOffset (the existing function). Since the duplication is motivated by undesirable behavior (result) of the existing function in some cases, this got me thinking if there's a way to define a single function in a way that will be usable in all scenarios (and still have a consistent easy-to-understand definition).

To see if that's possible, let's consider the possible cases (I'm putting cases with DW_IDX_tu first, as those are the interesting ones here):

  1. A single-CU index, DW_IDX_cu not present, DW_IDX_tu present
  2. A multi-CU index, DW_IDX_cu and DW_IDX_tu present

These two definitely are the most common (of those including type units) ones, but we can also have:

  1. A single-CU index, DW_IDX_cu and DW_IDX_tu present
  2. A multi-CU index, DW_IDX_cu not present, DW_IDX_tu present

For completeness, we can also have the cases without type units. I will include only include one here, as I think the others are obvious:

  1. A single-CU index, DW_IDX_cu and DW_IDX_tu not present

Now that we have that, what are the possible getCUOffset? I can think of three:

a) "return the value of DW_IDX_cu. Period." If an explicit value is not present return nothing. I don't think anyone wants this, but I'm mentioning it for completeness. With this definition, the function should return the cu offset only in the second and third cases (NYYNN)
b) "return the DW_IDX_cu value, or the single CU from the index." Let's call this the "raw" version. With this definition, I think this function should return the cu offset in all cases except the fourth (that one is impossible to resolve): YYYNY
c) "return the DW_IDX_cu value, only if the entry describes a DIE within that CU." Let's call this the "semantic" version. I believe a function with this definition should not return a CU offset only in the last case: NNNNY

Now, the problem I see is that the current implementation of getCUOffset is neither of these options. If an entry has an (explicit) DW_IDX_cu, it will always return it (even if it also contains DW_IDX_tu). However, an implicit CU will only be returned if DW_IDX_tu is not present. I.e, this function does a NYNNY.

Therefore, my proposal is to redefine the function to do (b), which essentially amounts to deleting these two lines:

  if (lookup(dwarf::DW_IDX_type_unit).has_value())
    return std::nullopt;

But we won't want a CU index back if we have a DW_IDX_type_unit that is a local type unit. Makes no sense to return a CUOffset from such entries. So maybe we modify this be:

std::optional<uint64_t> DWARFDebugNames::Entry::getCUIndex() const {
  if (std::optional<DWARFFormValue> Off = lookup(dwarf::DW_IDX_compile_unit))
    return Off->getAsUnsignedConstant();
  // In a per-CU index, the entries without a DW_IDX_compile_unit attribute
  // implicitly refer to the single CU, but only if we don't have a
  // DW_IDX_type_unit that is a foreign type unit.
  if (IsForeignTypeUnit())
    return std::nullopt;
  if (NameIdx->getCUCount() == 1)
    return 0;
  return std::nullopt;
}

Then we implement a bool IsForeignTypeUnit() const that just checks if there is a DW_IDX_type_unit whose index is a foreign type unit.

Besides providing a consistent definition of the function (YMMV, if you can provide an explanation for the current behavior, I'd like to hear it), it would also enable you to reuse it in getForeignTUSkeletonCUOffset, as that is exactly the behavior you need there (although, at that point, I don't think getForeignTUSkeletonCUOffset needs to exist as a separate function, as it will be simple enough to inline into lldb).

That is fine with me as long as you agree with the above fix.

Let me know if the above fix to DWARFDebugNames::Entry::getCUIndex() makes sense.

@labath
Copy link
Collaborator

labath commented Jun 17, 2024

But we won't want a CU index back if we have a DW_IDX_type_unit that is a local type unit. Makes no sense to return a CUOffset from such entries.

Can you elaborate on that? I believe it doesn't make sense to you, but I just don't see the logic behind it. In particular I don't understand why it makes sense to return the CU offset when the DW_IDX_compile_unit attribute is explicitly specified (cases 2 and 3), but not when the compile unit is referred to implicitly via the lack of the attribute on a single-cu index (case 1)

@clayborg
Copy link
Collaborator Author

But we won't want a CU index back if we have a DW_IDX_type_unit that is a local type unit. Makes no sense to return a CUOffset from such entries.

Can you elaborate on that? I believe it doesn't make sense to you, but I just don't see the logic behind it. In particular I don't understand why it makes sense to return the CU offset when the DW_IDX_compile_unit attribute is explicitly specified (cases 2 and 3), but not when the compile unit is referred to implicitly via the lack of the attribute on a single-cu index (case 1)

When we have an entry like:

DW_IDX_type_unit(0)
DW_IDX_die_offset(0x32)

And type unit at index 0 is a local type unit, then the entry describes a type unit DIE in the .debug_info of the current executable. The current API we have is in DWARFAcceleratorTable.h for DWARFDebugNames::Entry is:

    std::optional<uint64_t> getCUOffset() const override;
    std::optional<uint64_t> getLocalTUOffset() const override;
    std::optional<uint64_t> getForeignTUTypeSignature() const override;

So if we have an entry mentioned above where the type unit lives in the local .debug_info section, we should get an invalid value back from a call to getCUOffset() if this is a local type unit. And we should get a valid value back from getLocalTUOffset(). But it makes no sense to return the first compile unit for a call to getCUOffset() for an entry that represents a local type unit because it isn't related to the local type unit.

Lets say we do allow an entry with a local type unit to return the first CU from the .debug_names only because there is only one CU in the .debug_names table. Soon lld will generate a single .debug_names table with multiple CUs. We will get no CU offset back for an entry that contains a local type unit + die offset (no CU specified) when using a combined .debug_names table because it can't rely on there only being one CU in the list, but we will get a valid value back for individual .debug_names tables, which makes the API usage inconsistent between individual tables and combined tables.

Any foreign type units that must be associated with a skeleton compile unit, should have both a DW_IDX_type_unit and a DW_IDX_comp_unit to make sure we can make this connection between a TU and originating CU.

@labath
Copy link
Collaborator

labath commented Jun 18, 2024

Thank you. I now have a better idea of where you're coming from.

Can you elaborate on that? I believe it doesn't make sense to you, but I just don't see the logic behind it. In particular I don't understand why it makes sense to return the CU offset when the DW_IDX_compile_unit attribute is explicitly specified (cases 2 and 3), but not when the compile unit is referred to implicitly via the lack of the attribute on a single-cu index (case 1)

When we have an entry like:

DW_IDX_type_unit(0)
DW_IDX_die_offset(0x32)

And type unit at index 0 is a local type unit, then the entry describes a type unit DIE in the .debug_info of the current executable. The current API we have is in DWARFAcceleratorTable.h for DWARFDebugNames::Entry is:

    std::optional<uint64_t> getCUOffset() const override;
    std::optional<uint64_t> getLocalTUOffset() const override;
    std::optional<uint64_t> getForeignTUTypeSignature() const override;

So if we have an entry mentioned above where the type unit lives in the local .debug_info section, we should get an invalid value back from a call to getCUOffset() if this is a local type unit. And we should get a valid value back from getLocalTUOffset(). But it makes no sense to return the first compile unit for a call to getCUOffset() for an entry that represents a local type unit because it isn't related to the local type unit.

This sounds to me like you'd prefer that getCUOffset behave like the (c) definition from my post above: "return the DW_IDX_cu value, only if the entry describes a DIE within that CU". Do you concur?

If so I think mutually acceptable version (the thing I want to avoid the most is the duplicate implementation of DW_IDX_cu retrieval) could be to have flavours of getCUOffset, a higher-level version doing (c), and a lower-level version doing (b), which would be called into from (c). Something like:

// Returns the CU that is related to this entry, not necessarily the CU that *contains* the entry.
getRelatedCUIndex /*name TBD*/() {
  if (std::optional<DWARFFormValue> Off = lookup(dwarf::DW_IDX_compile_unit))
    return Off->getAsUnsignedConstant();
  if (NameIdx->getCUCount() == 1)
    return 0;
  return std::nullopt;
}
// ditto
getRelatedCUOffset() {
  std::optional<uint64_t> Index = getCUIndex();
  if (!Index || *Index >= NameIdx->getCUCount())
    return std::nullopt;
  return NameIdx->getCUOffset(*Index);
}
// Returns the CU offset, if this entry describes a DIE in that CU
getCUOffset() {
  if (lookup(dwarf::DW_IDX_type_unit).has_value())
    return std::nullopt;
  // Entries without a type unit attribute describe a DIE in the compile unit.
  return getRelatedCUOffset();
}

and getForeignTUSkeletonCUOffset could be:

  if (!getForeignTUTypeSignature()) return std::nullopt;
  return getRelatedCUOffset();

... although I'd prefer to not have that function on the llvm interface, but rather inline it into lldb.

How does that sound?

Lets say we do allow an entry with a local type unit to return the first CU from the .debug_names only because there is only one CU in the .debug_names table. Soon lld will generate a single .debug_names table with multiple CUs. We will get no CU offset back for an entry that contains a local type unit + die offset (no CU specified) when using a combined .debug_names table because it can't rely on there only being one CU in the list, but we will get a valid value back for individual .debug_names tables, which makes the API usage inconsistent between individual tables and combined tables.

Okay, I can see how that can appear to be inconsistent. However, at the same time, I think it's the most accurate representation of what's in the debug info. If you have a type unit in a multi-CU index without a DW_IDX_cu attribute, then you really cannot tell which CU does it relate to. It doesn't matter if it a local or foreign type unit -- the information is just not there. You may not need this information for local type units, but I don't think that means we should deliberately suppress it -- particularly if it means that the function can be used as building block to implement higher level functionality (like I did with getRelatedCUOffset above)

Any foreign type units that must be associated with a skeleton compile unit, should have both a DW_IDX_type_unit and a DW_IDX_comp_unit to make sure we can make this connection between a TU and originating CU.

I'm not sure if this refers to multi-CU indexes or indexes in general, but if it's the former, then I totally agree. (In a single-CU index, I think that connection can be given implicitly through the sole CU in the index)

@nikic nikic removed their request for review June 21, 2024 11:46
@clayborg
Copy link
Collaborator Author

All issues resolved except the change of GetDIERefSymbolFile. Let me know if you agree with my comments, or want a GetSymbolFileByFileIndex(std::optional<uint32_t> file_idx).

/// further lookups can be done on the correct symbol file so that the DIE
/// offset makes sense in the DIERef.
virtual SymbolFileDWARF *GetDIERefSymbolFile(const DIERef &die_ref);

virtual DWARFDIE GetDIE(const DIERef &die_ref);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now GetDIE is virtual and handles SymbolFile resolution, this one doesn't need to be.

// This method can be called without going through the symbol vendor so we
// need to lock the module.
std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
SymbolFileDWARF *symbol_file = GetDIERefSymbolFile(die_ref);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the file index needs to necessarily come from a DIERef. I can imagine other scenarios where it might be interesting to ask for a symbol file, but I think we can deal with it when the time comes.

clayborg and others added 12 commits June 24, 2024 09:55
This patch adds support for the new foreign type unit support in .debug_names. Features include:
- don't manually index foreign TUs if we have info for them
- only use the type unit entries that match the .dwo files when we have a .dwp file
- fix crashers that happen due to PeekDIEName() using wrong offsets
BOLT creates a single .debug_names table where foreign type units have both a DW_IDX_type_unit and a DW_IDX_compile_unit to uniquely identify the .dwo file that the type unit came from.
- Make GetDIERefSymbolFile() virtual in SymbolFileDWARF and override in SymbolFileDWARFDwo
- Rename IsForeignTypeUnit to GetForeignTypeUnit
…#94982)

Reverts llvm#94212
Some codes assume that `NDEBUG` implies
`LLVM_ENABLE_ABI_BREAKING_CHECKS`, thus llvm#94212 breaks some build bots.
@clayborg clayborg merged commit 3b5b814 into llvm:main Jun 24, 2024
4 of 6 checks passed
@clayborg clayborg deleted the foreign-tus branch June 24, 2024 17:00
clayborg added a commit that referenced this pull request Jun 24, 2024
@clayborg
Copy link
Collaborator Author

An extra character snuck in and messed with the buildbots, fixed with:

commit fc066ca1c32b4aef549f3e371dc70589804aba0f (HEAD -> main, origin/main, origin/HEAD)
Author: Greg Clayton <[email protected]>
Date:   Mon Jun 24 10:15:55 2024 -0700

    Fix buildbots for https://github.com/llvm/llvm-project/pull/87740

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 24, 2024

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building lldb,llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/803

Here is the relevant piece of the build log for the reference:

Step 5 (build-unified-tree) failure: build (failure)
...
44.575 [114/14/6770] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/DWARFDebugInfo.cpp.o
44.818 [114/13/6771] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/DWARFIndex.cpp.o
44.924 [114/12/6772] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/DWARFASTParser.cpp.o
45.685 [114/11/6773] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/ManualDWARFIndex.cpp.o
45.944 [114/10/6774] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARFDebugMap.cpp.o
46.307 [114/9/6775] Building CXX object tools/lldb/source/Expression/CMakeFiles/lldbExpression.dir/DWARFExpression.cpp.o
46.458 [114/8/6776] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/InputSection.cpp.o
48.597 [114/7/6777] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/Relocations.cpp.o
49.905 [114/6/6778] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/Writer.cpp.o
50.272 [114/5/6779] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARF.cpp.o
FAILED: tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARF.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -DHAVE_ROUND -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/b/1/llvm-x86_64-debian-dylib/build/tools/lldb/source/Plugins/SymbolFile/DWARF -I/b/1/llvm-x86_64-debian-dylib/llvm-project/lldb/source/Plugins/SymbolFile/DWARF -I/b/1/llvm-x86_64-debian-dylib/llvm-project/lldb/include -I/b/1/llvm-x86_64-debian-dylib/build/tools/lldb/include -I/b/1/llvm-x86_64-debian-dylib/build/include -I/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/include -I/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/../clang/include -I/b/1/llvm-x86_64-debian-dylib/build/tools/lldb/../clang/include -I/b/1/llvm-x86_64-debian-dylib/llvm-project/lldb/source -I/b/1/llvm-x86_64-debian-dylib/build/tools/lldb/source -isystem /usr/include/libxml2 -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-declarations -Wno-unknown-pragmas -Wno-strict-aliasing -Wno-deprecated-register -Wno-vla-extension -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARF.cpp.o -MF tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARF.cpp.o.d -o tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/SymbolFileDWARF.cpp.o -c /b/1/llvm-x86_64-debian-dylib/llvm-project/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
/b/1/llvm-x86_64-debian-dylib/llvm-project/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:1749:1: error: expected expression
\    if (SymbolFileDWARFDebugMap *debug_map = GetDebugMapSymfile())
^
1 error generated.
53.142 [114/4/6780] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/InputFiles.cpp.o
55.217 [114/3/6781] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/Driver.cpp.o
55.652 [114/2/6782] Building CXX object tools/lldb/source/Plugins/SymbolFile/DWARF/CMakeFiles/lldbPluginSymbolFileDWARF.dir/DWARFASTParserClang.cpp.o
57.506 [114/1/6783] Building CXX object tools/lld/ELF/CMakeFiles/lldELF.dir/SyntheticSections.cpp.o
ninja: build stopped: subcommand failed.

@labath
Copy link
Collaborator

labath commented Jun 25, 2024

It looks like the test is flaky: https://lab.llvm.org/buildbot/#/builders/59/builds/535

It looks like the order of the types is nondeterministic. I think you should be able to use CHECK-DAG along with this trick to match newlines to make the test accept both orderings.

@vvereschaka
Copy link
Contributor

This build got the same failed test on the windows host - https://lab.llvm.org/buildbot/#/builders/141/builds/318

Failed Tests (1):
  lldb-shell :: SymbolFile/DWARF/x86/dwp-foreign-type-units.cpp
# .---command stderr------------
# | C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp:34:16: error: NODWP-NEXT: is not on the line after the previous match
# | // NODWP-NEXT: unsigned int
# |                ^
# | <stdin>:20:10: note: 'next' match was here
# |  typedef unsigned int IntegerType;
# |          ^
# | <stdin>:7:13: note: previous match ended here
# | unsigned int
# |             ^
# | <stdin>:8:1: note: non-matching line after previous match is here
# | int
# | ^
# | 
# | Input file: <stdin>
# | Check file: C:\Users\tcwg\llvm-worker\lldb-aarch64-windows\llvm-project\lldb\test\Shell\SymbolFile\DWARF\x86\dwp-foreign-type-units.cpp
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |          .
# |          .
# |          .
# |         15:  typedef double FloatType; 
# |         16:  CustomType::IntegerType x; 
# |         17:  CustomType::FloatType y; 
# |         18: } 
# |         19: struct CustomType { 
# |         20:  typedef unsigned int IntegerType; 
# | next:34              !~~~~~~~~~~~               error: match on wrong line
# |         21:  typedef float FloatType; 
# |         22:  CustomType::IntegerType x; 
# |         23:  CustomType::FloatType y; 
# |         24: } 
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1

@dwblaikie
Copy link
Collaborator

It looks like the test is flaky: https://lab.llvm.org/buildbot/#/builders/59/builds/535

It looks like the order of the types is nondeterministic. I think you should be able to use CHECK-DAG along with this trick to match newlines to make the test accept both orderings.

What are lldb's guarantees in this regard? Clang/LLVM/etc require nondeterministic output - presumably the same would be valuable to lldb, but I don't know what lldb's precedents are in this regard.

@clayborg
Copy link
Collaborator Author

Here is a PR to fix the flaky test:

#96800

@labath
Copy link
Collaborator

labath commented Jun 27, 2024

It looks like the test is flaky: https://lab.llvm.org/buildbot/#/builders/59/builds/535
It looks like the order of the types is nondeterministic. I think you should be able to use CHECK-DAG along with this trick to match newlines to make the test accept both orderings.

What are lldb's guarantees in this regard? Clang/LLVM/etc require nondeterministic output - presumably the same would be valuable to lldb, but I don't know what lldb's precedents are in this regard.

It would be useful, but we've never made a serious effort to try to achieve that.

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
This patch adds support for the new foreign type unit support in
.debug_names. Features include:
- don't manually index foreign TUs if we have info for them
- only use the type unit entries that match the .dwo files when we have
a .dwp file
- fix type unit lookups for .dwo files
- fix crashers that happen due to PeekDIEName() using wrong offsets where an entry had DW_IDX_comp_unit and DW_IDX_type_unit entries and when we had no type unit support, it would cause us to think it was a normal DIE in .debug_info from the main executable.

---------

Co-authored-by: paperchalice <[email protected]>
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants