[lldb] Support disassembling discontinuous functions #126505

Merged (2 commits) on Feb 12, 2025

@labath (Collaborator) commented Feb 10, 2025

The command already supported disassembling multiple ranges, among other reasons because inlined functions can be discontinuous. The main thing that was missing was the ability to retrieve the function ranges from the top-level function object.

The output of the command for the case where the function entry point is not its lowest address is somewhat confusing (instructions placed before the entry point are shown with negative offsets), but it is correct.

@llvmbot (Member) commented Feb 10, 2025

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Full diff: https://github.com/llvm/llvm-project/pull/126505.diff

4 Files Affected:

  • (modified) lldb/source/Commands/CommandObjectDisassemble.cpp (+47-41)
  • (modified) lldb/source/Commands/CommandObjectDisassemble.h (+2-1)
  • (modified) lldb/source/Symbol/SymbolContext.cpp (+2-2)
  • (modified) lldb/test/Shell/Commands/command-disassemble.s (+103-9)
diff --git a/lldb/source/Commands/CommandObjectDisassemble.cpp b/lldb/source/Commands/CommandObjectDisassemble.cpp
index 5b131fe86dedbae..d116188966daa49 100644
--- a/lldb/source/Commands/CommandObjectDisassemble.cpp
+++ b/lldb/source/Commands/CommandObjectDisassemble.cpp
@@ -21,6 +21,7 @@
 #include "lldb/Target/SectionLoadList.h"
 #include "lldb/Target/StackFrame.h"
 #include "lldb/Target/Target.h"
+#include <iterator>
 
 static constexpr unsigned default_disasm_byte_size = 32;
 static constexpr unsigned default_disasm_num_ins = 4;
@@ -236,19 +237,25 @@ CommandObjectDisassemble::CommandObjectDisassemble(
 
 CommandObjectDisassemble::~CommandObjectDisassemble() = default;
 
-llvm::Error CommandObjectDisassemble::CheckRangeSize(const AddressRange &range,
-                                                     llvm::StringRef what) {
+llvm::Expected<std::vector<AddressRange>>
+CommandObjectDisassemble::CheckRangeSize(std::vector<AddressRange> ranges,
+                                         llvm::StringRef what) {
+  addr_t total_range_size = 0;
+  for (const AddressRange &r : ranges)
+    total_range_size += r.GetByteSize();
+
   if (m_options.num_instructions > 0 || m_options.force ||
-      range.GetByteSize() < GetDebugger().GetStopDisassemblyMaxSize())
-    return llvm::Error::success();
+      total_range_size < GetDebugger().GetStopDisassemblyMaxSize())
+    return ranges;
+
   StreamString msg;
   msg << "Not disassembling " << what << " because it is very large ";
-  range.Dump(&msg, &GetTarget(), Address::DumpStyleLoadAddress,
-             Address::DumpStyleFileAddress);
+  for (const AddressRange &r: ranges)
+    r.Dump(&msg, &GetTarget(), Address::DumpStyleLoadAddress,
+           Address::DumpStyleFileAddress);
   msg << ". To disassemble specify an instruction count limit, start/stop "
          "addresses or use the --force option.";
-  return llvm::createStringError(llvm::inconvertibleErrorCode(),
-                                 msg.GetString());
+  return llvm::createStringError(msg.GetString());
 }
 
 llvm::Expected<std::vector<AddressRange>>
@@ -262,9 +269,11 @@ CommandObjectDisassemble::GetContainingAddressRanges() {
         addr, eSymbolContextEverything, sc, resolve_tail_call_address);
     if (sc.function || sc.symbol) {
       AddressRange range;
-      sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol, 0,
-                         false, range);
-      ranges.push_back(range);
+      for (uint32_t idx = 0;
+           sc.GetAddressRange(eSymbolContextFunction | eSymbolContextSymbol,
+                              idx, false, range);
+           ++idx)
+        ranges.push_back(range);
     }
   };
 
@@ -292,9 +301,7 @@ CommandObjectDisassemble::GetContainingAddressRanges() {
         m_options.symbol_containing_addr);
   }
 
-  if (llvm::Error err = CheckRangeSize(ranges[0], "the function"))
-    return std::move(err);
-  return ranges;
+  return CheckRangeSize(std::move(ranges), "the function");
 }
 
 llvm::Expected<std::vector<AddressRange>>
@@ -304,29 +311,24 @@ CommandObjectDisassemble::GetCurrentFunctionRanges() {
   if (!frame) {
     if (process) {
       return llvm::createStringError(
-          llvm::inconvertibleErrorCode(),
-          "Cannot disassemble around the current "
-          "function without the process being stopped.\n");
-    } else {
-      return llvm::createStringError(llvm::inconvertibleErrorCode(),
-                                     "Cannot disassemble around the current "
-                                     "function without a selected frame: "
-                                     "no currently running process.\n");
+          "Cannot disassemble around the current function without the process "
+          "being stopped.\n");
     }
+    return llvm::createStringError(
+        "Cannot disassemble around the current function without a selected "
+        "frame: no currently running process.\n");
   }
-  SymbolContext sc(
-      frame->GetSymbolContext(eSymbolContextFunction | eSymbolContextSymbol));
-  AddressRange range;
+  SymbolContext sc =
+      frame->GetSymbolContext(eSymbolContextFunction | eSymbolContextSymbol);
+  std::vector<AddressRange> ranges;
   if (sc.function)
-    range = sc.function->GetAddressRange();
-  else if (sc.symbol && sc.symbol->ValueIsAddress()) {
-    range = {sc.symbol->GetAddress(), sc.symbol->GetByteSize()};
-  } else
-    range = {frame->GetFrameCodeAddress(), default_disasm_byte_size};
-
-  if (llvm::Error err = CheckRangeSize(range, "the current function"))
-    return std::move(err);
-  return std::vector<AddressRange>{range};
+    ranges = sc.function->GetAddressRanges();
+  else if (sc.symbol && sc.symbol->ValueIsAddress())
+    ranges.emplace_back(sc.symbol->GetAddress(), sc.symbol->GetByteSize());
+  else
+    ranges.emplace_back(frame->GetFrameCodeAddress(), default_disasm_byte_size);
+
+  return CheckRangeSize(std::move(ranges), "the current function");
 }
 
 llvm::Expected<std::vector<AddressRange>>
@@ -372,19 +374,23 @@ CommandObjectDisassemble::GetNameRanges(CommandReturnObject &result) {
 
   std::vector<AddressRange> ranges;
   llvm::Error range_errs = llvm::Error::success();
-  AddressRange range;
   const uint32_t scope =
       eSymbolContextBlock | eSymbolContextFunction | eSymbolContextSymbol;
   const bool use_inline_block_range = true;
   for (SymbolContext sc : sc_list.SymbolContexts()) {
+    std::vector<AddressRange> fn_ranges;
+    AddressRange range;
     for (uint32_t range_idx = 0;
          sc.GetAddressRange(scope, range_idx, use_inline_block_range, range);
-         ++range_idx) {
-      if (llvm::Error err = CheckRangeSize(range, "a range"))
-        range_errs = joinErrors(std::move(range_errs), std::move(err));
-      else
-        ranges.push_back(range);
-    }
+         ++range_idx)
+      fn_ranges.push_back(std::move(range));
+
+    if (llvm::Expected<std::vector<AddressRange>> checked_ranges =
+            CheckRangeSize(std::move(fn_ranges), "a function"))
+      llvm::move(*checked_ranges, std::back_inserter(ranges));
+    else
+      range_errs =
+          joinErrors(std::move(range_errs), checked_ranges.takeError());
   }
   if (ranges.empty()) {
     if (range_errs)
diff --git a/lldb/source/Commands/CommandObjectDisassemble.h b/lldb/source/Commands/CommandObjectDisassemble.h
index f9cba1e5ae9cb6c..4fbcd72d1c0425e 100644
--- a/lldb/source/Commands/CommandObjectDisassemble.h
+++ b/lldb/source/Commands/CommandObjectDisassemble.h
@@ -100,7 +100,8 @@ class CommandObjectDisassemble : public CommandObjectParsed {
   llvm::Expected<std::vector<AddressRange>> GetPCRanges();
   llvm::Expected<std::vector<AddressRange>> GetStartEndAddressRanges();
 
-  llvm::Error CheckRangeSize(const AddressRange &range, llvm::StringRef what);
+  llvm::Expected<std::vector<AddressRange>>
+  CheckRangeSize(std::vector<AddressRange> ranges, llvm::StringRef what);
 
   CommandOptions m_options;
 };
diff --git a/lldb/source/Symbol/SymbolContext.cpp b/lldb/source/Symbol/SymbolContext.cpp
index 19f4f91e29d2598..4725df52ff5592c 100644
--- a/lldb/source/Symbol/SymbolContext.cpp
+++ b/lldb/source/Symbol/SymbolContext.cpp
@@ -351,8 +351,8 @@ bool SymbolContext::GetAddressRange(uint32_t scope, uint32_t range_idx,
   }
 
   if ((scope & eSymbolContextFunction) && (function != nullptr)) {
-    if (range_idx == 0) {
-      range = function->GetAddressRange();
+    if (range_idx < function->GetAddressRanges().size()) {
+      range = function->GetAddressRanges()[range_idx];
       return true;
     }
   }
diff --git a/lldb/test/Shell/Commands/command-disassemble.s b/lldb/test/Shell/Commands/command-disassemble.s
index 1625f80468eb17f..951d96cefd4b9d5 100644
--- a/lldb/test/Shell/Commands/command-disassemble.s
+++ b/lldb/test/Shell/Commands/command-disassemble.s
@@ -82,20 +82,25 @@
 # CHECK-NEXT: (lldb) disassemble --name case2
 # CHECK-NEXT: command-disassemble.s.tmp`n1::case2:
 # CHECK-NEXT: command-disassemble.s.tmp[0x2044] <+0>: int    $0x32
-# CHECK-NEXT: warning: Not disassembling a range because it is very large [0x0000000000002046-0x0000000000004046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
+# CHECK-NEXT: warning: Not disassembling a function because it is very large [0x0000000000002046-0x0000000000004046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
 # CHECK-NEXT: (lldb) disassemble --name case3
-# CHECK-NEXT: error: Not disassembling a range because it is very large [0x0000000000004046-0x0000000000006046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
-# CHECK-NEXT: Not disassembling a range because it is very large [0x0000000000006046-0x0000000000008046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
+# CHECK-NEXT: error: Not disassembling a function because it is very large [0x0000000000006046-0x0000000000007046)[0x0000000000009046-0x000000000000a046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
+# CHECK-NEXT: Not disassembling a function because it is very large [0x0000000000004046-0x0000000000006046). To disassemble specify an instruction count limit, start/stop addresses or use the --force option.
 # CHECK-NEXT: (lldb) disassemble --name case3 --count 3
+# CHECK-NEXT: command-disassemble.s.tmp`n2::case3:
+# CHECK-NEXT: command-disassemble.s.tmp[0x6046] <-12288>: int    $0x2a
+# CHECK-NEXT: command-disassemble.s.tmp[0x6048] <-12286>: int    $0x2a
+# CHECK-NEXT: command-disassemble.s.tmp[0x604a] <-12284>: int    $0x2a
+# CHECK-EMPTY:
+# CHECK-NEXT: command-disassemble.s.tmp`n2::case3:
+# CHECK-NEXT: command-disassemble.s.tmp[0x9046] <+0>: int    $0x2a
+# CHECK-NEXT: command-disassemble.s.tmp[0x9048] <+2>: int    $0x2a
+# CHECK-NEXT: command-disassemble.s.tmp[0x904a] <+4>: int    $0x2a
+# CHECK-EMPTY:
 # CHECK-NEXT: command-disassemble.s.tmp`n1::case3:
 # CHECK-NEXT: command-disassemble.s.tmp[0x4046] <+0>: int    $0x2a
 # CHECK-NEXT: command-disassemble.s.tmp[0x4048] <+2>: int    $0x2a
 # CHECK-NEXT: command-disassemble.s.tmp[0x404a] <+4>: int    $0x2a
-# CHECK-EMPTY:
-# CHECK-NEXT: command-disassemble.s.tmp`n2::case3:
-# CHECK-NEXT: command-disassemble.s.tmp[0x6046] <+0>: int    $0x2a
-# CHECK-NEXT: command-disassemble.s.tmp[0x6048] <+2>: int    $0x2a
-# CHECK-NEXT: command-disassemble.s.tmp[0x604a] <+4>: int    $0x2a
 # CHECK-EMPTY:
 
 
@@ -158,8 +163,97 @@ _ZN2n15case3Ev:
         .rept 0x1000
         int $42
         .endr
+        .size _ZN2n15case3Ev, .-_ZN2n15case3Ev
 
-_ZN2n25case3Ev:
+.L_ZN2n25case3Ev.__part.1:
+        .rept 0x800
+        int $42
+        .endr
+.L_ZN2n25case3Ev.__part.1_end:
+
+.Lpadding:
         .rept 0x1000
         int $42
         .endr
+
+_ZN2n25case3Ev:
+        .rept 0x800
+        int $42
+        .endr
+.L_ZN2n25case3Ev_end:
+
+        .section        .debug_abbrev,"",@progbits
+        .byte   1                               # Abbreviation Code
+        .byte   17                              # DW_TAG_compile_unit
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   37                              # DW_AT_producer
+        .byte   8                               # DW_FORM_string
+        .byte   19                              # DW_AT_language
+        .byte   5                               # DW_FORM_data2
+        .byte   17                              # DW_AT_low_pc
+        .byte   1                               # DW_FORM_addr
+        .byte   85                              # DW_AT_ranges
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   2                               # Abbreviation Code
+        .byte   57                              # DW_TAG_namespace
+        .byte   1                               # DW_CHILDREN_yes
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   3                               # Abbreviation Code
+        .byte   46                              # DW_TAG_subprogram
+        .byte   0                               # DW_CHILDREN_no
+        .byte   85                              # DW_AT_ranges
+        .byte   23                              # DW_FORM_sec_offset
+        .byte   3                               # DW_AT_name
+        .byte   8                               # DW_FORM_string
+        .byte   110                             # DW_AT_linkage_name
+        .byte   8                               # DW_FORM_string
+        .byte   0                               # EOM(1)
+        .byte   0                               # EOM(2)
+        .byte   0                               # EOM(3)
+
+        .section        .debug_info,"",@progbits
+.Lcu_begin0:
+        .long   .Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
+.Ldebug_info_start0:
+        .short  5                               # DWARF version number
+        .byte   1                               # DWARF Unit Type
+        .byte   8                               # Address Size (in bytes)
+        .long   .debug_abbrev                   # Offset Into Abbrev. Section
+        .byte   1                               # Abbrev DW_TAG_compile_unit
+        .asciz  "Hand-written DWARF"            # DW_AT_producer
+        .short  29                              # DW_AT_language
+        .quad   0                               # DW_AT_low_pc
+        .long   .Ldebug_ranges0                 # DW_AT_ranges
+        .byte   2                               # Abbrev DW_TAG_namespace
+        .asciz  "n2"                            # DW_AT_name
+        .byte   3                               # Abbrev DW_TAG_subprogram
+        .long   .Ldebug_ranges0                 # DW_AT_ranges
+        .asciz  "case3"                         # DW_AT_name
+        .asciz  "_ZN2n25case3Ev"                # DW_AT_linkage_name
+        .byte   0                               # End Of Children Mark
+        .byte   0                               # End Of Children Mark
+.Ldebug_info_end0:
+
+        .section        .debug_rnglists,"",@progbits
+        .long   .Ldebug_list_header_end0-.Ldebug_list_header_start0 # Length
+.Ldebug_list_header_start0:
+        .short  5                               # Version
+        .byte   8                               # Address size
+        .byte   0                               # Segment selector size
+        .long   2                               # Offset entry count
+.Lrnglists_table_base0:
+        .long   .Ldebug_ranges0-.Lrnglists_table_base0
+.Ldebug_ranges0:
+        .byte   6                               # DW_RLE_start_end
+        .quad _ZN2n25case3Ev
+        .quad .L_ZN2n25case3Ev_end
+        .byte   6                               # DW_RLE_start_end
+        .quad .L_ZN2n25case3Ev.__part.1
+        .quad .L_ZN2n25case3Ev.__part.1_end
+        .byte   0                               # DW_RLE_end_of_list
+.Ldebug_list_header_end0:

github-actions bot commented Feb 10, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@DavidSpickett (Collaborator) left a comment

Negative offset makes sense to me given that it's relative to the symbol of the function. Will be interesting to see how users interpret it though.

@DavidSpickett (Collaborator) left a comment

LGTM

@labath merged commit 37f36cb into llvm:main Feb 12, 2025
7 checks passed
@labath deleted the disasm branch February 12, 2025 09:47
labath added a commit to labath/llvm-project that referenced this pull request Feb 12, 2025
…embling

We need to iterate through all the symbol context ranges returned by
SymbolContext::GetAddressRange, which (since llvm#126505) can return more
than one range for a function. This also includes a fix to print the
function offsets as signed values.

I also wanted to check that the addresses in the middle of the function
(between its parts) do *not* resolve to the function, but that is not
entirely the case right now. This appears to be a separate issue, so I've
just left a TODO for now.
labath added a commit that referenced this pull request Feb 13, 2025
…embling (#126925)
