[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

labath · 2025-01-29T15:21:17Z

This is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).

…nctions This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).

llvmbot · 2025-01-29T15:21:59Z

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

This is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).

Full diff: https://github.com/llvm/llvm-project/pull/124931.diff

2 Files Affected:

(modified) lldb/source/Symbol/Block.cpp (+19-32)
(modified) lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s (+24-13)

diff --git a/lldb/source/Symbol/Block.cpp b/lldb/source/Symbol/Block.cpp
index 139fa06d08fca5..bddb015333e09c 100644
--- a/lldb/source/Symbol/Block.cpp
+++ b/lldb/source/Symbol/Block.cpp
@@ -243,25 +243,15 @@ bool Block::GetRangeContainingAddress(const Address &addr,
                                       AddressRange &range) {
   Function *function = CalculateSymbolContextFunction();
   if (function) {
-    const AddressRange &func_range = function->GetAddressRange();
-    if (addr.GetModule() == func_range.GetBaseAddress().GetModule()) {
-      const addr_t file_addr = addr.GetFileAddress();
-      const addr_t func_file_addr =
-          func_range.GetBaseAddress().GetFileAddress();
-      if (file_addr >= func_file_addr &&
-          file_addr < func_file_addr + func_range.GetByteSize()) {
-        addr_t offset = file_addr - func_file_addr;
-
-        const Range *range_ptr = m_ranges.FindEntryThatContains(offset);
-
-        if (range_ptr) {
-          range.GetBaseAddress() =
-              Address(func_file_addr + range_ptr->GetRangeBase(),
-                      addr.GetModule()->GetSectionList());
-          range.SetByteSize(range_ptr->GetByteSize());
-          return true;
-        }
-      }
+    if (uint32_t idx = GetRangeIndexContainingAddress(addr);
+        idx != UINT32_MAX) {
+      const Range *range_ptr = m_ranges.GetEntryAtIndex(idx);
+      assert(range_ptr);
+
+      range.GetBaseAddress() = function->GetAddress();
+      range.GetBaseAddress().Slide(range_ptr->GetRangeBase());
+      range.SetByteSize(range_ptr->GetByteSize());
+      return true;
     }
   }
   range.Clear();
@@ -278,19 +268,16 @@ bool Block::GetRangeContainingLoadAddress(lldb::addr_t load_addr,
 
 uint32_t Block::GetRangeIndexContainingAddress(const Address &addr) {
   Function *function = CalculateSymbolContextFunction();
-  if (function) {
-    const AddressRange &func_range = function->GetAddressRange();
-    if (addr.GetSection() == func_range.GetBaseAddress().GetSection()) {
-      const addr_t addr_offset = addr.GetOffset();
-      const addr_t func_offset = func_range.GetBaseAddress().GetOffset();
-      if (addr_offset >= func_offset &&
-          addr_offset < func_offset + func_range.GetByteSize()) {
-        addr_t offset = addr_offset - func_offset;
-        return m_ranges.FindEntryIndexThatContains(offset);
-      }
-    }
-  }
-  return UINT32_MAX;
+  if (!function)
+    return UINT32_MAX;
+
+  const Address &func_addr = function->GetAddress();
+  if (addr.GetModule() != func_addr.GetModule())
+    return UINT32_MAX;
+
+  const addr_t file_addr = addr.GetFileAddress();
+  const addr_t func_file_addr = func_addr.GetFileAddress();
+  return m_ranges.FindEntryIndexThatContains(file_addr - func_file_addr);
 }
 
 bool Block::GetRangeAtIndex(uint32_t range_idx, AddressRange &range) {
diff --git a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
index 2e2bc52cd3ff99..4d42c50467da06 100644
--- a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
+++ b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
@@ -6,15 +6,22 @@
 
 # CHECK: Found 1 function(s).
 # CHECK: foo: [input.o[0x0-0xe), input.o[0x14-0x1c)]
-# CHECK-NEXT: input.o[0x0]: cmpl   $0x0, %edi
-# CHECK-NEXT: input.o[0x3]: je     0x14
-# CHECK-NEXT: input.o[0x5]: jmp    0x7
-# CHECK-NEXT: input.o[0x7]: callq  0xe
-# CHECK-NEXT: input.o[0xc]: jmp    0x1b
+# CHECK-NEXT: input.o[0x0]: callq  0xe
+# CHECK-NEXT: input.o[0x5]: jmp    0x1b
+# CHECK-NEXT: input.o[0x7]: cmpl   $0x0, %edi
+# CHECK-NEXT: input.o[0xa]: je     0x14
+# CHECK-NEXT: input.o[0xc]: jmp    0x0
 # CHECK-EMPTY:
 # CHECK-NEXT: input.o[0x14]: callq  0x19
 # CHECK-NEXT: input.o[0x19]: jmp    0x1b
 # CHECK-NEXT: input.o[0x1b]: retq
+# CHECK-NEXT: offset 0x00 => index 0
+# CHECK-NEXT: offset 0x0c => index 0
+# CHECK-NEXT: offset 0x0e => index ffffffff
+# CHECK-NEXT: offset 0x13 => index ffffffff
+# CHECK-NEXT: offset 0x14 => index 1
+# CHECK-NEXT: offset 0x1b => index 1
+# CHECK-NEXT: offset 0x1c => index ffffffff
 
 
 #--- script.py
@@ -28,6 +35,10 @@ def __lldb_init_module(debugger, internal_dict):
     fn = ctx.function
     print(f"{fn.name}: {fn.GetRanges()}")
     print(fn.GetInstructions(target))
+    text = fn.addr.section
+    for offset in [0x00, 0x0c, 0x0e, 0x13, 0x14, 0x1b, 0x1c]:
+      idx = fn.block.GetRangeIndexForBlockAddress(lldb.SBAddress(text, offset))
+      print(f"offset 0x{offset:02x} => index {idx:x}")
 
 #--- input.s
 # An example of a function which has been split into two parts. Roughly
@@ -40,6 +51,14 @@ def __lldb_init_module(debugger, internal_dict):
         .text
 
         .type   foo,@function
+foo.__part.1:
+        .cfi_startproc
+        callq   bar
+        jmp     foo.__part.3
+.Lfoo.__part.1_end:
+        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
+        .cfi_endproc
+
 foo:
         .cfi_startproc
         cmpl    $0, %edi
@@ -49,14 +68,6 @@ foo:
 .Lfoo_end:
         .size   foo, .Lfoo_end-foo
 
-foo.__part.1:
-        .cfi_startproc
-        callq   bar
-        jmp     foo.__part.3
-.Lfoo.__part.1_end:
-        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
-        .cfi_endproc
-
 bar:
         .cfi_startproc
         movl    $47, %eax

DavidSpickett

I've looked at this a couple of times and on a general level, it seems fine, but all the address vs. file address stuff is confusing to me. I'm just guessing what each one is really.

So I have added some questions about that, maybe I'll be able to approve after those, or Jason can appear, whichever comes first.

DavidSpickett · 2025-02-03T14:52:39Z

lldb/source/Symbol/Block.cpp

+      assert(range_ptr);
+
+      range.GetBaseAddress() = function->GetAddress();
+      range.GetBaseAddress().Slide(range_ptr->GetRangeBase());


function->GetAddress() returns the address relative to the start of the range, then you slide that address up to get the absolute address of the function, right?

function->GetAddress() returns the address of the start of the function. The address is internally stored as a section+offset pair, but for the most part, you can think of it as an absolute address of the function. Block ranges are internally stored as pairs of integers (offset+size), where the offset is relative to the start of the function. This converts the internal (implicitly relative to section start) representation of the range to the external, absolute (well, section-relative) representation.
It basically does that by adjusting the section offset, but it actually does that incorrectly since different parts of the function may be in different sections. So, the correct thing to do is to convert the address into a file address (which is relative to the "base address" of the file -- what ELF would call "virtual address"), do the subtraction there, and then re-resolve the result into a section-relative address.

My updated version will do that.

lldb/source/Symbol/Block.cpp

DavidSpickett · 2025-02-03T14:55:22Z

lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s

+# CHECK-NEXT: offset 0x13 => index ffffffff
+# CHECK-NEXT: offset 0x14 => index 1
+# CHECK-NEXT: offset 0x1b => index 1
+# CHECK-NEXT: offset 0x1c => index ffffffff


I have zero idea what's going on here.

You moved foo.__part.1 before foo, presumably because that would give you the wrong answers if it weren't for the changes you made elsewhere?

Correct, GetRangeIndexForBlockAddress would return wrong results because it assumed that the internal block representation was relative to the lowest address in the function (fn->GetAddressRange().GetBaseAddress()) but they really are (as of #122440) relative to the entry point.

And this is a slightly funny way of testing that interface -- I print the function results in python and then FileCheck the result.

ffffffff means that it's not part of a block, is that right?

So if I have a function that does not cover the offset I ask for, it would return this failure value.

Add a comment there saying so, it will help someone who is reading because this failed out of the blue sometime.

labath · 2025-02-04T16:42:31Z

Thanks for taking a look. I've tried to explain what the code that, but I don't know if I've succeeded. The functionality is at some level quite simple, but wrapping your head around the different representations is very tricky. Let me know if there's something I need to elaborate on.

labath · 2025-02-12T10:38:11Z

Any more thoughts on this?

DavidSpickett

With the 1 extra comment, I understand this now, LGTM.

…nctions (llvm#124931) This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).

Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result.

#137955) Continuing the theme from #116777 and #124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result. I found this out due to a failure to disassemble some parts of the function, so I'm testing with that, although it's likely there are other things that were broken due to this.

llvm#137955) Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result. I found this out due to a failure to disassemble some parts of the function, so I'm testing with that, although it's likely there are other things that were broken due to this.

labath requested review from DavidSpickett and jasonmolenda January 29, 2025 15:21

labath requested a review from JDevlieghere as a code owner January 29, 2025 15:21

llvmbot added the lldb label Jan 29, 2025

DavidSpickett reviewed Feb 3, 2025

View reviewed changes

fix sliding

94ec2e9

DavidSpickett approved these changes Feb 12, 2025

View reviewed changes

Add comment

362f8f4

labath merged commit 298caeb into llvm:main Feb 13, 2025
7 checks passed

labath deleted the block-fn branch February 13, 2025 09:08

labath mentioned this pull request Apr 30, 2025

[lldb] Fix block address resolution for functions in multiple sections #137955

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

Uh oh!

labath commented Jan 29, 2025

Uh oh!

llvmbot commented Jan 29, 2025

Uh oh!

DavidSpickett left a comment

Uh oh!

DavidSpickett Feb 3, 2025

Uh oh!

labath Feb 4, 2025

Uh oh!

Uh oh!

DavidSpickett Feb 3, 2025

Uh oh!

labath Feb 4, 2025

Uh oh!

DavidSpickett Feb 12, 2025

Uh oh!

labath Feb 12, 2025

Uh oh!

DavidSpickett Feb 12, 2025

Uh oh!

labath commented Feb 4, 2025

Uh oh!

labath commented Feb 12, 2025

Uh oh!

DavidSpickett left a comment

Uh oh!

Uh oh!

Uh oh!

[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

Uh oh!

Conversation

labath commented Jan 29, 2025

Uh oh!

llvmbot commented Jan 29, 2025

Uh oh!

DavidSpickett left a comment

Choose a reason for hiding this comment

Uh oh!

DavidSpickett Feb 3, 2025

Choose a reason for hiding this comment

Uh oh!

labath Feb 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

DavidSpickett Feb 3, 2025

Choose a reason for hiding this comment

Uh oh!

labath Feb 4, 2025

Choose a reason for hiding this comment

Uh oh!

DavidSpickett Feb 12, 2025

Choose a reason for hiding this comment

Uh oh!

labath Feb 12, 2025

Choose a reason for hiding this comment

Uh oh!

DavidSpickett Feb 12, 2025

Choose a reason for hiding this comment

Uh oh!

labath commented Feb 4, 2025

Uh oh!

labath commented Feb 12, 2025

Uh oh!

DavidSpickett left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!