Skip to content

[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 13, 2025

Conversation

labath
Copy link
Collaborator

@labath labath commented Jan 29, 2025

This is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).

…nctions

This is a followup to llvm#122440, which changed function-relative
calculations to use the function entry point rather than the lowest
address of the function (but missed this usage). Like in llvm#116777, the
logic is changed to use file addresses instead of section offsets (as
not all parts of the function have to be in the same section).
@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2025

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

This is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).


Full diff: https://github.com/llvm/llvm-project/pull/124931.diff

2 Files Affected:

  • (modified) lldb/source/Symbol/Block.cpp (+19-32)
  • (modified) lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s (+24-13)
diff --git a/lldb/source/Symbol/Block.cpp b/lldb/source/Symbol/Block.cpp
index 139fa06d08fca5..bddb015333e09c 100644
--- a/lldb/source/Symbol/Block.cpp
+++ b/lldb/source/Symbol/Block.cpp
@@ -243,25 +243,15 @@ bool Block::GetRangeContainingAddress(const Address &addr,
                                       AddressRange &range) {
   Function *function = CalculateSymbolContextFunction();
   if (function) {
-    const AddressRange &func_range = function->GetAddressRange();
-    if (addr.GetModule() == func_range.GetBaseAddress().GetModule()) {
-      const addr_t file_addr = addr.GetFileAddress();
-      const addr_t func_file_addr =
-          func_range.GetBaseAddress().GetFileAddress();
-      if (file_addr >= func_file_addr &&
-          file_addr < func_file_addr + func_range.GetByteSize()) {
-        addr_t offset = file_addr - func_file_addr;
-
-        const Range *range_ptr = m_ranges.FindEntryThatContains(offset);
-
-        if (range_ptr) {
-          range.GetBaseAddress() =
-              Address(func_file_addr + range_ptr->GetRangeBase(),
-                      addr.GetModule()->GetSectionList());
-          range.SetByteSize(range_ptr->GetByteSize());
-          return true;
-        }
-      }
+    if (uint32_t idx = GetRangeIndexContainingAddress(addr);
+        idx != UINT32_MAX) {
+      const Range *range_ptr = m_ranges.GetEntryAtIndex(idx);
+      assert(range_ptr);
+
+      range.GetBaseAddress() = function->GetAddress();
+      range.GetBaseAddress().Slide(range_ptr->GetRangeBase());
+      range.SetByteSize(range_ptr->GetByteSize());
+      return true;
     }
   }
   range.Clear();
@@ -278,19 +268,16 @@ bool Block::GetRangeContainingLoadAddress(lldb::addr_t load_addr,
 
 uint32_t Block::GetRangeIndexContainingAddress(const Address &addr) {
   Function *function = CalculateSymbolContextFunction();
-  if (function) {
-    const AddressRange &func_range = function->GetAddressRange();
-    if (addr.GetSection() == func_range.GetBaseAddress().GetSection()) {
-      const addr_t addr_offset = addr.GetOffset();
-      const addr_t func_offset = func_range.GetBaseAddress().GetOffset();
-      if (addr_offset >= func_offset &&
-          addr_offset < func_offset + func_range.GetByteSize()) {
-        addr_t offset = addr_offset - func_offset;
-        return m_ranges.FindEntryIndexThatContains(offset);
-      }
-    }
-  }
-  return UINT32_MAX;
+  if (!function)
+    return UINT32_MAX;
+
+  const Address &func_addr = function->GetAddress();
+  if (addr.GetModule() != func_addr.GetModule())
+    return UINT32_MAX;
+
+  const addr_t file_addr = addr.GetFileAddress();
+  const addr_t func_file_addr = func_addr.GetFileAddress();
+  return m_ranges.FindEntryIndexThatContains(file_addr - func_file_addr);
 }
 
 bool Block::GetRangeAtIndex(uint32_t range_idx, AddressRange &range) {
diff --git a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
index 2e2bc52cd3ff99..4d42c50467da06 100644
--- a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
+++ b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
@@ -6,15 +6,22 @@
 
 # CHECK: Found 1 function(s).
 # CHECK: foo: [input.o[0x0-0xe), input.o[0x14-0x1c)]
-# CHECK-NEXT: input.o[0x0]: cmpl   $0x0, %edi
-# CHECK-NEXT: input.o[0x3]: je     0x14
-# CHECK-NEXT: input.o[0x5]: jmp    0x7
-# CHECK-NEXT: input.o[0x7]: callq  0xe
-# CHECK-NEXT: input.o[0xc]: jmp    0x1b
+# CHECK-NEXT: input.o[0x0]: callq  0xe
+# CHECK-NEXT: input.o[0x5]: jmp    0x1b
+# CHECK-NEXT: input.o[0x7]: cmpl   $0x0, %edi
+# CHECK-NEXT: input.o[0xa]: je     0x14
+# CHECK-NEXT: input.o[0xc]: jmp    0x0
 # CHECK-EMPTY:
 # CHECK-NEXT: input.o[0x14]: callq  0x19
 # CHECK-NEXT: input.o[0x19]: jmp    0x1b
 # CHECK-NEXT: input.o[0x1b]: retq
+# CHECK-NEXT: offset 0x00 => index 0
+# CHECK-NEXT: offset 0x0c => index 0
+# CHECK-NEXT: offset 0x0e => index ffffffff
+# CHECK-NEXT: offset 0x13 => index ffffffff
+# CHECK-NEXT: offset 0x14 => index 1
+# CHECK-NEXT: offset 0x1b => index 1
+# CHECK-NEXT: offset 0x1c => index ffffffff
 
 
 #--- script.py
@@ -28,6 +35,10 @@ def __lldb_init_module(debugger, internal_dict):
     fn = ctx.function
     print(f"{fn.name}: {fn.GetRanges()}")
     print(fn.GetInstructions(target))
+    text = fn.addr.section
+    for offset in [0x00, 0x0c, 0x0e, 0x13, 0x14, 0x1b, 0x1c]:
+      idx = fn.block.GetRangeIndexForBlockAddress(lldb.SBAddress(text, offset))
+      print(f"offset 0x{offset:02x} => index {idx:x}")
 
 #--- input.s
 # An example of a function which has been split into two parts. Roughly
@@ -40,6 +51,14 @@ def __lldb_init_module(debugger, internal_dict):
         .text
 
         .type   foo,@function
+foo.__part.1:
+        .cfi_startproc
+        callq   bar
+        jmp     foo.__part.3
+.Lfoo.__part.1_end:
+        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
+        .cfi_endproc
+
 foo:
         .cfi_startproc
         cmpl    $0, %edi
@@ -49,14 +68,6 @@ foo:
 .Lfoo_end:
         .size   foo, .Lfoo_end-foo
 
-foo.__part.1:
-        .cfi_startproc
-        callq   bar
-        jmp     foo.__part.3
-.Lfoo.__part.1_end:
-        .size   foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
-        .cfi_endproc
-
 bar:
         .cfi_startproc
         movl    $47, %eax

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked at this a couple of times and on a general level, it seems fine, but all the address vs. file address stuff is confusing to me. I'm just guessing what each one is really.

So I have added some questions about that, maybe I'll be able to approve after those, or Jason can appear, whichever comes first.

assert(range_ptr);

range.GetBaseAddress() = function->GetAddress();
range.GetBaseAddress().Slide(range_ptr->GetRangeBase());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function->GetAddress() returns the address relative to the start of the range, then you slide that address up to get the absolute address of the function, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function->GetAddress() returns the address of the start of the function. The address is internally stored as a section+offset pair, but for the most part, you can think of it as an absolute address of the function. Block ranges are internally stored as pairs of integers (offset+size), where the offset is relative to the start of the function. This converts the internal (implicitly relative to section start) representation of the range to the external, absolute (well, section-relative) representation.
It basically does that by adjusting the section offset, but it actually does that incorrectly since different parts of the function may be in different sections. So, the correct thing to do is to convert the address into a file address (which is relative to the "base address" of the file -- what ELF would call "virtual address"), do the subtraction there, and then re-resolve the result into a section-relative address.

My updated version will do that.

# CHECK-NEXT: offset 0x13 => index ffffffff
# CHECK-NEXT: offset 0x14 => index 1
# CHECK-NEXT: offset 0x1b => index 1
# CHECK-NEXT: offset 0x1c => index ffffffff
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have zero idea what's going on here.

You moved foo.__part.1 before foo, presumably because that would give you the wrong answers if it weren't for the changes you made elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, GetRangeIndexForBlockAddress would return wrong results because it assumed that the internal block representation was relative to the lowest address in the function (fn->GetAddressRange().GetBaseAddress()) but they really are (as of #122440) relative to the entry point.

And this is a slightly funny way of testing that interface -- I print the function results in python and then FileCheck the result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ffffffff means that it's not part of a block, is that right?

So if I have a function that does not cover the offset I ask for, it would return this failure value.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment there saying so, it will help someone who is reading because this failed out of the blue sometime.

@labath
Copy link
Collaborator Author

labath commented Feb 4, 2025

Thanks for taking a look. I've tried to explain what the code that, but I don't know if I've succeeded. The functionality is at some level quite simple, but wrapping your head around the different representations is very tricky. Let me know if there's something I need to elaborate on.

@labath
Copy link
Collaborator Author

labath commented Feb 12, 2025

Any more thoughts on this?

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the 1 extra comment, I understand this now, LGTM.

@labath labath merged commit 298caeb into llvm:main Feb 13, 2025
7 checks passed
@labath labath deleted the block-fn branch February 13, 2025 09:08
flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
…nctions (llvm#124931)

This is a followup to llvm#122440, which changed function-relative
calculations to use the function entry point rather than the lowest
address of the function (but missed this usage). Like in llvm#116777, the
logic is changed to use file addresses instead of section offsets (as
not all parts of the function have to be in the same section).
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
…nctions (llvm#124931)

This is a followup to llvm#122440, which changed function-relative
calculations to use the function entry point rather than the lowest
address of the function (but missed this usage). Like in llvm#116777, the
logic is changed to use file addresses instead of section offsets (as
not all parts of the function have to be in the same section).
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
…nctions (llvm#124931)

This is a followup to llvm#122440, which changed function-relative
calculations to use the function entry point rather than the lowest
address of the function (but missed this usage). Like in llvm#116777, the
logic is changed to use file addresses instead of section offsets (as
not all parts of the function have to be in the same section).
labath added a commit to labath/llvm-project that referenced this pull request Apr 30, 2025
Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we
compute the correct address when a functions is spread across multiple
sections. Due to this, it's not sufficient to adjust the offset in the
section+offset pair (Address::Slide). We must actually slide the file
offset and then recompute the section using the result.
labath added a commit that referenced this pull request May 7, 2025
#137955)

Continuing the theme from #116777 and #124931, this patch ensures we
compute the correct address when a functions is spread across multiple
sections. Due to this, it's not sufficient to adjust the offset in the
section+offset pair (Address::Slide). We must actually slide the file
offset and then recompute the section using the result.

I found this out due to a failure to disassemble some parts of the
function, so I'm testing with that, although it's likely there are other
things that were broken due to this.
GeorgeARM pushed a commit to GeorgeARM/llvm-project that referenced this pull request May 7, 2025
llvm#137955)

Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we
compute the correct address when a functions is spread across multiple
sections. Due to this, it's not sufficient to adjust the offset in the
section+offset pair (Address::Slide). We must actually slide the file
offset and then recompute the section using the result.

I found this out due to a failure to disassemble some parts of the
function, so I'm testing with that, although it's likely there are other
things that were broken due to this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants