-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[lldb] Fix Block::GetRangeIndexContainingAddress for discontinuous functions #124931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nctions This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).
@llvm/pr-subscribers-lldb Author: Pavel Labath (labath) ChangesThis is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section). Full diff: https://github.com/llvm/llvm-project/pull/124931.diff 2 Files Affected:
diff --git a/lldb/source/Symbol/Block.cpp b/lldb/source/Symbol/Block.cpp
index 139fa06d08fca5..bddb015333e09c 100644
--- a/lldb/source/Symbol/Block.cpp
+++ b/lldb/source/Symbol/Block.cpp
@@ -243,25 +243,15 @@ bool Block::GetRangeContainingAddress(const Address &addr,
AddressRange &range) {
Function *function = CalculateSymbolContextFunction();
if (function) {
- const AddressRange &func_range = function->GetAddressRange();
- if (addr.GetModule() == func_range.GetBaseAddress().GetModule()) {
- const addr_t file_addr = addr.GetFileAddress();
- const addr_t func_file_addr =
- func_range.GetBaseAddress().GetFileAddress();
- if (file_addr >= func_file_addr &&
- file_addr < func_file_addr + func_range.GetByteSize()) {
- addr_t offset = file_addr - func_file_addr;
-
- const Range *range_ptr = m_ranges.FindEntryThatContains(offset);
-
- if (range_ptr) {
- range.GetBaseAddress() =
- Address(func_file_addr + range_ptr->GetRangeBase(),
- addr.GetModule()->GetSectionList());
- range.SetByteSize(range_ptr->GetByteSize());
- return true;
- }
- }
+ if (uint32_t idx = GetRangeIndexContainingAddress(addr);
+ idx != UINT32_MAX) {
+ const Range *range_ptr = m_ranges.GetEntryAtIndex(idx);
+ assert(range_ptr);
+
+ range.GetBaseAddress() = function->GetAddress();
+ range.GetBaseAddress().Slide(range_ptr->GetRangeBase());
+ range.SetByteSize(range_ptr->GetByteSize());
+ return true;
}
}
range.Clear();
@@ -278,19 +268,16 @@ bool Block::GetRangeContainingLoadAddress(lldb::addr_t load_addr,
uint32_t Block::GetRangeIndexContainingAddress(const Address &addr) {
Function *function = CalculateSymbolContextFunction();
- if (function) {
- const AddressRange &func_range = function->GetAddressRange();
- if (addr.GetSection() == func_range.GetBaseAddress().GetSection()) {
- const addr_t addr_offset = addr.GetOffset();
- const addr_t func_offset = func_range.GetBaseAddress().GetOffset();
- if (addr_offset >= func_offset &&
- addr_offset < func_offset + func_range.GetByteSize()) {
- addr_t offset = addr_offset - func_offset;
- return m_ranges.FindEntryIndexThatContains(offset);
- }
- }
- }
- return UINT32_MAX;
+ if (!function)
+ return UINT32_MAX;
+
+ const Address &func_addr = function->GetAddress();
+ if (addr.GetModule() != func_addr.GetModule())
+ return UINT32_MAX;
+
+ const addr_t file_addr = addr.GetFileAddress();
+ const addr_t func_file_addr = func_addr.GetFileAddress();
+ return m_ranges.FindEntryIndexThatContains(file_addr - func_file_addr);
}
bool Block::GetRangeAtIndex(uint32_t range_idx, AddressRange &range) {
diff --git a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
index 2e2bc52cd3ff99..4d42c50467da06 100644
--- a/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
+++ b/lldb/test/Shell/ScriptInterpreter/Python/sb_function_ranges.s
@@ -6,15 +6,22 @@
# CHECK: Found 1 function(s).
# CHECK: foo: [input.o[0x0-0xe), input.o[0x14-0x1c)]
-# CHECK-NEXT: input.o[0x0]: cmpl $0x0, %edi
-# CHECK-NEXT: input.o[0x3]: je 0x14
-# CHECK-NEXT: input.o[0x5]: jmp 0x7
-# CHECK-NEXT: input.o[0x7]: callq 0xe
-# CHECK-NEXT: input.o[0xc]: jmp 0x1b
+# CHECK-NEXT: input.o[0x0]: callq 0xe
+# CHECK-NEXT: input.o[0x5]: jmp 0x1b
+# CHECK-NEXT: input.o[0x7]: cmpl $0x0, %edi
+# CHECK-NEXT: input.o[0xa]: je 0x14
+# CHECK-NEXT: input.o[0xc]: jmp 0x0
# CHECK-EMPTY:
# CHECK-NEXT: input.o[0x14]: callq 0x19
# CHECK-NEXT: input.o[0x19]: jmp 0x1b
# CHECK-NEXT: input.o[0x1b]: retq
+# CHECK-NEXT: offset 0x00 => index 0
+# CHECK-NEXT: offset 0x0c => index 0
+# CHECK-NEXT: offset 0x0e => index ffffffff
+# CHECK-NEXT: offset 0x13 => index ffffffff
+# CHECK-NEXT: offset 0x14 => index 1
+# CHECK-NEXT: offset 0x1b => index 1
+# CHECK-NEXT: offset 0x1c => index ffffffff
#--- script.py
@@ -28,6 +35,10 @@ def __lldb_init_module(debugger, internal_dict):
fn = ctx.function
print(f"{fn.name}: {fn.GetRanges()}")
print(fn.GetInstructions(target))
+ text = fn.addr.section
+ for offset in [0x00, 0x0c, 0x0e, 0x13, 0x14, 0x1b, 0x1c]:
+ idx = fn.block.GetRangeIndexForBlockAddress(lldb.SBAddress(text, offset))
+ print(f"offset 0x{offset:02x} => index {idx:x}")
#--- input.s
# An example of a function which has been split into two parts. Roughly
@@ -40,6 +51,14 @@ def __lldb_init_module(debugger, internal_dict):
.text
.type foo,@function
+foo.__part.1:
+ .cfi_startproc
+ callq bar
+ jmp foo.__part.3
+.Lfoo.__part.1_end:
+ .size foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
+ .cfi_endproc
+
foo:
.cfi_startproc
cmpl $0, %edi
@@ -49,14 +68,6 @@ foo:
.Lfoo_end:
.size foo, .Lfoo_end-foo
-foo.__part.1:
- .cfi_startproc
- callq bar
- jmp foo.__part.3
-.Lfoo.__part.1_end:
- .size foo.__part.1, .Lfoo.__part.1_end-foo.__part.1
- .cfi_endproc
-
bar:
.cfi_startproc
movl $47, %eax
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've looked at this a couple of times and on a general level, it seems fine, but all the address vs. file address stuff is confusing to me. I'm just guessing what each one is really.
So I have added some questions about that, maybe I'll be able to approve after those, or Jason can appear, whichever comes first.
lldb/source/Symbol/Block.cpp
Outdated
assert(range_ptr); | ||
|
||
range.GetBaseAddress() = function->GetAddress(); | ||
range.GetBaseAddress().Slide(range_ptr->GetRangeBase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function->GetAddress()
returns the address relative to the start of the range, then you slide that address up to get the absolute address of the function, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function->GetAddress()
returns the address of the start of the function. The address is internally stored as a section+offset pair, but for the most part, you can think of it as an absolute address of the function. Block ranges are internally stored as pairs of integers (offset+size), where the offset is relative to the start of the function. This converts the internal (implicitly relative to section start) representation of the range to the external, absolute (well, section-relative) representation.
It basically does that by adjusting the section offset, but it actually does that incorrectly since different parts of the function may be in different sections. So, the correct thing to do is to convert the address into a file address (which is relative to the "base address" of the file -- what ELF would call "virtual address"), do the subtraction there, and then re-resolve the result into a section-relative address.
My updated version will do that.
# CHECK-NEXT: offset 0x13 => index ffffffff | ||
# CHECK-NEXT: offset 0x14 => index 1 | ||
# CHECK-NEXT: offset 0x1b => index 1 | ||
# CHECK-NEXT: offset 0x1c => index ffffffff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have zero idea what's going on here.
You moved foo.__part.1 before foo, presumably because that would give you the wrong answers if it weren't for the changes you made elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, GetRangeIndexForBlockAddress
would return wrong results because it assumed that the internal block representation was relative to the lowest address in the function (fn->GetAddressRange().GetBaseAddress()
) but they really are (as of #122440) relative to the entry point.
And this is a slightly funny way of testing that interface -- I print the function results in python and then FileCheck the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ffffffff means that it's not part of a block, is that right?
So if I have a function that does not cover the offset I ask for, it would return this failure value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment there saying so, it will help someone who is reading because this failed out of the blue sometime.
Thanks for taking a look. I've tried to explain what the code that, but I don't know if I've succeeded. The functionality is at some level quite simple, but wrapping your head around the different representations is very tricky. Let me know if there's something I need to elaborate on. |
Any more thoughts on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the 1 extra comment, I understand this now, LGTM.
…nctions (llvm#124931) This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).
…nctions (llvm#124931) This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).
…nctions (llvm#124931) This is a followup to llvm#122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in llvm#116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).
Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result.
#137955) Continuing the theme from #116777 and #124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result. I found this out due to a failure to disassemble some parts of the function, so I'm testing with that, although it's likely there are other things that were broken due to this.
llvm#137955) Continuing the theme from llvm#116777 and llvm#124931, this patch ensures we compute the correct address when a functions is spread across multiple sections. Due to this, it's not sufficient to adjust the offset in the section+offset pair (Address::Slide). We must actually slide the file offset and then recompute the section using the result. I found this out due to a failure to disassemble some parts of the function, so I'm testing with that, although it's likely there are other things that were broken due to this.
This is a followup to #122440, which changed function-relative calculations to use the function entry point rather than the lowest address of the function (but missed this usage). Like in #116777, the logic is changed to use file addresses instead of section offsets (as not all parts of the function have to be in the same section).