-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Rewrite pointer::with_addr
implementation
#110318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This commit changes the impl from (essentially) ```rust ptr.wrapping_byte_offset(new_addr - self.addr()) ``` to ```rust ptr.wrapping_byte_sub(self.addr()).wrapping_byte_add(new_addr) ``` While being essentially the same (`a-a+b` vs `a+(b-a)`) new implementation generation 2 GEPs, instead of an arithmetic + 1 GEP, which turns out to be easier for LLVM to optimize and reason about. (`ptradd` instead of `getelementptr` when .-.)
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override) |
This comment was marked as resolved.
This comment was marked as resolved.
Upstream patch: https://reviews.llvm.org/D148341 I'm not sure this is a good idea -- I expect that this will optimize worse in cases where you aren't doing something extremely weird. Also worth noting that this formulation is not CHERI compatible. |
// Note that we are using two offsets, instead of precomputing offset | ||
// with `addr - self.addr()`. This is because it is easier for LLVM to | ||
// optimize such code. | ||
self.wrapping_byte_sub(self.addr()).wrapping_byte_add(addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I was playing yesterday, I noticed another possible implementation:
self.wrapping_byte_sub(self.addr()).wrapping_byte_add(addr) | |
self.mask(0).wrapping_byte_add(addr) |
That's again just two IR instructions (https://rust.godbolt.org/z/EeraoxKnh), and it has the bonus of never doing a ptrtoint
.
Any thoughts on whether that's reasonable or horrible, @nikic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very horrible. ptrmask has about zero optimization support.
@nikic why is it not CHERI compatible, does it not allow to temporary create invalid pointers? 🤔 |
This will take the pointer temporarily way out of bounds, which is not supported on CHERI due to the bounds compression scheme in use. Depending on what IR this maps to I would expect LLVM to be able to perform the transformation you want on non-CHERI architectures for you. |
Well, TIL. Does this also mean that no pointer tagging is ever possible on CHERI? Anyway, I'm not particularly attached to this change. IMO it is a bit nicer code-wise and looks more logical to me. But if it adds problems for CHERI and even LLVM, we can just close this (especially given the candidate patch which should probably fix the original issue). |
Pointer tagging works provided you use low bits. Some implementations may also provide a way to set high bits without affecting bounds (e.g. as seen in Morello) but that is not portable across CHERI implementations. If you want to start performing arbitrary transformations on addresses and expect to be able to recover a usable pointer then CHERI cannot do that in its current form due to the compression scheme employed (we used to have an uncompressed variant that could, but quadrupling the pointer size is rather less acceptable than merely doubling). |
Can't this also move the pointer out of the allocation? Or is the change of value in low bits small enough that it doesn't break compression? |
Yes.
Yes. Note I said way out of bounds before, not just out of bounds. The architecture gives you 1/8th range either side (one direction is in fact 1/4, though I forget if above or below), with a minimum for small allocations that varies based on the number of bits of precision used (think mantissa width for floating point), but is on the order of KiB for all current 64-bit implementations. (I’m ignoring Microsoft’s CHERIoT here which is far stricter and thus more hostile to code that likes to play games with pointers, but can get away with it due to being used in an embedded context where you have much greater control over the code being compiled and people are more willing to adapt to weird and wonderful architectures) |
@jrtc27 I see, thanks for the explanation! <3 |
Closing as per the concerns above, given that LLVM issue was fixed. |
This commit changes the impl from (essentially)
to
While being essentially the same (
a-a+b
vsa+(b-a)
) new implementation generation 2 GEPs, instead of an arithmetic + 1 GEP, which turns out to be easier for LLVM to optimize and reason about.(
ptradd
instead ofgetelementptr
when .-.)Idea by @scottmcm, should fix the problem I've found while working on #110243, which I documented in the LLVM issue: llvm/llvm-project#62093 (TL;DR: the code for
.map_addr(|a| a << 2)
does not get optimized properly).addr << 2
: https://rust.godbolt.org/z/T4oWfx4sb