-
Notifications
You must be signed in to change notification settings - Fork 1.6k
RFC: No (opsem) Magic Boxes #3712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Clarify the constraint o the invariant in footnote Co-authored-by: Jacob Lifshay <[email protected]>
It feels odd that one of the clear options is left out: why not expose a Like, I definitely agree that I agree that whatever happens shouldn't be specific to the |
I, at least, fully expect us to eventually have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, (Transmuting between EDIT: added a word to try to communicate that I wasn't expecting this RFC to include such a type. |
That's my hope for the future as well, but to avoid the RFC becoming too cluttered, I am refraining from defining such a type in this RFC. |
Is there a list of optimisations that depend on noalias being emitted for Box’es? |
The RFC seems pretty clear that noalias hasn't really provided many benefits compared to being an extra burden to uphold for implementers, but maybe it is worth seeing if there are any sources that can provide a bit more detail on that. |
text/3712-box-yesalias.md
Outdated
* A pointer with an address that is not well-aligned for `T` (or in the case of a DST, the `align_of_val_raw` of the value), or | ||
* A pointer with an address that offsetting that address (as though by `.wrapping_byte_offset`) by `size_of_val_raw` bytes would wrap arround the address space | ||
|
||
The [`alloc::boxed::Box<T>`] type shall be laid out as though a `repr(transparent)` struct containing a field of type `WellFormed<T>`. The behaviour of doing a typed copy as type [`alloc::boxed::Box<T>`] shall be the same as though a typed copy of the struct `#[repr(transparent)] struct Box<T>(WellFormed<T>);`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The urlo definition is probably good.
It's defined in the opsem, but I don't know if we have a very good written record of that other than spread arround zulip threads and github issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a fan of this. I think that people moving from Vec<T>
to Box<[T]>
having to deal with drastically-different soundness rules is a giant footgun, and getting rid of the special [ST]B behaviour here sounds good to me.
My general take: The two "endpoints" here are
From what I can tell, we current orient If I could go back in time, I think I would favor end-user abstractions and offer different types (e.g., |
The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard. |
@rust-lang/opsem: We were curious in our discussions, does this RFC represent an existing T-opsem consensus? |
It does not represent any FCP done by T-opsem, which is why I've included them here. The claims I make, including those about the impact on the operation semantics, are included in the request for comment and consensus.
I recall some perf PR's (using the default rustc-perf suite) being done to determine the impact, which showed negligible impact. I can probably pull them up at some point in the RFC's lifecycle.
|
|
||
(Note that we do not define this type in the public standard library interface, though an implementation of the standard library could define the type locally) | ||
|
||
The following are not valid values of type `WellFormed<T>`, and a typed copy that would produce such a value is undefined behaviour: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Reference has been adjusted a while ago to state validity invariants positively, i..e by listing what must be true, instead of listing what must not be false. IMO that's more understandable, and the RFC should be updated to also do that.
There are patterns of using a custom per-
It's "every LLVM optimization that looks at alias information". The question is how much that matters in practice, which is hard to evaluate.
As Connor said, not in any formal sense. Several opsem members have expressed repeatedly that they want to see My own position is that I love how this simplifies the model and Miri, I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later. Absence of evidence of optimization benefit is not evidence of absence. Our benchmarks likely just don't have a lot of functions that take Is there a way we can query the ecosystem for functions taking |
- While the easiest alternative is to do nothing and maintain the status quo, as mentioned this has suprisingly implications both for the operational semantics of Rust | ||
- Alternative 2: Introduce a new type `AlisableBox<T>` which has the same interface as `Box<T>` but lacks the opsem implications that `Box<T>` has. | ||
- This also does not help remove the impact on the opsem model that the current `Box<T>` has, though provides an ergonomically equivalent option for `unsafe` code. | ||
- Alternative 3: We maintain the behaviour only for the unparameterized `Box<T>` type using the `Global` allocator, and remove it for `Box<T,A>` (for any allocator other than `A`), allowing unsafe code to use `Box<T, CustomAllocator>` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually the status quo, since rust-lang/rust#122018
Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to I would still love if there's more data showing the lack of returns on |
…C++ counterpointer `std::unique_ptr`, in the prior art section
FTR, speaking right now as one of the main developers of lccc, my opinion is that the best way to mitigate any loss of future optimization potential is to just be far more granular with
I mentioned |
And more than them just not having them, IIRC someone tried to implement |
would still love if there's more data showing the lack of returns on noalias optimisations, since it feels wrong that something with a lot of history and usage isn't helping that much
It helps for references. I suspect people added it for Box because "why not".
|
@safinaskar is the reason for you to want this what you wrote here, or is there more? I agree with @saethlin that the GPU usecase is too domain-specific to justify such a choice for the entire language, given in particular that people writing for the GPU can use Judging from this comment, it also does not sound like |
Yes, there is more! Even if boxes become non-magical from opsem point of view (i. e.
So, we still need this Okay, I spent some time thinking about this more, and now I think that I did a mistake. As well as I understand, borrow checking happens before monomorphization. And boxes are treated specially by borrow checker. Thus borrow checker still needs to be able to call I think such change will bring many benefits. And I will again say: consider just removing allocators from boxes and introducing another different type for boxes with allocators |
This RFC specifically does not propose to change or remove any other aspect of With regards to |
Getting rid of Maybe you perceive there to be some problems with not special-casing |
I'm not convinced. See, e.g.:
It seems that More broadly, probably I'd prefer, if we could work out how, to go the other direction and to make references (and Probably, therefore, I see this RFC in this larger context. If we drop |
It does solve the However, it definitely does not solve the "people are surprised to learn about the alias requirements" part, as you have to know about the alias requirements to realize you need to suppress them with |
Reading through the issues, it doesn't seem the problem is custom allocators in general, but particular allocator strategies that want to offset from the data pointer to the allocator state. Even for custom allocators, then, it seems we might have other options than turning
Regarding the potential surprise of it, I think I'm OK if people writing custom allocators or doing other such things need to pore carefully over our documentation. We were still figuring things out ourselves and hadn't provided appropriate tools to people. Once we have, and have documented this carefully, I'm hopeful this kind of surprise will no longer be the same kind of problem. In general, I think my hesitation is that it still seems we might have other options here that we haven't yet ruled out. |
As the title says: "No (opsem) Magic Boxes". I want Box to eventually become non-magical through a combination of removing magic and adding generalizations (like
I'd have to find the benchmarks (or perhaps new, up to date, ones should be run, but those kinds of modifications to rustc are far beyond my paygrade). The rest of the comments, I'll address this coming week. |
Correct. That's not an uncommon strategy for allocators. We have some vague ideas for how to support it for custom global allocators, but that relies on deallocation being a special magic operation -- and is likely incompatible with inlining the deallocator. So this doesn't seem viable for custom local allocators.
This would need a form of associated type constructors, so that one allocator can have So, I'd say this is quite speculative.
There's literally a |
After reading this thread and reading rust-lang/rust#141219 I now agree that we definitely should split So I kindly ask someone to author RFC or MCP for this (i. e. splitting box). Hopefully it will be accepted, and then eventially someone will implement it. We totally allowed to do this, because allocator API is unstable. This will be annoyance for people working with custom allocators, but I totally agree that current state causes too much pain for compiler developers |
Splitting up Box sounds like a major regression compared to the unstable API we have today. It should be possible to write code that is generic over the allocator, with the global allocator being just an option.
The RFC for splitting up Box should be written by the people who think it is a good idea. So far, I think that's only you.
|
The |
Imho one downside of removing uniqueness guarantees from The While this problem already exists, the current restrictive rules about On the other hand, this problem seems similar to accidentally taking a But also, it seems similar to the hazard of handling |
I don't think this is true, at least with #![feature(box_as_ptr)]
struct Foo { a: i32 }
fn main() {
let mut b = Box::new(Foo { a: 0 } );
let ptr = Box::as_mut_ptr(&mut b);
// This is sound afaik (checked with Miri)
unsafe {
*b = Foo {a: 1};
b.a = 2;
*ptr = Foo {a: 3};
}
// Without this RFC, this is unsound
let b2 = b;
unsafe {
*ptr = Foo {a: 4};
}
}
I'm not sure this follows - we could always add a type that is what |
Yeah, agreed. If anything, disentangling this from |
Since |
I don't see how a |
See rust-lang/rust#138896 for yet another recent aliasing issue caused by |
Yes, that's why I had called it "leaky". (It's not a proposal though.)
Yes, and as I understand, that's what we added with Though we accepted the RFC over a year ago, we haven't yet stabilized it -- or even implemented it. That's unfortunate, and I imagine that's part of why people are still asking for this. Probably there's a large part of me that wants to see that stabilized first. Then we can see what's left. |
If I'm reading the situation there correctly, the problem was not directly that we apply Observationally, most of the examples I've seen cited in practice tend to fit this pattern. I wonder whether this might be as much an argument for choosing Tree Borrows -- or something like it -- as it is for changing the semantics of And, of course, |
Tree Borrows accepts this by sacrificing some optimization potential. If we let the opsem be guided by the library invariant for |
Probably I'd suggest that even if we were to remove |
Right, but I think it would reasonable to assume that arbitrary As an example, |
In the cases I've reviewed, it's really not clear to me that people are making this mistake for that reason as opposed to making it for the same reason they'd make it in the generic case. People write this, self.items.push(item);
self.ptrs.push(&raw mut *item); //~ ERROR value borrowed here after move and they get the borrow checker error, so then they say, "oh, right, I need to swap those lines", self.ptrs.push(&raw mut *item);
self.items.push(item); and then it "works". I've caught this in review in places where it wasn't even arguably OK, and it was never a case where the person had some fancy theory about an analogy to the (perhaps accidental) operational semantics of So that's why I say that I don't think that just dropping self.ptrs.push(&raw mut **self.items.push_mut(item)); |
It doesn't completely get us out of that, but moving There's a potential alternative here of devising an aliasing model where moves don't cause such long-term effects in the same way (see rust-lang/unsafe-code-guidelines#450). The aliasing models we have right now have more UB than LLVM immediately needs for |
Something I think about is how that also argues the other way. If someone has a generic Even without generics (i.e., with human-powered monomorphization), I think about the pedagogic value of this. If someone is running Miri intermittently and mostly writing this pattern with
Yes; that's an option too. |
OTOH there's a lot of code which people can't run in Miri or don't run for other reasons, so relying on Miri as a teaching tool is risky. And all else being equal, less UB means a strictly lower risk of unexpected compilation results. |
Indeed. Many angles to consider. |
Summary
Currently, the operational semantics of the type
alloc::boxed::Box<T>
is in dispute, but the compiler adds llvmnoalias
to it. To support it, the current operational semantics models have the type use a special form of theUnique
(Stacked Borrows) orActive
(Tree Borrows) tag, which has aliasing implications, validity implications, and also presents some unique complications in the model and in improvements to the type (e.g. Custom Allocators). We propose that, for the purposes of the runtime semantics of Rust,Box
is treated as no more special than a user-defined smart pointer you can write today1. In particular, it is given similar behaviour on a typed copy to a raw pointer.Rendered
Footnotes
We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define, but these invariants only depend upon the value of the
Box
itself, rather than on memory. ↩