Skip to content

RFC: No (opsem) Magic Boxes #3712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

chorman0773
Copy link

@chorman0773 chorman0773 commented Oct 15, 2024

Summary

Currently, the operational semantics of the type alloc::boxed::Box<T> is in dispute, but the compiler adds llvm noalias to it. To support it, the current operational semantics models have the type use a special form of the Unique (Stacked Borrows) or Active (Tree Borrows) tag, which has aliasing implications, validity implications, and also presents some unique complications in the model and in improvements to the type (e.g. Custom Allocators). We propose that, for the purposes of the runtime semantics of Rust, Box is treated as no more special than a user-defined smart pointer you can write today1. In particular, it is given similar behaviour on a typed copy to a raw pointer.

Rendered

Footnotes

  1. We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define, but these invariants only depend upon the value of the Box itself, rather than on memory.

@chorman0773 chorman0773 added T-lang Relevant to the language team, which will review and decide on the RFC. T-opsem Relevant to the operational semantics team, which will review and decide on the RFC. labels Oct 15, 2024
@programmerjake

This comment was marked as resolved.

Clarify the constraint o the invariant in footnote

Co-authored-by: Jacob Lifshay <[email protected]>
@clarfonthey
Copy link

It feels odd that one of the clear options is left out: why not expose a Unique<T> that has the same semantics as Box<T>, so any smart pointer type can use it?

Like, I definitely agree that Box shouldn't have special semantics that you can't reproduce elsewhere. But among the options, it feels pretty limiting to come to the conclusion that we should eliminate those semantics, rather than just making them reproducible elsewhere.

I agree that whatever happens shouldn't be specific to the Global allocator, though.

@scottmcm scottmcm added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Oct 15, 2024
@scottmcm
Copy link
Member

scottmcm commented Oct 15, 2024

We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define

I, at least, fully expect us to eventually have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, slice::Iter is less good than it could be because of the lack of them, and optimizing that type is super-important.

(Transmuting between NonNull<T> and unsafe<'a> &'a T is one thing that might work, for example, though it's also possible that the opsem for that will end up saying that it doesn't and that something else is required instead.)

EDIT: added a word to try to communicate that I wasn't expecting this RFC to include such a type.

@scottmcm scottmcm self-assigned this Oct 15, 2024
@chorman0773
Copy link
Author

I, at least, fully expect us to have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, slice::Iter is less good than it could be because of the lack of them, and optimizing that type is super-important.

That's my hope for the future as well, but to avoid the RFC becoming too cluttered, I am refraining from defining such a type in this RFC.

@juntyr
Copy link
Contributor

juntyr commented Oct 16, 2024

Is there a list of optimisations that depend on noalias being emitted for Box’es?

@clarfonthey
Copy link

The RFC seems pretty clear that noalias hasn't really provided many benefits compared to being an extra burden to uphold for implementers, but maybe it is worth seeing if there are any sources that can provide a bit more detail on that.

* A pointer with an address that is not well-aligned for `T` (or in the case of a DST, the `align_of_val_raw` of the value), or
* A pointer with an address that offsetting that address (as though by `.wrapping_byte_offset`) by `size_of_val_raw` bytes would wrap arround the address space

The [`alloc::boxed::Box<T>`] type shall be laid out as though a `repr(transparent)` struct containing a field of type `WellFormed<T>`. The behaviour of doing a typed copy as type [`alloc::boxed::Box<T>`] shall be the same as though a typed copy of the struct `#[repr(transparent)] struct Box<T>(WellFormed<T>);`.
Copy link
Member

@kennytm kennytm Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry but is the term "typed copy" explained somewhere?

(the explanations I could find are from pretty unofficial places like a reddit1 and urlo2 post)

Footnotes

  1. "they are like memcpy, but the copy occurs with a type, giving the compiler some extra power."

  2. "a typed copy consists of essentially decoding the AM-bytes into the abstract value then encoding that abstract value back to AM-bytes at the new location."

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The urlo definition is probably good.
It's defined in the opsem, but I don't know if we have a very good written record of that other than spread arround zulip threads and github issues.

@scottmcm scottmcm removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Oct 16, 2024
Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of this. I think that people moving from Vec<T> to Box<[T]> having to deal with drastically-different soundness rules is a giant footgun, and getting rid of the special [ST]B behaviour here sounds good to me.

@nikomatsakis
Copy link
Contributor

My general take:

The two "endpoints" here are

  • Efficient end-user abstractions: this allows most safe code to run faster. This would have strong alias requirements and would not expose raw/unsafe details. This permits non-obvious optimizations (e.g., small string optimization or 0-length capacity).
  • Building blocks for unsafe code: this exposes raw/unsafe details.

From what I can tell, we current orient Box as the former but Vec and String as the latter. That seems backwards, since if anything those are far more useful as abstractions than Box is.

If I could go back in time, I think I would favor end-user abstractions and offer different types (e.g., RawVec or RawBuffer or something) that exposed their innards, but I think that ship has sailed, and we might as well embrace the current situation (which is nice in some ways too).

@nikomatsakis
Copy link
Contributor

The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard.

@traviscross
Copy link
Contributor

@rust-lang/opsem: We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

@chorman0773
Copy link
Author

chorman0773 commented Oct 16, 2024

We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

It does not represent any FCP done by T-opsem, which is why I've included them here. The claims I make, including those about the impact on the operation semantics, are included in the request for comment and consensus.

The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard.

I recall some perf PR's (using the default rustc-perf suite) being done to determine the impact, which showed negligible impact. I can probably pull them up at some point in the RFC's lifecycle.


(Note that we do not define this type in the public standard library interface, though an implementation of the standard library could define the type locally)

The following are not valid values of type `WellFormed<T>`, and a typed copy that would produce such a value is undefined behaviour:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Reference has been adjusted a while ago to state validity invariants positively, i..e by listing what must be true, instead of listing what must not be false. IMO that's more understandable, and the RFC should be updated to also do that.

@RalfJung
Copy link
Member

RalfJung commented Oct 17, 2024

I agree that whatever happens shouldn't be specific to the Global allocator, though.

There are patterns of using a custom per-Box allocator that are incompatible with the aliasing requirements, at least under our current aliasing models. See rust-lang/miri#3341 for an example. So if we always make Box be unique, we have to declare those allocators to be UB.

Is there a list of optimisations that depend on noalias being emitted for Box’es?

It's "every LLVM optimization that looks at alias information". The question is how much that matters in practice, which is hard to evaluate.

We were curious in our discussions, does this RFC represent an existing T-opsem consensus?

As Connor said, not in any formal sense. Several opsem members have expressed repeatedly that they want to see noalias on Box go, but I don't know whether we have team consensus on this.

My own position is that I love how this simplifies the model and Miri, I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later. Absence of evidence of optimization benefit is not evidence of absence. Our benchmarks likely just don't have a lot of functions that take Box<T> by value. However, that in itself is an indication that the optimization benefit is likely limited.

Is there a way we can query the ecosystem for functions taking Box<T> by value?

- While the easiest alternative is to do nothing and maintain the status quo, as mentioned this has suprisingly implications both for the operational semantics of Rust
- Alternative 2: Introduce a new type `AlisableBox<T>` which has the same interface as `Box<T>` but lacks the opsem implications that `Box<T>` has.
- This also does not help remove the impact on the opsem model that the current `Box<T>` has, though provides an ergonomically equivalent option for `unsafe` code.
- Alternative 3: We maintain the behaviour only for the unparameterized `Box<T>` type using the `Global` allocator, and remove it for `Box<T,A>` (for any allocator other than `A`), allowing unsafe code to use `Box<T, CustomAllocator>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the status quo, since rust-lang/rust#122018

@clarfonthey
Copy link

clarfonthey commented Oct 17, 2024

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics, even though it is implied by the fact that Box is unique in this regard. I think this is worth emphasising more in the text of the RFC, since in a very real sense, we've been going without these semantics for most[citation needed] parts of Rust totally fine, which further emphasises less will be lost by removing it for Box.

I would still love if there's more data showing the lack of returns on noalias optimisations, since it feels wrong that something with a lot of history and usage isn't helping that much, but the fact that Vec and other types don't have this optimisation at least helps us understand that it's not going to cause any performance issues if removed.

…C++ counterpointer `std::unique_ptr`, in the prior art section
@chorman0773
Copy link
Author

chorman0773 commented Oct 17, 2024

I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later

FTR, speaking right now as one of the main developers of lccc, my opinion is that the best way to mitigate any loss of future optimization potential is to just be far more granular with &mut. I don't think spamming extra noalias on parameters is necessary if we just emit more metadata on derefs (llvm's scoped noalias and dereferenceable, and lccc's unique and dereferenceable type attributes). If there are problems with doing that, I don't think this is a place where we should necessarily bend the whole language to this one attribute on one backend, especially given the complexity of maintaining the feature as-is, where we keep running into fundamental issues with the special treatment of Box. If there are legitimate optimizations lost as a result of the function level noalias, that can't be made up with more granular scoping on dereferences, that may be a different consideration, but my view is that the language after this change has the semantics necessary to justify at the very least a majority of the optimizations that may ultimately be lost, and its up to rustc and llvm (and other compilers) to make use of those semantics if they wish to.

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics, even though it is implied by the fact that Box is unique in this regard. I think this is worth emphasising more in the text of the RFC, since in a very real sense, we've been going without these semantics for most parts of Rust totally fine, which further emphasises less will be lost by removing it for Box.

I mentioned Vec<T> in particular, and also mentioned std::unique_ptr<T> from C++, which is the most closely equivalent type, and lacks the same semantic implications (and also optimizations).

@scottmcm
Copy link
Member

Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to Box, like Vec and Arc, genuinely don't have these semantics

And more than them just not having them, IIRC someone tried to implement Vec<T> as the obvious wrapper around Box<[MaybeUninit<T>]>, and in the process found out that lots of people are depending on Vec not having them.

@RalfJung
Copy link
Member

RalfJung commented Oct 18, 2024 via email

@RalfJung
Copy link
Member

RalfJung commented May 24, 2025

@safinaskar is the reason for you to want this what you wrote here, or is there more?

I agree with @saethlin that the GPU usecase is too domain-specific to justify such a choice for the entire language, given in particular that people writing for the GPU can use &mut to still get all the benefits of noalias.

Judging from this comment, it also does not sound like noalias on Box specifically has a big impact on the GPU ecosystem today. (Nobody is suggesting to get rid of all noalias in the language, after all. So saying that noalias in general is important is too imprecise for this discussion.) If there are benchmarks that show that noalias on Box is important for GPUs or autodiff, please share them! They should definitely be included in the RFC, no matter the final decision.

@safinaskar
Copy link

safinaskar commented May 24, 2025

@RalfJung

or is there more?

Yes, there is more! Even if boxes become non-magical from opsem point of view (i. e. noalias), they will still be magical from other points of view:

  • You can move out of the box
  • As well as I understand, boxes are treated specially by borrow checker
  • As well as I understand, boxes with default allocator are treated as if they have repr(transparent), but other boxes seem not to have this thing. I. e. if we someday make boxes completely non-magical from all points of view, we will still have to invent some kind of #[repr_transparent_but_not_always] attribute

So, we still need this is_box thing, at least in short and middle term, even if we remove noalias from all boxes. But we want this is_box return true for boxes with default allocator only. Only they are magical from all points of view, other boxes are not (update: should not). And that is why I wrote previous comment. I think we need these changes, even if we remove noalias from all boxes. noalias for boxes will come as a nice side effect.


Okay, I spent some time thinking about this more, and now I think that I did a mistake. As well as I understand, borrow checking happens before monomorphization. And boxes are treated specially by borrow checker. Thus borrow checker still needs to be able to call is_box. So here is my new plan: is_box should return true if and only if we know for sure at this compilation stage that this box has default allocator. I. e. before monomorphization is_box for Box<T, A> should return false, even if at monomorphization stage this A will happen to be default allocator.

I think such change will bring many benefits.

And I will again say: consider just removing allocators from boxes and introducing another different type for boxes with allocators

@carbotaniuman
Copy link

Even if boxes become non-magical from opsem point of view

So, we still need this is_box thing, at least in short and middle term, even if we remove noalias from all boxes.

This RFC specifically does not propose to change or remove any other aspect of Box's magic, just it's opsem magic (in other words, noalias). While removing all magic from Box is indeed a useful goal, it is mostly orthogonal to this RFC.

With regards to is_box, I don't really think it really concerns this RFC, and would probably best be pursued as its own change.

@RalfJung
Copy link
Member

RalfJung commented May 24, 2025

So, we still need this is_box thing, at least in short and middle term, even if we remove noalias from all boxes. But we want this is_box return true for boxes with default allocator only. Only they are magical from all points of view, other boxes are not (update: should not). And that is why I wrote previous comment. I think we need these changes, even if remove noalias from all boxes. noalias for boxes will come as a nice side effect.

Getting rid of is_box is not the goal of this RFC. The goal of this RFC is to stop special-casing Box in the operational semantics, not to stop all special-casing everywhere. You are arguing for something else entirely. Please stick to the topic of this discussion.

Maybe you perceive there to be some problems with not special-casing Box in the operational semantics while still special-casing them elsewhere. Which problems do you think that introduces? (Note that we're only interested in new problems created by the RFC. If there's some problem that already exists but that the RFC doesn't solve, that's not an argument against the RFC -- no RFC should try to solve all problems at once.)

@traviscross
Copy link
Contributor

traviscross commented May 24, 2025

Regarding lang team vibes, the vibes we got seemed mostly in favor of the RFC?

I'm not convinced. See, e.g.:

It seems that MaybeDangling<T> solved most of the concrete problems cited as motivation for this RFC.

More broadly, probably I'd prefer, if we could work out how, to go the other direction and to make references (and Box) less special by giving people a way to create noalias wrapper types (see, e.g., here). In that world, it'd seem strange if Box<T> were not one of those noalias wrapper types given the fact that, regardless of what we do here, unique ownership will always be one of its library-level invariants.

Probably, therefore, I see this RFC in this larger context. If we drop noalias from Box<T>, it'd seem we're likely committing to the idea that only references will be special in this way. Given that we have an extensive type system that can cleanly propagate these invariants, it would just seem unfortunate to commit ourselves and our users to not being able to express this to codegen.

@RalfJung
Copy link
Member

RalfJung commented May 24, 2025

It seems that MaybeDangling solved most of the concrete problems cited as motivation for this RFC.

It does solve the ManuallyDrop issues, agreed.

However, it definitely does not solve the "people are surprised to learn about the alias requirements" part, as you have to know about the alias requirements to realize you need to suppress them with MaybeDangling. It also doesn't fix the problems around custom allocators, where we probably do not want noalias as otherwise things get even more surprising (see e.g. rust-lang/miri#3341). We'd have to use entirely different types for Box<T> and Box<T, Custom>. That would also fix a whole bunch of implementation issues, but it's clearly a worse API than just adding an allocator parameter to Box.

@traviscross
Copy link
Contributor

traviscross commented May 24, 2025

It also doesn't fix the problems around custom allocators...

Reading through the issues, it doesn't seem the problem is custom allocators in general, but particular allocator strategies that want to offset from the data pointer to the allocator state. Even for custom allocators, then, it seems we might have other options than turning noalias off across the board, including the one that @oli-obk mentioned:

Maybe we should make the pointer field type an assoc type on allocator, then allocators can choose if they want unique or raw pointer behaviour.

Regarding the potential surprise of it, I think I'm OK if people writing custom allocators or doing other such things need to pore carefully over our documentation. We were still figuring things out ourselves and hadn't provided appropriate tools to people. Once we have, and have documented this carefully, I'm hopeful this kind of surprise will no longer be the same kind of problem.

In general, I think my hesitation is that it still seems we might have other options here that we haven't yet ruled out.

@chorman0773
Copy link
Author

Getting rid of is_box is not the goal of this RFC. The goal of this RFC is to stop special-casing Box in the operational semantics, not to stop all special-casing everywhere

As the title says: "No (opsem) Magic Boxes". I want Box to eventually become non-magical through a combination of removing magic and adding generalizations (like deref_patterns), but this is exclusively focused on the opsem behaviour.

Assuming I read those vibes correctly, the next step here seems to be updating the RFC to resolve all the comments that have accrued, and to include some concrete information on the benchmarking that has been done. @chorman0773 will you have time to do that?

I'd have to find the benchmarks (or perhaps new, up to date, ones should be run, but those kinds of modifications to rustc are far beyond my paygrade). The rest of the comments, I'll address this coming week.

@RalfJung
Copy link
Member

Reading through the issues, it doesn't seem the problem is custom allocators in general, but particular allocator strategies that want to offset from the data pointer to the allocator state.

Correct. That's not an uncommon strategy for allocators. We have some vague ideas for how to support it for custom global allocators, but that relies on deallocation being a special magic operation -- and is likely incompatible with inlining the deallocator. So this doesn't seem viable for custom local allocators.

Even for custom allocators, then, it seems we might have other options than turning noalias off across the board, including the one that @oli-obk rust-lang/rust#122018 (comment):

This would need a form of associated type constructors, so that one allocator can have type Ptr<T> = NoaliasPtr<T> and another allocator can use NonNull, and then Box<T> uses A::Ptr::<T> as its pointer field. And I have no idea how we could possibly make this compatible with all the other special behavior Box has -- we could no longer be sure that Ptr actually is a pointer, after all.

So, I'd say this is quite speculative.

(or perhaps new, up to date, ones should be run, but those kinds of modifications to rustc are far beyond my paygrade).

There's literally a -Z flag for this, so rustc doesn't even need modification. :)

@safinaskar
Copy link

@RalfJung

We'd have to use entirely different types for Box<T> and Box<T, Custom>.

After reading this thread and reading rust-lang/rust#141219 I now agree that we definitely should split Box into these two boxes.

So I kindly ask someone to author RFC or MCP for this (i. e. splitting box). Hopefully it will be accepted, and then eventially someone will implement it.

We totally allowed to do this, because allocator API is unstable. This will be annoyance for people working with custom allocators, but I totally agree that current state causes too much pain for compiler developers

@RalfJung
Copy link
Member

RalfJung commented May 25, 2025 via email

@Jules-Bertholet
Copy link
Contributor

Maybe we should make the pointer field type an assoc type on allocator

The Storage API needs this too.

@afetisov
Copy link

Imho one downside of removing uniqueness guarantees from Box<T> is that it would make it easier to violate aliasing in unsafe code.

The Box API allows safely to do one magic thing: access its contents by value. Surely trying to move data in/out of the Box would assert unique ownership over those contents, and invalidate any other pointers aliasing that memory? If the answer is "yes", then it seems easy to inadvertently violate aliasing in unsafe code, while handling raw pointers to the Box-allocated memory.

While this problem already exists, the current restrictive rules about Box aliasing mean that the error would be caught much earlier, when the Box itself is accessed. It also possibly makes aliasing pointers to its heap memory more rare to begin with.

On the other hand, this problem seems similar to accidentally taking a &mut boxed.field, which can also be done safely and would invalidate aliasing pointers. This would apply to any type implementing DerefMut. Is DerefMut considered a hazard for unsafe code? If not, then perhaps the box issue is also insignificant.

But also, it seems similar to the hazard of handling &mut T in unsafe code, while producing aliasing pointers. Its safe API makes too easy to accidentally create a child &mut, via field access or a method call, and invalidate raw pointers.

@carbotaniuman
Copy link

The Box API allows safely to do one magic thing: access its contents by value. Surely trying to move data in/out of the Box would assert unique ownership over those contents, and invalidate any other pointers aliasing that memory? If the answer is "yes", then it seems easy to inadvertently violate aliasing in unsafe code, while handling raw pointers to the Box-allocated memory.

I don't think this is true, at least with Box::as_mut_ptr.

#![feature(box_as_ptr)]

struct Foo { a: i32 }

fn main() {
    let mut b = Box::new(Foo { a: 0 } );
    let ptr = Box::as_mut_ptr(&mut b);
    
    // This is sound afaik (checked with Miri)
    unsafe {
        *b = Foo {a: 1};
        b.a = 2;
        *ptr = Foo {a: 3};
    }
    
    // Without this RFC, this is unsound
    let b2 = b;
    unsafe {
        *ptr = Foo {a: 4};
    }
}

Probably, therefore, I see this RFC in this larger context. If we drop noalias from Box<T>, it'd seem we're likely committing to the idea that only references will be special in this way.

I'm not sure this follows - we could always add a type that is what Unique wants to be that exposes this to users. I don't think this change closes any door with regards to the opsem, other than that adding such a new feature would need to pass the bar for complexity compared to the status quo. (And if that is the case, arguably Box noalias isn't pulling its weight and shouldn't exist solely as a historical quirk).

@RalfJung
Copy link
Member

I'm not sure this follows - we could always add a type that is what Unique wants to be that exposes this to users. I don't think this change closes any door with regards to the opsem, other than that adding such a new feature would need to pass the bar for complexity compared to the status quo.

Yeah, agreed. If anything, disentangling this from Box helps pave the way to dealing with this in a more modular, one-wrapper-type-per-feature kind of a way.

@traviscross
Copy link
Contributor

Yeah, agreed. If anything, disentangling this from Box helps pave the way to dealing with this in a more modular, one-wrapper-type-per-feature kind of a way.

Since Box<T> would still have the library invariant of uniqueness, I don't think it would make a good orthogonal primitive in this way. To achieve this goal, I think we'd want a different type, and I don't have any reservations about some primitive LeakyNonUniqueBox<T>, without a library-level uniqueness invariant, not implying noalias.

@scottmcm scottmcm added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label May 30, 2025
@traviscross traviscross added the P-lang-drag-2 Lang team prioritization drag level 2. label May 30, 2025
@RalfJung
Copy link
Member

I don't see how a Box-like type could avoid having uniqueness as a library invariant -- it needs that for drop to be able to deallocate memory. What people keep asking for is a type like C++ unique_ptr: free on drop, but no aliasing requirements. This type would naturally have the same library invariant as our current Box.

@RalfJung
Copy link
Member

See rust-lang/rust#138896 for yet another recent aliasing issue caused by Box being noalias.

@traviscross
Copy link
Contributor

traviscross commented May 31, 2025

I don't see how a Box-like type could avoid having uniqueness as a library invariant -- it needs that for drop to be able to deallocate memory.

Yes, that's why I had called it "leaky". (It's not a proposal though.)

What people keep asking for is a type like C++ unique_ptr: free on drop, but no aliasing requirements.

Yes, and as I understand, that's what we added with MaybeDangling<Box<T>>. We could add a type alias for that, if needed (e.g., as we did with NonZero*).

Though we accepted the RFC over a year ago, we haven't yet stabilized it -- or even implemented it. That's unfortunate, and I imagine that's part of why people are still asking for this. Probably there's a large part of me that wants to see that stabilized first. Then we can see what's left.

@traviscross
Copy link
Contributor

traviscross commented May 31, 2025

See rust-lang/rust#138896 for yet another recent aliasing issue caused by Box being noalias.

If I'm reading the situation there correctly, the problem was not directly that we apply noalias -- it didn't violate the LLVM model -- but that this usage violated the Stacked Borrows model, while being accepted under Tree Borrows.

Godbolt link

Observationally, most of the examples I've seen cited in practice tend to fit this pattern. I wonder whether this might be as much an argument for choosing Tree Borrows -- or something like it -- as it is for changing the semantics of Box<T>.

And, of course, MaybeDangling<Box<T>> solves this for Stacked Borrows (as would push_mut / push_ref).

@RalfJung
Copy link
Member

RalfJung commented May 31, 2025

Tree Borrows accepts this by sacrificing some optimization potential. If we let the opsem be guided by the library invariant for Box/&mut, that's much closer to SB than TB.

@traviscross
Copy link
Contributor

traviscross commented May 31, 2025

Probably I'd suggest that even if we were to remove noalias from Box<T>, that wouldn't spare users from having to learn that it's questionable to write the rust-lang/rust#138896 pattern in that way, because it'd still be wrong when written generically.

Godbolt link

@carbotaniuman
Copy link

because it'd still be wrong when written generically

Right, but I think it would reasonable to assume that arbitrary DerefMut could have arbitrary aliasing behavior, whereas the argument here is that people would assume Box (like Vec) does not have this behavior.

As an example, stable_deref_trait wrongly made this assumption, but it's been a pervasive one that has lingered in the ecosystem. I don't really see this mistake being made with generic T, and there, MaybeDangling<T> does seem like a tool there, though perhaps not a complete one (it wouldn't make the second example of yours valid for instance).

@traviscross
Copy link
Contributor

traviscross commented Jun 1, 2025

Right, but I think it would reasonable to assume that arbitrary DerefMut could have arbitrary aliasing behavior, whereas the argument here is that people would assume Box (like Vec) does not have this behavior.

In the cases I've reviewed, it's really not clear to me that people are making this mistake for that reason as opposed to making it for the same reason they'd make it in the generic case. People write this,

        self.items.push(item);
        self.ptrs.push(&raw mut *item); //~ ERROR value borrowed here after move

and they get the borrow checker error, so then they say, "oh, right, I need to swap those lines",

        self.ptrs.push(&raw mut *item);
        self.items.push(item);

and then it "works". I've caught this in review in places where it wasn't even arguably OK, and it was never a case where the person had some fancy theory about an analogy to the (perhaps accidental) operational semantics of Vec<T>. They just didn't notice or think about the effect of the move at all.

So that's why I say that I don't think that just dropping noalias from Box<T> gets us out from having to teach people carefully about this sort of hazard. Probably I suspect that adding Vec::push_mut (push an element and return a mutable reference to it) will be as big a help as anything, as then we can tell people to write:

        self.ptrs.push(&raw mut **self.items.push_mut(item));

@RalfJung
Copy link
Member

RalfJung commented Jun 1, 2025

So that's why I say that I don't think that just dropping noalias from Box gets us out from having to teach people carefully about this sort of hazard.

It doesn't completely get us out of that, but moving Box to long-term storage is much more common than doing the same for references -- so the amount of code on which the hazard triggers should be much reduced by having noalias for references, but not Box.

There's a potential alternative here of devising an aliasing model where moves don't cause such long-term effects in the same way (see rust-lang/unsafe-code-guidelines#450). The aliasing models we have right now have more UB than LLVM immediately needs for noalias, to leave some room for us to come up with clever optimizations in the future. We could have less UB, in particular for cases like this, if we decided we don't need much more than what noalias already gives us.

@traviscross
Copy link
Contributor

traviscross commented Jun 1, 2025

It doesn't completely get us out of that, but moving Box to long-term storage is much more common than doing the same for references -- so the amount of code on which the hazard triggers should be much reduced by having noalias for references, but not Box.

Something I think about is how that also argues the other way. If someone has a generic Array<T>, as in the example I gave, and then mostly instantiates it with Box<U> (e.g. in the test suite), for the reason you give, then Miri will catch the bug, and the user will be pushed into fixing it for all T. If we tell Miri to ignore it for Box<U>, then it may be more likely to go unnoticed, and then we get UB in the odd case where it's instantiated (maybe downstream) with SmallVec<U> or &mut U instead.

Even without generics (i.e., with human-powered monomorphization), I think about the pedagogic value of this. If someone is running Miri intermittently and mostly writing this pattern with Box<U>, then if Miri doesn't flag it and cause the user to learn something from that, then maybe the user falls into this same pattern and doesn't notice the UB when writing this out for SmallVec<U> or &mut U.

There's a potential alternative here of devising an aliasing model where moves don't cause such long-term effects in the same way (see rust-lang/unsafe-code-guidelines#450). The aliasing models we have right now have more UB than LLVM immediately needs for noalias, to leave some room for us to come up with clever optimizations in the future. We could have less UB, in particular for cases like this, if we decided we don't need much more than what noalias already gives us.

Yes; that's an option too.

@RalfJung
Copy link
Member

RalfJung commented Jun 1, 2025

OTOH there's a lot of code which people can't run in Miri or don't run for other reasons, so relying on Miri as a teaching tool is risky. And all else being equal, less UB means a strictly lower risk of unexpected compilation results.

@traviscross
Copy link
Contributor

OTOH there's a lot of code which people can't run in Miri or don't run for other reasons, so relying on Miri as a teaching tool is risky. And all else being equal, less UB means a strictly lower risk of unexpected compilation results.

Indeed. Many angles to consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. I-lang-radar Items that are on lang's radar and will need eventual work or consideration. P-lang-drag-2 Lang team prioritization drag level 2. proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. T-lang Relevant to the language team, which will review and decide on the RFC. T-opsem Relevant to the operational semantics team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.