Semantics of MIR assignments, around aliasing, ordering, and primitives.

In today's MIR, an indirect assignment like `*p = *q;` is similar to, but not exactly the same as:
```rust
tmp = *q;
*p = tmp;
```
The differences are:
* performance: they only produce the same codegen for primitives
  * this is assuming `tmp` isn't used elsewhere, allowing codegen to treat it like an SSA value, resulting in `store(p, load(q))`, which is also what `*p = *q` codegens to
  * for non-primitives, the amount of data being copied doubles, as `tmp` must be in memory
* correctness: only `*p = *q;` is UB (AFAIK) if `p..p+size` overlaps `q..q+size`
  * this also likely only affects non-primitives, which have to copy data in memory, but we could decide to ignore the type and always make it UB when they overlap

<hr/>

For the purposes of this discussion, a primitive is:
* scalar (`bool`, `char`, integer, float, or pointer/reference)
* vector (SIMD-enabled array of scalars)

Scalar pairs likely also should/need to be included, due to how easy they are to support in any code that already handles scalars, and also due to their use in wide pointers/references.

<hr/>

What's interesting about primitives, though, is that some kinds of `Rvalue`s (the RHS of the assignment) *always* produce primitive values, because they're primitive operations.

The `Rvalue` variants which are always primitive, today, are:
* `Ref` (`&T` / `&mut T` - may become dependent on custom DSTs in the future)
* `AddressOf` (`*const T` / `*mut T` - may become dependent on custom DSTs in the future)
* `Len` (`usize`)
* `Cast`, other than unsizing (scalar)
* `BinaryOp`, `UnaryOp` (scalar, or maybe also vector)
* `CheckedBinaryOp` (pair of integer and `bool` - only if we consider scalar pairs to be primitive)
* `NullaryOp(SizeOf)` (`usize`)
* `NullaryOp(Box)` (`Box<T>`)
* `Discriminant` (integer)

Which leaves these variants as potentially relying on memory operations to write the result:
* `Use` (any type, one copy)
* `Repeat` (`[T; N]`, `N` copies)
* `Cast`, specifically unsizing (any type implementing `CoerceUnsized`, per-field copies)
* `Aggregate` (any ADT, per-field copies)

If we want to remain conservative, we could ignore types for the latter, and just assume that the destination of the assignment cannot overlap any memory read in the `Operand`s of the `Rvalue`.

We could even cement the distinction by moving the always-primitive operations into a new `PrimOp` enum, and/or move the other `Rvalue`s to their own statements (e.g. introduce `Copy(*p, *q)`), but that's more aesthetic than anything for the time being.

At the very least, we should probably document these differences, and make sure that `miri` only allows overlaps in the cases we don't consider UB (either abstractly, or due to our choice of codegen).

<hr/>

Another interesting aspect of the always-primitive ops is that they're "pure functions" of their operands (other than `NullaryOp(Box)`, I suppose, but that could be replaced with a call to a lang item returning a `Box<MaybeUninit<T>>`, instead).

This means that if we wanted to, we could replace some of the intermediary locals with an `PrimOp` DAG, a bit like SSA but without φ (`phi`) nodes or a strict instruction stream.
All of the necessary ordering would still happen at the statement level (so this is nowhere near as complex as VSDG), but we might see some benefits in scalar-heavy code.

<hr/>

Asides aside, cc @RalfJung @rust-lang/wg-mir-opt

---

The [latest proposal is here](https://github.com/rust-lang/rust/issues/68364#issuecomment-614862820)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Semantics of MIR assignments, around aliasing, ordering, and primitives. #68364

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Semantics of MIR assignments, around aliasing, ordering, and primitives. #68364

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions