Description
Moved from #85263 as the PR has been merged.
Summary of safety of pattern matching against Rust unions
Written by @Smittyvb, copied from #85263 (comment)
The unsafety checker is being rewritten to operate on the THIR instead of the MIR. As part of that, I was implementing the unsafety rules for unions, and encountered some weird edge cases in the way the MIR unsafety checker handles unions. In general, writing to a union is safe but reading is unsafe. There are some cases where writing to a union is unsafe (such as when that might cause a Drop call) that are not being implemented in this PR.
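As a minimal sketch of the general rule (the union IntOrFloat and the function write_then_read are hypothetical names for illustration, not taken from the PR), writing a Copy field needs no unsafe block, while reading any field does:

```rust
// Hypothetical union used only for illustration; both fields are Copy
// and have the same size, so either one may be read after any write.
#[derive(Clone, Copy)]
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

fn write_then_read() -> u32 {
    let mut u = IntOrFloat { i: 0 };
    // Writing to a Copy field is accepted in safe code.
    u.f = 1.0;
    // Reading a field requires an unsafe block, because the compiler
    // cannot know which field was written last.
    unsafe { u.i }
}
```

Here the read reinterprets the bytes of the f32 as a u32, so the result is the bit pattern of 1.0.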
Behavior specified by RFC 1444
Unsafe code may pattern match on union fields, using the same syntax as a struct, without the requirement to mention every field of the union in a match or use ..:
    fn f(u: MyUnion) {
        unsafe {
            match u {
                MyUnion { f1: 10 } => { println!("ten"); }
                MyUnion { f2 } => { println!("{}", f2); }
            }
        }
    }
Matching a specific value from a union field makes a refutable pattern; naming a union field without matching a specific value makes an irrefutable pattern. Both require unsafe code.
Pattern matching may match a union as a field of a larger structure. In particular, when using a Rust union to implement a C tagged union via FFI, this allows matching on the tag and the corresponding field simultaneously:
    #[repr(u32)]
    enum Tag { I, F }
    #[repr(C)]
    union U {
        i: i32,
        f: f32,
    }
    #[repr(C)]
    struct Value {
        tag: Tag,
        u: U,
    }
    fn is_zero(v: Value) -> bool {
        unsafe {
            match v {
                Value { tag: I, u: U { i: 0 } } => true,
                Value { tag: F, u: U { f: 0.0 } } => true,
                _ => false,
            }
        }
    }
Note that a pattern match on a union field that has a smaller size than the entire union must not make any assumptions about the value of the union's memory outside that field. For example, if a union contains a u8 and a u32, matching on the u8 may not perform a u32-sized comparison over the entire union.
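For reference, here is a compilable sketch of the tagged-union example from the RFC. Two details are my own adjustments rather than RFC text: the variants are written as Tag::I and Tag::F (a bare I or F would be treated as a binding unless the variants are imported), and the float arm binds f and compares it instead of using a float literal pattern.

```rust
#[repr(u32)]
enum Tag {
    I,
    F,
}

#[repr(C)]
union U {
    i: i32,
    f: f32,
}

#[repr(C)]
struct Value {
    tag: Tag,
    u: U,
}

fn is_zero(v: Value) -> bool {
    // The whole match sits in an unsafe block because the patterns
    // read union fields.
    unsafe {
        match v {
            // Refutable: matches a specific value of the `i` field.
            Value { tag: Tag::I, u: U { i: 0 } } => true,
            // Binds the `f` field, then compares it in the arm body.
            Value { tag: Tag::F, u: U { f } } => f == 0.0,
            _ => false,
        }
    }
}
```

For example, is_zero(Value { tag: Tag::F, u: U { f: 0.0 } }) returns true, while the same call with f: 1.5 returns false.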
Actual behavior
The MIR unsafety checker doesn't implement that behavior exactly. Due to the way that the MIR is constructed and optimized, some destructuring patterns against unions that the RFC specifies to be unsafe are accepted without an unsafe block.
The behavior of the MIR unsafety checker is mostly "irrefutable pattern matching against unions without any bindings after desugaring or-patterns is safe", with some extra weird behavior when niches are involved.
Here are some examples of what the MIR unsafety checker considers safe and unsafe. Here is the prelude for all of these examples:
union Foo { bar: i8, zst: (), pizza: Pizza, oneval: OneVal, twoval: TwoVal, khar: char }
#[derive(Copy, Clone)]
struct Pizza { topping: Option<PizzaTopping> }
#[derive(Copy, Clone)]
enum PizzaTopping { Cheese, Pineapple }
#[derive(Copy, Clone)]
#[repr(u8)]
enum OneVal { One = 1 }
#[derive(Copy, Clone)]
#[repr(u8)]
pub enum TwoVal {
    One = 1,
    Two = 2,
}
let mut foo = Foo { bar: 5 };
Patterns considered safe
match (Foo { bar: 42 }) {
    Foo { oneval: OneVal::One } => {
        // always run
    },
}
match foo {
    Foo { bar: _ | _ } => {},
}
match foo {
    Foo { pizza: Pizza { .. } } => {},
};
match foo {
    Foo { pizza: Pizza { topping: _ } } => {},
};
match foo {
    Foo { zst: () } => {},
}
let Foo { bar: _ } = foo;
match Some(foo) {
    Some(Foo { bar: _ }) => 3,
    None => 4,
};
Patterns considered unsafe
All of these require an unsafe block to compile.
match (Foo { bar: 42 }) {
    Foo { twoval: TwoVal::One | TwoVal::Two } => {
        // always run
    },
}
match foo {
    Foo {
        pizza: Pizza {
            topping: Some(PizzaTopping::Cheese) | Some(PizzaTopping::Pineapple) | None
        }
    } => {},
}
match foo {
    Foo { bar: _a } => {},
}
let Foo { bar: inner } = foo;
let (Foo { bar } | Foo { bar }) = foo;
match foo.khar {
    '\0'..='\u{D7FF}' | '\u{E000}'..='\u{10FFFF}' => ()
};
match x.b {
    '\0'..='\u{10FFFF}' => 1,
};
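To make the contrast concrete, here is a small sketch using a cut-down version of Foo (only the bar and zst fields; the function names are hypothetical). It assumes, as described above, that a wildcard pattern on a union field is accepted in safe code, while a binding or a specific value must be wrapped in an unsafe block:

```rust
// Cut-down version of the prelude's Foo, for illustration only.
#[derive(Clone, Copy)]
union Foo {
    bar: i8,
    zst: (),
}

fn wildcard_is_safe(foo: Foo) -> bool {
    // Irrefutable and binds nothing, so no field is actually read.
    let Foo { bar: _ } = foo;
    true
}

fn read_bar(foo: Foo) -> i8 {
    // A binding reads the field, so the pattern needs an unsafe block.
    unsafe {
        let Foo { bar: inner } = foo;
        inner
    }
}

fn is_five(foo: Foo) -> bool {
    // Matching a specific value is refutable and also reads the field.
    unsafe {
        match foo {
            Foo { bar: 5 } => true,
            _ => false,
        }
    }
}
```

Removing either unsafe block from read_bar or is_five should produce an "access to union field is unsafe" error.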