Description
The following program causes UB:
use std::os::windows::ffi::{OsStrExt, OsStringExt};
use std::ffi::{OsStr, OsString};
fn main() {
let base = "a\té \u{7f}💩\r";
let mut base: Vec<u16> = OsStr::new(base).encode_wide().collect();
base.push(0xD800);
let _res = OsString::from_wide(&base);
}
Miri says:
error: Undefined Behavior: type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
--> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
|
102 | transmute(i)
| ^^^^^^^^^^^^ type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `std::char::from_u32_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
= note: inside `std::sys_common::wtf8::Wtf8Buf::push_code_point_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:204:26
= note: inside `std::sys_common::wtf8::Wtf8Buf::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:194:21
= note: inside `<std::ffi::OsString as std::os::windows::ffi::OsStringExt>::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys/windows/ext/ffi.rs:101:44
note: inside `main` at wtf8.rs:8:16
--> wtf8.rs:8:16
|
8 | let _res = OsString::from_wide(&base);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
The problem is this code:
rust/src/libstd/sys_common/wtf8.rs
Lines 293 to 305 in 96dd469
This calls push_code_point_unchecked
unless the new code point is in 0xDC00..=0xDFFF
, but what about surrogates in 0xD800..0xDC00
?
This code is unchanged since its introduction in c5369eb. I am not sure what the intended safety contract of push_code_point_unchecked
is. That method is not marked unsafe
but clearly should be -- it calls char::from_u32_unchecked
. So my guess is the safety precondition is that CodePoint
must not be part of a surrogate pair, but the thing is, push
calls it without actually ensuring that condition. The condition it does ensure is that the codepoint is not in 0xDC00..=0xDFFF
, but that does not help.