Skip to content

OsString::from_wide (in Windows OsStringExt) is unsound #72760

Closed
@RalfJung

Description

@RalfJung

The following program causes UB:

use std::os::windows::ffi::{OsStrExt, OsStringExt};
use std::ffi::{OsStr, OsString};

fn main() {
    let base = "a\té \u{7f}💩\r";
    let mut base: Vec<u16> = OsStr::new(base).encode_wide().collect();
    base.push(0xD800);
    let _res = OsString::from_wide(&base);
}

Miri says:

error: Undefined Behavior: type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
   --> /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
    |
102 |     transmute(i)
    |     ^^^^^^^^^^^^ type validation failed: encountered 0x0000d800, but expected a valid unicode codepoint
    |
    = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
    = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
            
    = note: inside `std::char::from_u32_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libcore/char/convert.rs:102:5
    = note: inside `std::sys_common::wtf8::Wtf8Buf::push_code_point_unchecked` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:204:26
    = note: inside `std::sys_common::wtf8::Wtf8Buf::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys_common/wtf8.rs:194:21
    = note: inside `<std::ffi::OsString as std::os::windows::ffi::OsStringExt>::from_wide` at /home/r/.rustup/toolchains/miri/lib/rustlib/src/rust/src/libstd/sys/windows/ext/ffi.rs:101:44
note: inside `main` at wtf8.rs:8:16
   --> wtf8.rs:8:16
    |
8   |     let _res = OsString::from_wide(&base);
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^

The problem is this code:

pub fn push(&mut self, code_point: CodePoint) {
if let trail @ 0xDC00..=0xDFFF = code_point.to_u32() {
if let Some(lead) = (&*self).final_lead_surrogate() {
let len_without_lead_surrogate = self.len() - 3;
self.bytes.truncate(len_without_lead_surrogate);
self.push_char(decode_surrogate_pair(lead, trail as u16));
return;
}
}
// No newly paired surrogates at the boundary.
self.push_code_point_unchecked(code_point)
}

This calls push_code_point_unchecked unless the new code point is in 0xDC00..=0xDFFF, but what about surrogates in 0xD800..0xDC00?

This code is unchanged since its introduction in c5369eb. I am not sure what the intended safety contract of push_code_point_unchecked is. That method is not marked unsafe but clearly should be -- it calls char::from_u32_unchecked. So my guess is the safety precondition is that CodePoint must not be part of a surrogate pair, but the thing is, push calls it without actually ensuring that condition. The condition it does ensure is that the codepoint is not in 0xDC00..=0xDFFF, but that does not help.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-bugCategory: This is a bug.I-unsoundIssue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/SoundnessO-windowsOperating system: WindowsT-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions