UTF-8 to UTF-16 Path conversions on Windows unnecessarily allocate twice

The Unicode Windows API works with (potentially ill-formed) UTF-16 paths, so Rust's strings (UTF-8, or WTF-8 for `OsStr` on Windows) must be converted to UTF-16 before each call. The conversion happens here:

https://github.com/rust-lang/rust/blob/b04c5329e1e145fb2fb46c5a7e775638712b03aa/library/std/src/sys/windows/mod.rs#L159-L166

`encode_wide` translates WTF-8 to UTF-16, and collecting it leaves a `Vec<u16>` with `len == cap`. The subsequent call to `push` must then reallocate to make room for the null terminator, so every conversion allocates twice.

This could be resolved in two ways:

1) Create the vector up front with `Vec::with_capacity`, sized from `EncodeWide`'s `size_hint` plus one, and then fill it with `Vec::extend`.
2) Wrap `EncodeWide` in an adapter whose `size_hint` reports one extra element. Yielding the final null terminator from `next` would add a performance penalty, so it is probably preferable to adjust only `size_hint` and push the terminator manually.
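A minimal sketch of option 2, assuming a portable stand-in: `str::encode_utf16` replaces the Windows-only `encode_wide`, and `<[u16]>::contains` replaces std's internal `unrolled_find_u16s`. The adapter `ReserveOneMore` and the function `to_wide_with_nul` are hypothetical names, not std APIs. Whether `collect` actually uses the bumped hint to avoid the second allocation depends on the `Vec` implementation and on how tight the inner iterator's lower bound is.

```rust
use std::io;

/// Hypothetical adapter: yields exactly the items of `I`, but reports one
/// more element in `size_hint` so `collect` can reserve room for a NUL
/// terminator that the caller pushes afterwards.
struct ReserveOneMore<I>(I);

impl<I: Iterator> Iterator for ReserveOneMore<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // Plain forwarding: the terminator is never produced here.
        self.0.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lo, hi) = self.0.size_hint();
        // Bump both bounds by one for the terminator pushed by the caller.
        (lo.saturating_add(1), hi.and_then(|h| h.checked_add(1)))
    }
}

/// Hypothetical stand-in for the std conversion routine (option 2).
fn to_wide_with_nul(s: &str) -> io::Result<Vec<u16>> {
    // `encode_utf16` stands in for the Windows-only `encode_wide`.
    let mut result: Vec<u16> = ReserveOneMore(s.encode_utf16()).collect();
    // `contains` stands in for std's internal `unrolled_find_u16s`.
    if result.contains(&0) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "strings passed to WinAPI cannot contain NULs",
        ));
    }
    // Ideally this push lands in the slot reserved via `size_hint`.
    result.push(0);
    Ok(result)
}
```

One caveat: the `Iterator::size_hint` documentation expects a correct estimate, and for an empty input the bumped lower bound of 1 overstates the real count of 0. That is not a memory-safety problem, but it suggests keeping such an adapter private to the conversion routine.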

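```rust
// library/std/src/sys/windows/mod.rs, lines 159-166 at the linked commit
let mut maybe_result: Vec<u16> = s.encode_wide().collect();
if unrolled_find_u16s(0, &maybe_result).is_some() {
    return Err(crate::io::const_io_error!(
        ErrorKind::InvalidInput,
        "strings passed to WinAPI cannot contain NULs",
    ));
}
// `collect` leaves no spare capacity, so this push reallocates.
maybe_result.push(0);
```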
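Applied to the snippet above, option 1 might look like the following sketch. It again assumes `str::encode_utf16` as a portable stand-in for `encode_wide` and `contains` in place of `unrolled_find_u16s`; `to_u16s_single_alloc` is a hypothetical name. Instead of `size_hint`'s lower bound, which may undercount, it sizes the buffer from the byte length: a UTF-8 (or WTF-8) sequence of one to three bytes becomes one UTF-16 unit and a four-byte sequence becomes two, so the byte count is always an upper bound on the number of code units.

```rust
use std::io;

/// Hypothetical single-allocation variant of the std snippet (option 1).
fn to_u16s_single_alloc(s: &str) -> io::Result<Vec<u16>> {
    // UTF-16 never needs more code units than the string has bytes,
    // so `s.len() + 1` always leaves room for the terminator.
    let mut result: Vec<u16> = Vec::with_capacity(s.len() + 1);
    let cap = result.capacity();
    // `encode_utf16` stands in for the Windows-only `encode_wide`.
    result.extend(s.encode_utf16());
    // `contains` stands in for std's internal `unrolled_find_u16s`.
    if result.contains(&0) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "strings passed to WinAPI cannot contain NULs",
        ));
    }
    result.push(0);
    // Neither `extend` nor `push` had to grow the buffer.
    debug_assert_eq!(result.capacity(), cap);
    Ok(result)
}
```

The trade-off is over-allocation for non-ASCII text: a three-byte UTF-8 character becomes a single UTF-16 unit, so a path made mostly of such characters reserves up to about three times the space it needs.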

UTF-8 to UTF-16 Path conversions on Windows unnecessarily allocate twice #96297