UTF-8 to UTF-16 Path conversions on Windows unnecessarily allocate twice #96297

Closed
@AronParker

Description

The Unicode Windows API works internally with (potentially ill-formed) UTF-16 paths, so Rust's UTF-8 strings must be converted to UTF-16 before each call. The conversion happens here:

let mut maybe_result: Vec<u16> = s.encode_wide().collect();
if unrolled_find_u16s(0, &maybe_result).is_some() {
    return Err(crate::io::const_io_error!(
        ErrorKind::InvalidInput,
        "strings passed to WinAPI cannot contain NULs",
    ));
}
maybe_result.push(0);

encode_wide translates WTF-8 to UTF-16, leaving a Vec<u16> with len == cap. The subsequent call to push then has to reallocate to make room for the null terminator.

This could be resolved in two ways:

  1. Call Vec::with_capacity beforehand with the lower bound of EncodeWide's size_hint plus one, then call Vec::extend with the EncodeWide iterator.
  2. Make a wrapper around EncodeWide that increases size_hint by one. Yielding the final null terminator from next would add a performance penalty, so it would probably be preferable to adjust only the size_hint and append the null terminator manually.
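
Both options can be sketched roughly as follows. This is only an illustration, not the std implementation: to_wide_with_nul and ReserveOneMore are hypothetical names, the error is a plain &'static str instead of an io::Error, and the function takes any Iterator<Item = u16> so it can be tried off Windows with str::encode_utf16 standing in for encode_wide. Reserving the lower bound plus one avoids the extra allocation only when the hint is close to the real length. The wrapper deliberately over-reports its lower bound by one, which bends the usual size_hint contract, so it would only make sense as a private helper.

```rust
/// Option 1 (hypothetical helper): reserve the iterator's lower bound plus
/// one slot for the terminator, then extend instead of collecting.
fn to_wide_with_nul<I>(units: I) -> Result<Vec<u16>, &'static str>
where
    I: Iterator<Item = u16>,
{
    let (lower, _) = units.size_hint();
    let mut buf: Vec<u16> = Vec::with_capacity(lower.saturating_add(1));
    buf.extend(units);

    // Interior NULs would silently truncate the string on the WinAPI side.
    if buf.contains(&0) {
        return Err("strings passed to WinAPI cannot contain NULs");
    }

    // If the hint matched the real length, this push fits in the
    // capacity reserved above and does not reallocate.
    buf.push(0);
    Ok(buf)
}

/// Option 2 (hypothetical wrapper): yields exactly the inner items but
/// reports one extra element in `size_hint`, so a consumer that sizes its
/// buffer from the hint leaves room for a manually pushed terminator.
struct ReserveOneMore<I>(I);

impl<I: Iterator<Item = u16>> Iterator for ReserveOneMore<I> {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        // The terminator is not produced here, keeping `next` cheap.
        self.0.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        let (lower, upper) = self.0.size_hint();
        (lower.saturating_add(1), upper.and_then(|n| n.checked_add(1)))
    }
}
```

On Windows the call would presumably look like to_wide_with_nul(s.encode_wide()); elsewhere, to_wide_with_nul("abc".encode_utf16()) exercises the same path.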

Labels

O-windows: Operating system: Windows
T-libs: Relevant to the library team, which will review and decide on the PR/issue.
