`Path`/`OsStr` incorrectly handle `Formatter` flags

The `Display` implementations for `Path`, `PathBuf`, `OsStr`, and `OsString` incorrectly handle `Formatter<'_>` flags. This is visible on stable (for paths) or with `#![feature(os_str_display)]` (for OS strings).

## Background

On Windows, these types convert from wide strings (`u16`) to WTF-8 and the only invalid sequences are unpaired surrogate halves. On other systems, these types use conventional UTF-8 and may have any kind of invalid sequence. The `Display` implementations replace such invalid sequences with the replacement character U+FFFD. This is done by iterating over valid chunks, where each is followed by an invalid sequence.

## Bugs

In the Windows implementation, [`Wtf8`](https://github.com/rust-lang/rust/blob/c753cb9b42e530950204dd0367525cce12811cdd/library/std/src/sys_common/wtf8.rs#L580), when the full string is valid UTF-8, it forwards the formatter args by calling `Display::fmt(s, f)`, so it is padded as requested in most cases. However, when it is invalid, the formatter args are ignored. In the non-Windows implementation, [`Bytes`](https://github.com/rust-lang/rust/blob/c753cb9b42e530950204dd0367525cce12811cdd/library/std/src/sys/os_str/bytes.rs#L48), the last valid chunk is formatted with `Display::fmt(last_chunk, f)`, but all other chunks with `f.write_str(s)`. This is not only inconsistent, but it can display padding inside the string data.

This illustrates the buggy `Display` implementations in (AFAIK) all the APIs they're exposed in:

```rust
#![feature(os_str_display)]
use std::{ffi::OsString, path::PathBuf};

#[cfg(unix)]
{
    use std::os::unix::ffi::OsStringExt;
    display(OsString::from_vec(b"b\xFFd".to_vec()), "ab�    d     e"); // BUG
    display(OsString::from_vec(b"bcd".to_vec()), "a   bcd    e");
}
#[cfg(windows)]
{
    use std::os::windows::ffi::OsStringExt;
    macro_rules! wtf8[($($c:literal),*) => { &[$($c as u16),*] }];
    display(OsString::from_wide(wtf8![b'b', 0xD800, b'd']), "ab�de"); // BUG
    display(OsString::from_wide(wtf8![b'b', b'c', b'd']), "a   bcd    e");
}
assert_eq!(format!("a{:^10}e", "bcd"), "a   bcd    e");

fn display(s: OsString, expect: &str) {
    let p = PathBuf::from(s.clone());
    assert_eq!(format!("a{:^10}e", s.display()), expect);
    assert_eq!(format!("a{:^10}e", s.as_os_str().display()), expect);
    assert_eq!(format!("a{:^10}e", p.display()), expect);
    assert_eq!(format!("a{:^10}e", p.as_path().display()), expect);
}
```

## History

- [742ca0caf25](https://github.com/rust-lang/rust/commit/742ca0caf25813ee30b70d9e29ab3eacb0076302) (std: Respect formatting flags for str-like OsStr, 2017-08-12): Forward formatter flags for fully valid strings
- [ac96fd77874](https://github.com/rust-lang/rust/commit/ac96fd77874bb7968b7d82bed5a410d3941bc4c8) (Avoid allocations in Debug for os_str, 2017-06-12): Display with `Utf8Chunks`-style iteration
- [45ddf50cebd](https://github.com/rust-lang/rust/commit/45ddf50cebd8f3353383e473327911c1f8b34cd6) (Add new path module, 2015-01-29): Display with `String::from_utf8_lossy`

## Fix

We cannot simply ignore the formatter args, because that would break backwards compatibility, as the common case of fully valid UTF-8 respects padding, matching `str` and `String`.

The simplest fix would be to bring the behavior of non-Windows in line with Windows, i.e., fix the erroneous padding for the last chunk by only forwarding formatter args if the full string is valid.

A simple fix would be to format `self.to_string_lossy()` instead of doing a manual iteration. However, this incurs an allocation when invalid.

I propose doing the formatting by scanning in two passes. The first pass counts the number of Unicode scalars to render. Since there's no escaping, this is relatively inexpensive. Then the left padding is written. Then the second pass writes the string. Finally, the right padding is written. This could be abstracted by making a method on the formatter, which takes a length in Unicode scalars and a callback which writes the data: It would pad left according to the length, write via the callback, then pad right.

I would like to work on fixing this, pending discussion on which direction to take it :).

## Related?

I suspect that other types could have similar issues with inconsistent or misplaced formatter flag handling, but have only surveyed those detailed here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Path`/`OsStr` incorrectly handle `Formatter` flags #136617

Background

Bugs

History

Fix

Related?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Path/OsStr incorrectly handle Formatter flags #136617

Description

Background

Bugs

History

Fix

Related?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`Path`/`OsStr` incorrectly handle `Formatter` flags #136617