Skip to content

Path/OsStr incorrectly handle Formatter flags #136617

@thaliaarchi

Description

@thaliaarchi

The Display implementations for Path, PathBuf, OsStr, and OsString incorrectly handle Formatter<'_> flags. This is visible on stable (for paths) or with #![feature(os_str_display)] (for OS strings).

Background

On Windows, these types convert from wide strings (u16) to WTF-8 and the only invalid sequences are unpaired surrogate halves. On other systems, these types use conventional UTF-8 and may have any kind of invalid sequence. The Display implementations replace such invalid sequences with the replacement character U+FFFD. This is done by iterating over valid chunks, where each is followed by an invalid sequence.

Bugs

In the Windows implementation, Wtf8, when the full string is valid UTF-8, it forwards the formatter args by calling Display::fmt(s, f), so it is padded as requested in most cases. However, when it is invalid, the formatter args are ignored. In the non-Windows implementation, Bytes, the last valid chunk is formatted with Display::fmt(last_chunk, f), but all other chunks with f.write_str(s). This is not only inconsistent, but it can display padding inside the string data.

This illustrates the buggy Display implementations in (AFAIK) all the APIs they're exposed in:

#![feature(os_str_display)]
use std::{ffi::OsString, path::PathBuf};

#[cfg(unix)]
{
    use std::os::unix::ffi::OsStringExt;
    display(OsString::from_vec(b"b\xFFd".to_vec()), "ab�    d     e"); // BUG
    display(OsString::from_vec(b"bcd".to_vec()), "a   bcd    e");
}
#[cfg(windows)]
{
    use std::os::windows::ffi::OsStringExt;
    macro_rules! wtf8[($($c:literal),*) => { &[$($c as u16),*] }];
    display(OsString::from_wide(wtf8![b'b', 0xD800, b'd']), "ab�de"); // BUG
    display(OsString::from_wide(wtf8![b'b', b'c', b'd']), "a   bcd    e");
}
assert_eq!(format!("a{:^10}e", "bcd"), "a   bcd    e");

fn display(s: OsString, expect: &str) {
    let p = PathBuf::from(s.clone());
    assert_eq!(format!("a{:^10}e", s.display()), expect);
    assert_eq!(format!("a{:^10}e", s.as_os_str().display()), expect);
    assert_eq!(format!("a{:^10}e", p.display()), expect);
    assert_eq!(format!("a{:^10}e", p.as_path().display()), expect);
}

History

  • 742ca0caf25 (std: Respect formatting flags for str-like OsStr, 2017-08-12): Forward formatter flags for fully valid strings
  • ac96fd77874 (Avoid allocations in Debug for os_str, 2017-06-12): Display with Utf8Chunks-style iteration
  • 45ddf50cebd (Add new path module, 2015-01-29): Display with String::from_utf8_lossy

Fix

We cannot simply ignore the formatter args, because that would break backwards compatibility, as the common case of fully valid UTF-8 respects padding, matching str and String.

The simplest fix would be to bring the behavior of non-Windows in line with Windows, i.e., fix the erroneous padding for the last chunk by only forwarding formatter args if the full string is valid.

A simple fix would be to format self.to_string_lossy() instead of doing a manual iteration. However, this incurs an allocation when invalid.

I propose doing the formatting by scanning in two passes. The first pass counts the number of Unicode scalars to render. Since there's no escaping, this is relatively inexpensive. Then the left padding is written. Then the second pass writes the string. Finally, the right padding is written. This could be abstracted by making a method on the formatter, which takes a length in Unicode scalars and a callback which writes the data: It would pad left according to the length, write via the callback, then pad right.

I would like to work on fixing this, pending discussion on which direction to take it :).

Related?

I suspect that other types could have similar issues with inconsistent or misplaced formatter flag handling, but have only surveyed those detailed here.

Metadata

Metadata

Assignees

Labels

A-fmtArea: `core::fmt`C-bugCategory: This is a bug.T-libsRelevant to the library team, which will review and decide on the PR/issue.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions