
Tracking issue for improving std::fmt::Arguments and format_args!() #99012

Open
@m-ou-se

Description


Earlier this year in the libs team meeting, I presented several different ideas for alternative implementations of std::fmt::Arguments which could result in smaller binary size or higher performance. Now that #93740 is mostly done, I'll be shifting my focus to fmt::Arguments and exploring those ideas.

Currently, fmt::Arguments is the size of six pointers, and refers to three slices:

  • A &'static [&'static str] containing the literal parts around the formatting placeholders. E.g. for "a{}b{}c", these are ["a", "b", "c"].
  • A &[&(ptr, fn_ptr)] which is basically a &[&dyn Display] (but can point to Debug or Hex etc. too), pointing to the arguments. This one is not 'static, as it points to the actual arguments to be formatted.
  • An Option<&'static [FmtArgument]>, where FmtArgument is a struct containing all the options like precision, width, alignment, fill character, etc. This is unused (None) when no placeholder has any options, like in "{} {}", but as soon as any placeholder uses an option, like in "{:.5} {}", it is used and filled in for all placeholders.

Here's a visualisation of that, for a "a{}b{:.5}c" format string:

[Diagram: layout of fmt::Arguments for the "a{}b{:.5}c" format string]
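The three-slice layout can be sketched in ordinary Rust for the same "a{}b{:.5}c" example. This is a simplified model, not the real core::fmt types: `Placeholder`, `Argument`, `ArgumentsModel` and `render` are hypothetical names, only precision is modelled, and a `&dyn Display` (itself a data pointer plus a vtable pointer) stands in for the (ptr, fn_ptr) pair.

```rust
use std::fmt;
use std::mem::size_of;

/// Hypothetical per-placeholder options. The real struct also carries
/// width, fill, alignment and flags; only precision is modelled here.
struct Placeholder {
    precision: Option<usize>,
}

/// Hypothetical argument: `&dyn Display` stands in for (ptr, fn_ptr).
struct Argument<'a> {
    value: &'a dyn fmt::Display,
}

/// Simplified model of the three slices referenced by fmt::Arguments.
struct ArgumentsModel<'a> {
    /// Literal parts around the placeholders (static).
    pieces: &'static [&'static str],
    /// Pointers to the runtime arguments (not static).
    args: &'a [Argument<'a>],
    /// Per-placeholder options, or None if no placeholder uses any (static).
    placeholders: Option<&'static [Placeholder]>,
}

impl fmt::Display for ArgumentsModel<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Interleave: piece 0, arg 0, piece 1, arg 1, ..., last piece.
        for (i, piece) in self.pieces.iter().enumerate() {
            f.write_str(piece)?;
            if let Some(arg) = self.args.get(i) {
                let precision = self
                    .placeholders
                    .and_then(|p| p.get(i))
                    .and_then(|ph| ph.precision);
                match precision {
                    Some(p) => write!(f, "{:.*}", p, arg.value)?,
                    None => write!(f, "{}", arg.value)?,
                }
            }
        }
        Ok(())
    }
}

// Static data for "a{}b{:.5}c": only `args` has to be built at runtime.
static PIECES: [&str; 3] = ["a", "b", "c"];
static PLACEHOLDERS: [Placeholder; 2] = [
    Placeholder { precision: None },
    Placeholder { precision: Some(5) },
];

/// Roughly what the caller fills in for a "a{}b{:.5}c" format string.
fn render(x: i32, y: f64) -> String {
    let args = [Argument { value: &x }, Argument { value: &y }];
    let model = ArgumentsModel {
        pieces: &PIECES,
        args: &args,
        placeholders: Some(&PLACEHOLDERS[..]),
    };
    model.to_string()
}

/// Size of the model in pointer-sized words.
fn model_size_in_words() -> usize {
    size_of::<ArgumentsModel<'static>>() / size_of::<usize>()
}
```

Here `render(1, 2.5)` produces "a1b2.50000c", and `model_size_in_words()` is 6 on any target: each of the three fields is a (pointer, length) pair, and the Option costs nothing extra thanks to the null-pointer niche.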

An important part of this design is that most of it can be stored in static storage, which minimizes the work a function has to do to create or pass a fmt::Arguments: it can just refer to the static data, and only fill in a slice of the arguments.
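The stable `fmt::Arguments::as_str` method reflects this: when there are no runtime arguments, it returns the literal text as a `&'static str` pointing at that static data. A small sketch (the helper names `static_text` and `demo` are illustrative only):

```rust
use std::fmt;

/// Returns the literal text of a formatting call if it needs no runtime
/// arguments. The `'static` lifetime reflects that the pieces are static.
fn static_text(args: fmt::Arguments<'_>) -> Option<&'static str> {
    args.as_str()
}

/// Compares a literal-only format string with one that has a runtime argument.
fn demo(n: u32) -> (Option<&'static str>, Option<&'static str>) {
    (
        static_text(format_args!("hello")),
        // `n` is only known at runtime, so there is no static string to return.
        static_text(format_args!("n = {}", n)),
    )
}
```

The documentation guarantees `Some` for a plain literal; with a placeholder backed by a runtime value, there is no precomputed string, so the result is `None`.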

Some downsides:

  • A fmt::Arguments is still relatively big (six pointers in size), and not a great type to pass by value. It could be just two pointers in size (one to static data, one to dynamic data), such that it fits in a register pair.
  • It costs quite a lot of static storage for some simple format strings. For example, "a{}b{}c" needs a &["a", "b", "c"], which is stored in memory as a (ptr, size) pair referencing three (ptr, size) pairs referencing one byte each, which is a lot of overhead. Small string literals with just a newline or a space are very common in formatting.
  • When even just a single formatting placeholder uses any non-standard options, such as "{:02x}", a relatively large array with all the (mostly default) formatting options is stored for all placeholders.
  • The non-static part that contains the pointers to the arguments contains the pointers to the relevant Display/Debug/etc. implementation as well, even though that second part is constant and could be static. (It's a bit tricky to split those, though.)
  • Even when formatting a simple &str argument with a simple "{}" placeholder, the full Display implementation for &str is pulled in, which includes code for all the unused options like padding, alignment, etc.
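To put numbers on the static-storage point for "a{}b{}c": the pieces are stored as three (ptr, len) pairs plus three bytes of text. The sketch below computes that, and contrasts it with a hypothetical length-prefixed byte encoding. That encoding, and the names `encode_pieces` and `decode_pieces`, are illustrative assumptions, not a design proposed in this issue.

```rust
use std::mem::size_of;

/// Static bytes taken by the pieces today: one (ptr, len) pair per piece
/// plus the bytes of the pieces themselves (ignoring alignment and any
/// deduplication the linker may do).
fn slice_of_str_bytes(pieces: &[&str]) -> usize {
    size_of::<&str>() * pieces.len() + pieces.iter().map(|p| p.len()).sum::<usize>()
}

/// Hypothetical compact encoding: every piece becomes a one-byte length
/// followed by its UTF-8 bytes, all in one contiguous byte string.
fn encode_pieces(pieces: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for p in pieces {
        // A real design would need a scheme for longer pieces.
        assert!(p.len() <= u8::MAX as usize, "sketch only handles pieces up to 255 bytes");
        out.push(p.len() as u8);
        out.extend_from_slice(p.as_bytes());
    }
    out
}

/// Decodes the hypothetical encoding back into the original pieces.
fn decode_pieces(mut bytes: &[u8]) -> Vec<&str> {
    let mut pieces = Vec::new();
    while let Some((&len, rest)) = bytes.split_first() {
        let (piece, tail) = rest.split_at(len as usize);
        pieces.push(std::str::from_utf8(piece).expect("encoded from valid &str pieces"));
        bytes = tail;
    }
    pieces
}
```

On a 64-bit target, `slice_of_str_bytes(&["a", "b", "c"])` is 3 × 16 + 3 = 51 bytes, while the encoded form is 6 bytes. Any such encoding trades static size for extra decoding work at runtime.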

Issues like these are often a reason to avoid formatting in some situations, which is a shame.
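As an illustration of what avoiding formatting can look like, here is a sketch of the same output produced with and without `write!`. The function names are hypothetical; whether the second version actually shrinks a given binary depends on what else the program formats.

```rust
use std::fmt::{self, Write};

/// Goes through the formatting machinery: builds a fmt::Arguments and
/// calls `<str as Display>::fmt`, which handles padding/alignment options.
fn greet_formatted(out: &mut String, name: &str) -> fmt::Result {
    writeln!(out, "Hello, {}!", name)
}

/// Same output, written piece by piece with `write_str`, so no
/// fmt::Arguments is built and no Display implementation is invoked.
fn greet_plain(out: &mut String, name: &str) -> fmt::Result {
    out.write_str("Hello, ")?;
    out.write_str(name)?;
    out.write_str("!\n")
}
```

The plain version gives up the readability of a single format string, which is the cost the improvements tracked here aim to remove.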

None of these things are trivial to fix, and all involve a trade-off between compile time, code size, runtime performance, and implementation complexity. It's also very tricky to make these trade-offs for many different use cases at once, as the ways in which formatting is used differ vastly from one kind of Rust program to another.

Still, there are many ideas that are worth exploring. It's hard to predict which one will end up being best, so this will involve several different implementations to test and benchmark.

I'll explain the different ideas one by one in the comments below as I explore them.


To do:

Labels

  • A-fmt (Area: `core::fmt`)
  • C-tracking-issue (Category: an issue tracking the progress of something like the implementation of an RFC)
  • I-heavy (Issue: problems and improvements with respect to binary size of generated code)
  • I-slow (Issue: problems and improvements with respect to performance of generated code)
  • T-libs (Relevant to the library team, which will review and decide on the PR/issue)
