Earlier this year in the libs team meeting, I presented several different ideas for alternative implementations of std::fmt::Arguments which could result in smaller binary size or higher performance. Now that #93740 is mostly done, I'll be shifting my focus to fmt::Arguments and exploring those ideas.
Currently, fmt::Arguments is the size of six pointers, and refers to three slices:
- A `&'static [&'static str]` containing the literal parts around the formatting placeholders. E.g. for `"a{}b{}c"`, these are `["a", "b", "c"]`.
- A `&[&(ptr, fn_ptr)]` which is basically a `&[&dyn Display]` (but can point to `Debug` or `Hex` etc. too), pointing to the arguments. This one is not `'static`, as it points to the actual arguments to be formatted.
- An `Option<&'static [FmtArgument]>`, where `FmtArgument` is a struct containing all the options like precision, width, alignment, fill character, etc. This is unused (`None`) when all placeholders have no options, like in `"{} {}"`, but is used and filled in for all placeholders as soon as any placeholder uses any options, like in `"{:.5} {}"`.
Here's a visualisation of that, for a `"a{}b{:.5}c"` format string:
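As a rough illustration of the layout described above, the following sketch models the three slices for `"a{}b{:.5}c"` in plain Rust. All names here (`ArgumentsModel`, `FmtArgument`, `render`, `example`, `model_size_in_pointers`) are hypothetical. The argument slice is simplified to `&dyn Display` rather than the real `(ptr, fn_ptr)` pairs, and the options struct only carries a precision. It is a model for reasoning about the layout, not how core::fmt is actually implemented.

```rust
use std::fmt::{self, Write};
use std::mem::size_of;

/// Hypothetical, heavily reduced stand-in for the per-placeholder options.
#[derive(Clone, Copy)]
struct FmtArgument {
    /// Index into the argument slice.
    position: usize,
    /// Precision as in `{:.5}`; `None` means "not specified".
    precision: Option<usize>,
}

/// Hypothetical model of the three slices; not the real `fmt::Arguments`.
struct ArgumentsModel<'a> {
    /// Literal parts around the placeholders (static data).
    pieces: &'static [&'static str],
    /// The actual arguments (dynamic data, hence not `'static`).
    args: &'a [&'a dyn fmt::Display],
    /// Per-placeholder options, only present if any placeholder has options.
    specs: Option<&'static [FmtArgument]>,
}

// Static parts for "a{}b{:.5}c". Note that the first placeholder has no
// options, but still gets an entry once any placeholder has one.
static PIECES: [&str; 3] = ["a", "b", "c"];
static SPECS: [FmtArgument; 2] = [
    FmtArgument { position: 0, precision: None },
    FmtArgument { position: 1, precision: Some(5) },
];

impl ArgumentsModel<'_> {
    /// Interleaves the literal pieces with the formatted arguments.
    fn render(&self) -> String {
        let mut out = String::new();
        let placeholders = self.specs.map_or(self.args.len(), |s| s.len());
        for (i, piece) in self.pieces.iter().enumerate() {
            out.push_str(piece);
            if i >= placeholders {
                continue;
            }
            let (arg, precision) = match self.specs {
                Some(specs) => (self.args[specs[i].position], specs[i].precision),
                None => (self.args[i], None),
            };
            match precision {
                Some(p) => write!(out, "{:.*}", p, arg).unwrap(),
                None => write!(out, "{}", arg).unwrap(),
            }
        }
        out
    }
}

/// Builds the model for `"a{}b{:.5}c"` with two sample arguments.
fn example() -> String {
    let (x, y) = (1, 0.123456789_f64);
    let model = ArgumentsModel {
        pieces: &PIECES,
        args: &[&x, &y],
        specs: Some(&SPECS),
    };
    model.render()
}

/// Size of the model, measured in pointer-sized words.
fn model_size_in_pointers() -> usize {
    size_of::<ArgumentsModel<'static>>() / size_of::<usize>()
}
```

Calling `example()` should produce the same string as `format!("a{}b{:.5}c", 1, 0.123456789_f64)`, and `model_size_in_pointers()` reflects the three (ptr, size) pairs that make up the six-pointer size mentioned above.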
An important part of this design is that most of it can be stored in `static` storage, to minimize the amount of work that a function that needs to create/pass a fmt::Arguments needs to do. It can just refer to the static data, and only fill in a slice of the arguments.
Some downsides:
- A fmt::Arguments is still relatively big (six pointers in size), and not a great type to pass by value. It could be just two pointers in size (one to static data, one to dynamic data), such that it fits in a register pair.
- It costs quite a lot of static storage for some simple format strings. For example, `"a{}b{}c"` needs a `&["a", "b", "c"]`, which is stored in memory as a (ptr, size) pair referencing three (ptr, size) pairs referencing one byte each, which is a lot of overhead. Small string literals with just a newline or a space are very common in formatting.
- When even just a single formatting placeholder uses any non-standard options, such as `"{:02x}"`, a relatively large array with all the (mostly default) formatting options is stored for all placeholders.
- The non-static part that contains the pointers to the arguments contains the pointers to the relevant Display/Debug/etc. implementation as well, even though that second part is constant and could be static. (It's a bit tricky to split those, though.)
- Even when formatting a simple `&str` argument with a simple `"{}"` placeholder, the full `Display` implementation for `&str` is pulled in, which includes code for all the unused options like padding, alignment, etc.
Issues like those are often a reason to avoid formatting in some situations, which is a shame.
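To put a number on the static storage point, the sketch below counts the bookkeeping bytes for the pieces of `"a{}b{}c"` as described in the list above: one (ptr, size) pair for the slice plus one (ptr, size) pair per piece. The helper name `static_piece_bytes` is hypothetical, and the count ignores alignment padding and any deduplication the linker might do.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: returns `(text_bytes, overhead_bytes)` for a set of
/// literal pieces. The overhead is the (ptr, size) pair of the slice itself
/// plus one (ptr, size) pair per piece.
fn static_piece_bytes(pieces: &[&str]) -> (usize, usize) {
    // Bytes of actual string content.
    let text: usize = pieces.iter().map(|p| p.len()).sum();
    // `size_of_val(pieces)` is the size of the `[&str]` the slice points to.
    let overhead = size_of::<&[&str]>() + size_of_val(pieces);
    (text, overhead)
}
```

For `static_piece_bytes(&["a", "b", "c"])` this gives 3 bytes of text against 8 pointer-sized words of overhead, which is 64 bytes on a 64-bit target.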
None of these things are trivial to fix, and all involve a trade-off between compile time, code size, runtime performance, and implementation complexity. It's also very tricky to make these trade-offs for many different use cases at once, as the ways in which formatting is used differ vastly per type of Rust program.
Still, there are many ideas that are worth exploring. It's hard to predict which one will end up being best, so this will involve several different implementations to test and benchmark.
I'll explain the different ideas one by one in the comments below as I explore them.
To do:
- Simplify and clean up format_args!() implementation so we can more easily work on it.
- Simplify format_args builtin macro implementation. #100277
- Rewrite and refactor format_args!() builtin macro. #100996
- Separate CountIsStar from CountIsParam in rustc_parse_format. #101000
- Replace format flags u32 by enums and bools. #106806
- Don't re-export private/unstable ArgumentV1 from `alloc`. #101569
- Change formatting items to lang items (part of Move format_args!() into AST (and expand it during AST lowering) #106745)
- Remove all public exports of the lang items, after the beta 1.69 bump: Remove public doc(hidden) core::fmt::rt::v1 #110616
- Remove the `V1` suffix from the `ArgumentV1` and `FlagV1` types: More core::fmt::rt cleanup. #110766
- Move format args into the AST, so we can do more with it.
- Remove Clippy's dependence on format_args's implementation details, to allow refactoring format_args without having to fix clippy every time.
- Zulip discussion
- Update lint implementations: Migrate format_args!() lints to ast::FormatArgs rust-clippy#10233
- UselessFormat lint: Replace remaining usage of `FormatArgsExpn` rust-clippy#10561
- FormatArgs lints: Migrate `format_args.rs` to `rustc_ast::FormatArgs` rust-clippy#10484
- FormatImpl lints: Replace remaining usage of `FormatArgsExpn` rust-clippy#10561
- ManualAssert lint: Don't depend on FormatArgsExpn in ManualAssert. rust-clippy#10276
- ExplicitWrite lint: Replace remaining usage of `FormatArgsExpn` rust-clippy#10561
- Write lints: Migrate `write.rs` to `rustc_ast::FormatArgs` rust-clippy#10275
- expect_fun_call lint: Replace remaining usage of `FormatArgsExpn` rust-clippy#10561
- Synchronize the clippy subtree: Update Clippy #110003
- Optimizations for format_args!() within the compiler (independent of std::fmt::Arguments)
- Inlining and flattening: format_args!() could 'inline' literal arguments #78356
- Update API docs to allow this for std::fmt::Arguments::as_str(): Allow fmt::Arguments::as_str() to return more Some(_). #106823
- Implementation as unstable option: Flatten/inline format_args!() and (string and int) literal arguments into format_args!() #106824
- Enable it by default: Enable flatten-format-args by default. #109999
- Accidentally introduce a bug: format_args!() inlining/flattening allows for longer lifetimes #110769
- Fix the bug: Limit lifetime of format_args!() with inlined args. #110770
- Reduce size of FormattingOptions
- New fmt::Arguments representation to reduce size and allow conversion from `&str`: New fmt::Arguments representation. #115129
- Reduce amount of (unnecessary) fmt code pulled in
- Make sure `write!(f, "literal")` is just as efficient as `f.write_str("literal")`
- Experiment with various ideas for changing format_args and fmt::Arguments.
- Implement and try out the closure idea. Experiment: fmt::Arguments as closure #101568 (Impact on compile time too big, because of all the extra code generation.)
- Implement and try out the "encoded formatting instructions" idea.
- First partial attempt: [do not merge] fmt::Arguments experiment #84823
- ...
- Implement and try out the "two pointers" idea: Experiment: New format_args!() representation #137294
- Implement the winning design
- ... (?)
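The `as_str()` and `write!(f, "literal")` items in the list above relate to a pattern that can already be written by hand with the stable `fmt::Arguments::as_str()` method. The sketch below shows that pattern; the names `write_fast` and `demo` are hypothetical, and this illustrates the goal rather than anything the compiler or the standard library is claimed to do internally.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper: bypass the formatting machinery when the arguments
/// are known to be a plain string, and fall back to `write_fmt` otherwise.
fn write_fast<W: Write>(out: &mut W, args: fmt::Arguments<'_>) -> fmt::Result {
    match args.as_str() {
        // No arguments need formatting at runtime: write the string directly.
        Some(literal) => out.write_str(literal),
        // Something still needs formatting at runtime.
        None => out.write_fmt(args),
    }
}

/// Writes one literal-only and one placeholder-carrying format string.
fn demo() -> String {
    let mut out = String::new();
    let n = 42;
    write_fast(&mut out, format_args!("n = ")).unwrap();
    write_fast(&mut out, format_args!("{}!", n)).unwrap();
    out
}
```

Here `demo()` returns `"n = 42!"`. Which branch the second call takes is not something to rely on, since `as_str()` is allowed to return `Some(_)` in more cases over time.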