Commit 27b4d10

Explain what ifmt! is all about
1 parent 1f6afa8 commit 27b4d10

1 file changed

+301
-0
lines changed

src/libstd/fmt/mod.rs

+301
@@ -8,6 +8,307 @@
8 8
// option. This file may not be copied, modified, or distributed
9 9
// except according to those terms.
10 10

11+
/**!
12+
13+
# The Formatting Module
14+
15+
This module contains the runtime support for the `ifmt!` syntax extension. This
16+
macro is implemented in the compiler to emit calls to this module in order to
17+
format arguments at runtime into strings and streams.
18+
19+
The functions contained in this module should not normally be used in everyday
20+
use cases of `ifmt!`. The assumptions made by these functions are unsafe for all
21+
inputs, and the compiler performs a large amount of validation on the arguments
22+
to `ifmt!` in order to ensure safety at runtime. While it is possible to call
23+
these functions directly, it is not recommended to do so in the general case.
24+
25+
## Usage
26+
27+
The `ifmt!` macro is intended to be familiar to those coming from C's
28+
printf/sprintf functions or Python's `str.format` function. In its current
29+
revision, the `ifmt!` macro returns a `~str` type which is the result of the
30+
formatting. In the future it will also be able to pass in a stream to format
31+
arguments directly while performing minimal allocations.
32+
33+
Some examples of the `ifmt!` extension are:
34+
35+
~~~{.rust}
36+
ifmt!("Hello") // => ~"Hello"
37+
ifmt!("Hello, {:s}!", "world") // => ~"Hello, world!"
38+
ifmt!("The number is {:d}", 1) // => ~"The number is 1"
39+
ifmt!("{}", ~[3, 4]) // => ~"~[3, 4]"
40+
ifmt!("{value}", value=4) // => ~"4"
41+
ifmt!("{} {}", 1, 2) // => ~"1 2"
42+
~~~
43+
44+
From these, you can see that the first argument is a format string. It is
45+
required by the compiler for this to be a string literal; it cannot be a
46+
variable passed in (in order to perform validity checking). The compiler will
47+
then parse the format string and determine if the list of arguments provided is
48+
suitable to pass to this format string.
49+
50+
### Positional parameters
51+
52+
Each formatting argument is allowed to specify which value argument it's
53+
referencing, and if omitted it is assumed to be "the next argument". For
54+
example, the format string `{} {} {}` would take three parameters, and they
55+
would be formatted in the same order as they're given. The format string
56+
`{2} {1} {0}`, however, would format arguments in reverse order.
57+
58+
A format string is required to use all of its arguments, otherwise it is a
59+
compile-time error. You may refer to the same argument more than once in the
60+
format string, although it must always be referred to with the same type.
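
As a rough illustration of the positional rules above, here is a minimal sketch. It uses the present-day `format!` macro as a stand-in for `ifmt!`, on the assumption that its positional behaviour is analogous. The helper names `in_order`, `reversed`, and `repeated` are hypothetical.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption, not the
// macro described in this module). Helper names are hypothetical.

/// With no explicit index, each `{}` takes "the next argument".
fn in_order(a: &str, b: &str, c: &str) -> String {
    format!("{} {} {}", a, b, c)
}

/// Explicit indices select arguments by position, here in reverse order.
fn reversed(a: &str, b: &str, c: &str) -> String {
    format!("{2} {1} {0}", a, b, c)
}

/// The same argument may be referenced more than once.
fn repeated(a: &str) -> String {
    format!("{0}-{0}", a)
}

fn main() {
    println!("{}", in_order("x", "y", "z"));
    println!("{}", reversed("x", "y", "z"));
    println!("{}", repeated("x"));
}
```

Leaving one of the three arguments unreferenced in either format string would be rejected at compile time, matching the rule that every argument must be used.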
61+
62+
### Named parameters
63+
64+
Rust itself does not have a Python-like equivalent of named parameters to a
65+
function, but the `ifmt!` macro is a syntax extension which allows it to
66+
leverage named parameters. Named parameters are listed at the end of the
67+
argument list and have the syntax:
68+
69+
~~~
70+
identifier '=' expression
71+
~~~
72+
73+
It is illegal to put positional parameters (those without names) after arguments
74+
which have names. As with positional parameters, it is illegal to provide named
75+
parameters that are unused by the format string.
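
A small sketch of mixing positional and named parameters, again using the present-day `format!` macro as an assumed stand-in for `ifmt!`. The function `greet` and its parameters are hypothetical names chosen for illustration.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption).
// `greet`, `name`, and `count` are hypothetical names.

/// Positional arguments come first; named ones (`identifier = expression`)
/// are listed at the end of the argument list.
fn greet(name: &str, count: u32) -> String {
    format!("{} has {n} new messages", name, n = count)
}

fn main() {
    println!("{}", greet("Ann", 3));
}
```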
76+
77+
### Argument types
78+
79+
Each argument's type is dictated by the format string. It is a requirement that
80+
every argument is only ever referred to by one type. When specifying the format
81+
of an argument, however, a string like `{}` indicates no type. This is allowed,
82+
and if all references to one argument do not provide a type, then the format `?`
83+
is used (the type's Rust representation is printed). For example, this is an
84+
invalid format string:
85+
86+
~~~
87+
{0:d} {0:s}
88+
~~~
89+
90+
This is invalid because the first argument is referred to both as an integer
91+
and as a string.
92+
93+
Because formatting is done via traits, there is no requirement that the
94+
`d` format actually takes an `int`, but rather it simply requires a type which
95+
implements the `Signed` formatting trait. There are various parameters which do
96+
require a particular type, however. Namely, if the syntax `{:.*s}` is used, then
97+
the number of characters to print from the string precedes the actual string and
98+
must have the type `uint`. Although a `uint` can be printed with `{:u}`, it is
99+
illegal to also reference that count argument this way. For example, this is another invalid
100+
format string:
101+
102+
~~~
103+
{:.*s} {0:u}
104+
~~~
105+
106+
### Formatting traits
107+
108+
When requesting that an argument be formatted with a particular type, you are
109+
actually requesting that the argument implement a particular trait. This allows
110+
multiple actual types to be formatted via `{:d}` (like `i8` as well as `int`).
111+
The current mapping of types to traits is:
112+
113+
* `?` => Poly
114+
* `d` => Signed
115+
* `i` => Signed
116+
* `u` => Unsigned
117+
* `b` => Bool
118+
* `c` => Char
119+
* `o` => Octal
120+
* `x` => LowerHex
121+
* `X` => UpperHex
122+
* `s` => String
123+
* `p` => Pointer
124+
* `t` => Binary
125+
126+
What this means is that any type of argument which implements the
127+
`std::fmt::Binary` trait can then be formatted with `{:t}`. Implementations are
128+
provided for these traits for a number of primitive types by the standard
129+
library as well. Again, the default formatting type (if no other is specified)
130+
is `?` which is defined for all types by default.
131+
132+
When implementing a format trait for your own type, you will have to implement a
133+
method of the signature:
134+
135+
~~~
136+
fn fmt(value: &T, f: &mut std::fmt::Formatter);
137+
~~~
138+
139+
Your type will be passed by-reference in `value`, and then the function should
140+
emit output into the `f.buf` stream. It is up to each format trait
141+
implementation to correctly adhere to the requested formatting parameters. The
142+
values of these parameters will be listed in the fields of the `Formatter`
143+
struct. In order to help with this, the `Formatter` struct also provides some
144+
helper methods.
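
To make the idea of implementing a format trait concrete, here is a minimal sketch against the present-day `std::fmt::Binary` trait. Its method signature (`&self`, returning `fmt::Result`) differs from the `fn fmt(value: &T, f: &mut Formatter)` form shown above, and it is selected with `{:b}` rather than `{:t}`, so treat it as an analogue rather than this module's API. The `Flags` type is hypothetical.

```rust
use std::fmt;

// Hypothetical wrapper type used only for illustration.
struct Flags(u8);

// Sketch only: this is the modern trait shape, assumed to be analogous to
// the `Binary` trait described in this module.
impl fmt::Binary for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegating to the inner `u8` keeps the requested formatting
        // parameters (width, fill, `#`, `0`) carried by the formatter.
        fmt::Binary::fmt(&self.0, f)
    }
}

fn main() {
    println!("{:b}", Flags(5));
    println!("{:#010b}", Flags(5));
}
```

Delegating to an existing implementation is one simple way for a format trait to adhere to the requested parameters without inspecting the formatter's fields by hand.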
145+
146+
## Internationalization
147+
148+
The formatting syntax supported by the `ifmt!` extension supports
149+
internationalization by providing "methods" which produce different
150+
outputs depending on the input. The syntax and methods provided are similar to
151+
other internationalization systems, so again nothing should seem alien.
152+
Currently two methods are supported by this extension: "select" and "plural".
153+
154+
Each method will execute one of a number of clauses, and then the value of the
155+
clause becomes the result of formatting the argument. Inside of the
156+
cases, nested argument strings may be provided, but formatting arguments must
156+
not be referenced by implicit position. All arguments inside of each
158+
case of a method must be explicitly selected by their name or their integer
159+
position.
160+
161+
Furthermore, whenever a case is running, the special character `#` can be used
162+
to reference the string value of the argument which was selected upon. As an
163+
example:
164+
165+
~~~
166+
ifmt!("{0, select, other{#}}", "hello") // => ~"hello"
167+
~~~
168+
169+
This example is essentially equivalent to `{0:s}`.
170+
171+
### Select
172+
173+
The select method is a switch over a `&str` parameter, and the parameter *must*
174+
be of the type `&str`. An example of the syntax is:
175+
176+
~~~
177+
{0, select, male{...} female{...} other{...}}
178+
~~~
179+
180+
Breaking this down, the `0`-th argument is selected upon with the `select`
181+
method, and then a number of cases follow. Each case is preceded by an
182+
identifier which is the match-clause to execute the given arm. In this case,
183+
there are two explicit cases, `male` and `female`. The case will be executed if
184+
the string argument provided is an exact match to the case selected.
185+
186+
The `other` case is also a required case for all `select` methods. This arm will
187+
be executed if none of the other arms matched the word being selected over.
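
The matching rule for `select` can be modelled as an ordinary function. This is only a sketch of the semantics described above, not the code the compiler emits. The function `select` and its `arms` and `other` parameters are hypothetical.

```rust
// Hypothetical model of `select`: `arms` holds the explicit cases as
// (identifier, arm body) pairs, and `other` is the required fallback arm.
fn select<'a>(arg: &str, arms: &[(&str, &'a str)], other: &'a str) -> &'a str {
    arms.iter()
        // An arm runs only when the argument matches its identifier exactly.
        .find(|(name, _)| *name == arg)
        .map(|(_, body)| *body)
        // If no explicit arm matched, the `other` arm is used.
        .unwrap_or(other)
}

fn main() {
    let arms = [("male", "he"), ("female", "she")];
    println!("{}", select("female", &arms, "they"));
    println!("{}", select("unknown", &arms, "they"));
}
```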
188+
189+
### Plural
190+
191+
The plural method is a switch statement over a `uint` parameter, and the
192+
parameter *must* be a `uint`. A plural method in its full glory can be specified
193+
as:
194+
195+
~~~
196+
{0, plural, offset:1 =1{...} two{...} many{...} other{...}}
197+
~~~
198+
199+
To break this down, the first `0` indicates that this method is selecting over
200+
the value of the first positional parameter to the format string. Next, the
201+
`plural` method is being executed. An optionally-supplied `offset` is then given
202+
which indicates a number to subtract from argument `0` when matching. This is
203+
then followed by a list of cases.
204+
205+
Each case is allowed to supply a specific value to match upon with the syntax
206+
`=N`. This case is executed if the value at argument `0` matches N exactly,
207+
without taking the offset into account. A case may also be specified by one of
208+
five keywords: `zero`, `one`, `two`, `few`, and `many`. These cases are matched
209+
on after argument `0` has the offset taken into account. Currently the
210+
definitions of `many` and `few` are hardcoded, but they are in theory defined by
211+
the current locale.
212+
213+
Finally, all `plural` methods must have an `other` case supplied which will be
214+
executed if none of the other cases match.
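
The arm selection for `plural` can be sketched in the same way. Several details here are assumptions rather than facts from this module: exact `=N` arms are tried before keyword arms, the offset subtraction saturates at zero, and only `zero`, `one`, and `two` are classified (as 0, 1, and 2), since `few` and `many` depend on the locale. The names `Selector`, `keyword_for`, and `plural` are hypothetical.

```rust
// Hypothetical model of arm selection for `plural`.
#[derive(Debug, PartialEq)]
enum Selector {
    Exact(usize),           // the `=N` form
    Keyword(&'static str),  // `zero`, `one`, `two`, `few`, `many`
}

// Assumed classifier: only zero/one/two are modelled here; `few` and
// `many` are locale-defined and deliberately left out of this sketch.
fn keyword_for(n: usize) -> Option<&'static str> {
    match n {
        0 => Some("zero"),
        1 => Some("one"),
        2 => Some("two"),
        _ => None,
    }
}

fn plural<'a>(
    value: usize,
    offset: usize,
    arms: &[(Selector, &'a str)],
    other: &'a str,
) -> &'a str {
    // Exact arms compare against the raw value, ignoring the offset.
    for (sel, body) in arms {
        if *sel == Selector::Exact(value) {
            return *body;
        }
    }
    // Keyword arms compare against the value after the offset is removed
    // (saturating subtraction is an assumption of this sketch).
    let adjusted = value.saturating_sub(offset);
    if let Some(kw) = keyword_for(adjusted) {
        for (sel, body) in arms {
            if *sel == Selector::Keyword(kw) {
                return *body;
            }
        }
    }
    // The required `other` arm catches everything else.
    other
}

fn main() {
    let arms = [(Selector::Exact(1), "exactly one"), (Selector::Keyword("two"), "two after offset")];
    println!("{}", plural(1, 1, &arms, "other"));
    println!("{}", plural(3, 1, &arms, "other"));
    println!("{}", plural(5, 1, &arms, "other"));
}
```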
215+
216+
## Syntax
217+
218+
The syntax for the formatting language used is drawn from other languages, so it
219+
should not be too alien. Arguments are formatted with Python-like syntax,
220+
meaning that arguments are surrounded by `{}` instead of the C-like `%`. The
221+
actual grammar for the formatting syntax is:
222+
223+
~~~
224+
format_string := <text> [ format <text> ] *
225+
format := '{' [ argument ] [ ':' format_spec ] [ ',' function_spec ] '}'
226+
argument := integer | identifier
227+
228+
format_spec := [[fill]align][sign]['#'][0][width]['.' precision][type]
229+
fill := character
230+
align := '<' | '>'
231+
sign := '+' | '-'
232+
width := count
233+
precision := count | '*'
234+
type := identifier | ''
235+
count := parameter | integer
236+
parameter := integer '$'
237+
238+
function_spec := plural | select
239+
select := 'select' ',' ( identifier arm ) *
240+
plural := 'plural' ',' [ 'offset:' integer ] ( selector arm ) *
241+
selector := '=' integer | keyword
242+
keyword := 'zero' | 'one' | 'two' | 'few' | 'many' | 'other'
243+
arm := '{' format_string '}'
244+
~~~
245+
246+
## Formatting Parameters
247+
248+
Each argument being formatted can be transformed by a number of formatting
249+
parameters (corresponding to `format_spec` in the syntax above). These
250+
parameters affect the string representation of what's being formatted. This
251+
syntax draws heavily from Python's, so it may seem a bit familiar.
252+
253+
### Fill/Alignment
254+
255+
The fill character is normally provided in conjunction with the `width`
256+
parameter. This indicates that if the value being formatted is smaller than
257+
`width`, some extra characters will be printed around it. The extra characters
258+
are specified by `fill`, and the alignment can be one of two options:
259+
260+
* `<` - the argument is left-aligned in `width` columns
261+
* `>` - the argument is right-aligned in `width` columns
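
The two alignments can be illustrated with the present-day `format!` macro, used here as an assumed stand-in for `ifmt!`. Modern format specs drop the trailing type letter such as `s`. The helper names `pad_right` and `pad_left` are hypothetical.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption).

/// `<` left-aligns the value, so the fill character `*` goes on the right.
fn pad_right(s: &str) -> String {
    format!("{:*<8}", s)
}

/// `>` right-aligns the value, so the fill character `*` goes on the left.
fn pad_left(s: &str) -> String {
    format!("{:*>8}", s)
}

fn main() {
    println!("{}", pad_right("ab"));
    println!("{}", pad_left("ab"));
}
```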
262+
263+
### Sign/#/0
264+
265+
These can all be interpreted as flags for a particular formatter.
266+
267+
* '+' - This is intended for numeric types and indicates that the sign should
268+
always be printed. Positive signs are never printed by default, and the
269+
negative sign is only printed by default for the `Signed` trait. This
270+
flag indicates that the correct sign (+ or -) should always be printed.
271+
* '-' - Currently not used
272+
* '#' - This flag indicates that the "alternate" form of printing should be
273+
used. By default, this only applies to the integer formatting traits and
274+
performs like:
275+
* `x` - precedes the argument with a "0x"
276+
* `X` - precedes the argument with a "0x"
277+
* `t` - precedes the argument with a "0b"
278+
* `o` - precedes the argument with a "0o"
279+
* '0' - This is used to indicate for integer formats that the padding should
280+
both be done with a `0` character as well as be sign-aware. A format
281+
like `{:08d}` would yield `00000001` for the integer `1`, while the same
282+
format would yield `-0000001` for the integer `-1`. Notice that the
283+
negative version has one fewer zero than the positive version.
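
The flags above can be seen in action with the present-day `format!` macro, used as an assumed stand-in for `ifmt!`. In modern Rust the binary format is spelled `b` rather than `t`, and decimal integers need no `d`. The function `flag_demo` is a hypothetical helper.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption).
fn flag_demo() -> Vec<String> {
    vec![
        format!("{:+}", 1),      // `+` always prints the sign
        format!("{:08}", 1),     // `0` pads with zeros up to the width
        format!("{:08}", -1),    // sign-aware: the `-` counts toward the width
        format!("{:#x}", 255),   // `#` adds the "0x" prefix
        format!("{:#X}", 255),   // the prefix stays "0x" for upper-case hex
        format!("{:#b}", 5),     // `#` adds the "0b" prefix
        format!("{:#o}", 8),     // `#` adds the "0o" prefix
    ]
}

fn main() {
    for line in flag_demo() {
        println!("{}", line);
    }
}
```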
284+
285+
### Width
286+
287+
This is a parameter for the "minimum width" that the format should take up. If
288+
the value's string does not fill up this many characters, then the padding
289+
specified by fill/alignment will be used to take up the required space.
290+
291+
The default fill/alignment for non-numerics is a space and left-aligned. The
292+
default for numeric formatters is also a space but with right-alignment. If the
293+
'0' flag is specified for numerics, then the implicit fill character is '0'.
294+
295+
The value for the width can also be provided as a `uint` in the list of
296+
parameters by using the `2$` syntax indicating that the second argument is a
297+
`uint` specifying the width.
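
A short sketch of the width rules, using the present-day `format!` macro as an assumed stand-in for `ifmt!`. In the modern macro the `N$` index is zero-based and the width argument is a `usize`. Whether `ifmt!` counts the same way is not confirmed here. The function `width_demo` is a hypothetical helper.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption).
fn width_demo() -> Vec<String> {
    vec![
        format!("[{:5}]", "ab"),    // non-numerics default to left alignment
        format!("[{:5}]", 42),      // numerics default to right alignment
        format!("[{:05}]", 42),     // the `0` flag makes the implicit fill '0'
        format!("[{:1$}]", 42, 5),  // width taken from argument index 1
    ]
}

fn main() {
    for line in width_demo() {
        println!("{}", line);
    }
}
```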
298+
299+
### Precision
300+
301+
For non-numeric types, this can be considered a "maximum width". If the
302+
resulting string is longer than this width, then it is truncated down to this
303+
many characters and only those are emitted.
304+
305+
For integral types, this has no meaning currently.
306+
307+
For floating-point types, this indicates how many digits after the decimal point
308+
should be printed.
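
The precision rules can be sketched with the present-day `format!` macro, again as an assumed stand-in for `ifmt!`. The `.*` form takes the precision from the argument that precedes the value. The function `precision_demo` is a hypothetical helper.

```rust
// Sketch only: `format!` stands in for `ifmt!` (an assumption).
fn precision_demo() -> Vec<String> {
    vec![
        format!("{:.3}", "abcdef"),     // strings: truncate to 3 characters
        format!("{:.2}", 1.23456_f64),  // floats: 2 digits after the point
        format!("{:.*}", 2, "abcdef"),  // `.*`: precision read from the args
    ]
}

fn main() {
    for line in precision_demo() {
        println!("{}", line);
    }
}
```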
309+
310+
*/
311+
11 312
use prelude::*;
12 313

13 314
use cast;
