|
8 | 8 | // option. This file may not be copied, modified, or distributed
|
9 | 9 | // except according to those terms.
|
10 | 10 |
|
| 11 | +/**! |
| 12 | +
|
| 13 | +# The Formatting Module |
| 14 | +
|
| 15 | +This module contains the runtime support for the `ifmt!` syntax extension. This |
| 16 | +macro is implemented in the compiler to emit calls to this module in order to |
| 17 | +format arguments at runtime into strings and streams. |
| 18 | +
|
| 19 | +The functions contained in this module should not normally be used in everyday |
| 20 | +use cases of `ifmt!`. The assumptions made by these functions are unsafe for all |
| 21 | +inputs, and the compiler performs a large amount of validation on the arguments |
| 22 | +to `ifmt!` in order to ensure safety at runtime. While it is possible to call |
| 23 | +these functions directly, it is not recommended to do so in the general case. |
| 24 | +
|
| 25 | +## Usage |
| 26 | +
|
| 27 | +The `ifmt!` macro is intended to be familiar to those coming from C's |
| 28 | +printf/sprintf functions or Python's `str.format` function. In its current |
| 29 | +revision, the `ifmt!` macro returns a `~str` type which is the result of the |
| 30 | +formatting. In the future it will also be able to pass in a stream to format |
| 31 | +arguments directly while performing minimal allocations. |
| 32 | +
|
| 33 | +Some examples of the `ifmt!` extension are: |
| 34 | +
|
| 35 | +~~~{.rust} |
| 36 | +ifmt!("Hello") // => ~"Hello" |
| 37 | +ifmt!("Hello, {:s}!", "world") // => ~"Hello, world!" |
| 38 | +ifmt!("The number is {:d}", 1) // => ~"The number is 1" |
| 39 | +ifmt!("{}", ~[3, 4]) // => ~"~[3, 4]" |
| 40 | +ifmt!("{value}", value=4) // => ~"4" |
| 41 | +ifmt!("{} {}", 1, 2) // => ~"1 2" |
| 42 | +~~~ |
| 43 | +
|
| 44 | +From these, you can see that the first argument is a format string. It is |
| 45 | +required by the compiler for this to be a string literal; it cannot be a |
| 46 | +variable passed in (in order to perform validity checking). The compiler will |
| 47 | +then parse the format string and determine if the list of arguments provided is |
| 48 | +suitable to pass to this format string. |
| 49 | +
|
| 50 | +### Positional parameters |
| 51 | +
|
| 52 | +Each formatting argument is allowed to specify which value argument it's |
| 53 | +referencing, and if omitted it is assumed to be "the next argument". For |
| 54 | +example, the format string `{} {} {}` would take three parameters, and they |
| 55 | +would be formatted in the same order as they're given. The format string |
| 56 | +`{2} {1} {0}`, however, would format arguments in reverse order. |
| 57 | +
|
| 58 | +A format string is required to use all of its arguments, otherwise it is a |
| 59 | +compile-time error. You may refer to the same argument more than once in the |
| 60 | +format string, although it must always be referred to with the same type. |
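A minimal sketch of explicit positions, written against modern Rust's `format!` (the present-day descendant of this macro; the `ifmt!` of this revision is not available in current toolchains, so treat the transfer as an assumption). The function names are hypothetical and exist only for illustration.

```rust
// Assumption: modern `format!` is used as a stand-in for `ifmt!`.

/// Formats three arguments in reverse order by naming each position explicitly.
fn reversed(a: &str, b: &str, c: &str) -> String {
    format!("{2} {1} {0}", a, b, c)
}

/// Refers to the same argument twice; the single argument is still "used".
fn repeated(x: i32) -> String {
    format!("{0} and {0} again", x)
}
```

Calling `reversed("a", "b", "c")` yields `c b a`, and `repeated(7)` yields `7 and 7 again`.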
| 61 | +
|
| 62 | +### Named parameters |
| 63 | +
|
| 64 | +Rust itself does not have a Python-like equivalent of named parameters to a |
| 65 | +function, but the `ifmt!` macro is a syntax extension which allows it to |
| 66 | +leverage named parameters. Named parameters are listed at the end of the |
| 67 | +argument list and have the syntax: |
| 68 | +
|
| 69 | +~~~ |
| 70 | +identifier '=' expression |
| 71 | +~~~ |
| 72 | +
|
| 73 | +It is illegal to put positional parameters (those without names) after arguments |
| 74 | +which have names. Like positional parameters, it is illegal to provide named |
| 75 | +parameters that are unused by the format string. |
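The ordering rule can be sketched with modern Rust's `format!`, assumed here to follow the same convention as `ifmt!`: positional arguments first, named arguments last. The helper names are hypothetical.

```rust
// Assumption: modern `format!` is used as a stand-in for `ifmt!`.

/// A single named parameter, bound with `identifier = expression`.
fn named(v: i32) -> String {
    format!("{value}", value = v)
}

/// A positional argument followed by a named one; the reverse order is rejected
/// at compile time.
fn mixed(first: i32, name: &str) -> String {
    format!("{} {who}", first, who = name)
}
```

Here `named(4)` yields `4` and `mixed(1, "x")` yields `1 x`.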
| 76 | +
|
| 77 | +### Argument types |
| 78 | +
|
| 79 | +Each argument's type is dictated by the format string. It is a requirement that |
| 80 | +every argument is only ever referred to by one type. When specifying the format |
| 81 | +of an argument, however, a string like `{}` indicates no type. This is allowed, |
| 82 | +and if all references to one argument do not provide a type, then the format `?` |
| 83 | +is used (the type's rust-representation is printed). For example, this is an |
| 84 | +invalid format string: |
| 85 | +
|
| 86 | +~~~ |
| 87 | +{0:d} {0:s} |
| 88 | +~~~ |
| 89 | +
|
| 90 | +Because the first argument is both referred to as an integer as well as a |
| 91 | +string. |
| 92 | +
|
| 93 | +Because formatting is done via traits, there is no requirement that the |
| 94 | +`d` format actually takes an `int`, but rather it simply requires a type which |
| 95 | +ascribes to the `Signed` formatting trait. There are various parameters which do |
| 96 | +require a particular type, however. Namely if the syntax `{:.*s}` is used, then |
| 97 | +the number of characters to print from the string precedes the actual string and |
| 98 | +must have the type `uint`. Although a `uint` can be printed with `{:u}`, it is |
| 99 | +illegal to reference an argument as such. For example, this is another invalid |
| 100 | +format string: |
| 101 | +
|
| 102 | +~~~ |
| 103 | +{:.*s} {0:u} |
| 104 | +~~~ |
| 105 | +
|
| 106 | +### Formatting traits |
| 107 | +
|
| 108 | +When requesting that an argument be formatted with a particular type, you are |
| 109 | +actually requesting that an argument ascribes to a particular trait. This allows |
| 110 | +multiple actual types to be formatted via `{:d}` (like `i8` as well as `int`). |
| 111 | +The current mapping of types to traits is: |
| 112 | +
|
| 113 | +* `?` => Poly |
| 114 | +* `d` => Signed |
| 115 | +* `i` => Signed |
| 116 | +* `u` => Unsigned |
| 117 | +* `b` => Bool |
| 118 | +* `c` => Char |
| 119 | +* `o` => Octal |
| 120 | +* `x` => LowerHex |
| 121 | +* `X` => UpperHex |
| 122 | +* `s` => String |
| 123 | +* `p` => Pointer |
| 124 | +* `t` => Binary |
| 125 | +
|
| 126 | +What this means is that any type of argument which implements the |
| 127 | +`std::fmt::Binary` trait can then be formatted with `{:t}`. Implementations are |
| 128 | +provided for these traits for a number of primitive types by the standard |
| 129 | +library as well. Again, the default formatting type (if no other is specified) |
| 130 | +is `?` which is defined for all types by default. |
| 131 | +
|
| 132 | +When implementing a format trait for your own type, you will have to implement a |
| 133 | +method of the signature: |
| 134 | +
|
| 135 | +~~~ |
| 136 | +fn fmt(value: &T, f: &mut std::fmt::Formatter); |
| 137 | +~~~ |
| 138 | +
|
| 139 | +Your type will be passed by-reference in `value`, and then the function should |
| 140 | +emit output into the `f.buf` stream. It is up to each format trait |
| 141 | +implementation to correctly adhere to the requested formatting parameters. The |
| 142 | +values of these parameters will be listed in the fields of the `Formatter` |
| 143 | +struct. In order to help with this, the `Formatter` struct also provides some |
| 144 | +helper methods. |
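For comparison, here is a sketch of the same idea in modern Rust, where the trait method takes `&self`, returns `fmt::Result`, and writes through the `Formatter` itself rather than an `f.buf` field. This is not the signature described above; it is the current `std::fmt::Binary` shape, and `Flags` is a hypothetical type invented for the example.

```rust
use std::fmt;

/// Hypothetical wrapper type, used only for this illustration.
struct Flags(u8);

impl fmt::Binary for Flags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to the `u8` implementation so that the requested width,
        // fill, and `#` flag carried by the formatter are honoured.
        fmt::Binary::fmt(&self.0, f)
    }
}

/// Formats the flags as binary with the alternate prefix, zero-padded to 10 columns.
fn show(flags: &Flags) -> String {
    format!("{:#010b}", flags)
}
```

Delegating to an existing implementation is the simplest way to respect the formatting parameters; a hand-written implementation would have to inspect them itself.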
| 145 | +
|
| 146 | +## Internationalization |
| 147 | +
|
| 148 | +The formatting syntax supported by the `ifmt!` extension supports |
| 149 | +internationalization by providing "methods" which execute various different |
| 150 | +outputs depending on the input. The syntax and methods provided are similar to |
| 151 | +other internationalization systems, so again nothing should seem alien. |
| 152 | +Currently two methods are supported by this extension: "select" and "plural". |
| 153 | +
|
| 154 | +Each method will execute one of a number of clauses, and the value of that |
| 155 | +clause becomes the result of formatting the argument. Nested format strings |
| 156 | +may be provided inside the cases, but their arguments may not be selected by |
| 157 | +implicit position ("the next argument"). All arguments inside of each case of |
| 158 | +a method must be explicitly selected by their name or their integer |
| 159 | +position. |
| 160 | +
|
| 161 | +Furthermore, whenever a case is running, the special character `#` can be used |
| 162 | +to reference the string value of the argument which was selected upon. As an |
| 163 | +example: |
| 164 | +
|
| 165 | +~~~ |
| 166 | +ifmt!("{0, select, other{#}}", "hello") // => ~"hello" |
| 167 | +~~~ |
| 168 | +
|
| 169 | +This example is the equivalent of `{0:s}` essentially. |
| 170 | +
|
| 171 | +### Select |
| 172 | +
|
| 173 | +The select method is a switch over a `&str` parameter, and the parameter *must* |
| 174 | +be of the type `&str`. An example of the syntax is: |
| 175 | +
|
| 176 | +~~~ |
| 177 | +{0, select, male{...} female{...} other{...}} |
| 178 | +~~~ |
| 179 | +
|
| 180 | +Breaking this down, the `0`-th argument is selected upon with the `select` |
| 181 | +method, and then a number of cases follow. Each case is preceded by an |
| 182 | +identifier which is the match-clause to execute the given arm. In this case, |
| 183 | +there are two explicit cases, `male` and `female`. The case will be executed if |
| 184 | +the string argument provided is an exact match to the case selected. |
| 185 | +
|
| 186 | +The `other` case is also a required case for all `select` methods. This arm will |
| 187 | +be executed if none of the other arms matched the word being selected over. |
| 188 | +
|
| 189 | +### Plural |
| 190 | +
|
| 191 | +The plural method is a switch statement over a `uint` parameter, and the |
| 192 | +parameter *must* be a `uint`. A plural method in its full glory can be specified |
| 193 | +as: |
| 194 | +
|
| 195 | +~~~ |
| 196 | +{0, plural, offset:1 =1{...} two{...} many{...} other{...}} |
| 197 | +~~~ |
| 198 | +
|
| 199 | +To break this down, the first `0` indicates that this method is selecting over |
| 200 | +the value of the first positional parameter to the format string. Next, the |
| 201 | +`plural` method is being executed. An optionally-supplied `offset` is then given |
| 202 | +which indicates a number to subtract from argument `0` when matching. This is |
| 203 | +then followed by a list of cases. |
| 204 | +
|
| 205 | +Each case is allowed to supply a specific value to match upon with the syntax |
| 206 | +`=N`. This case is executed if the value at argument `0` matches N exactly, |
| 207 | +without taking the offset into account. A case may also be specified by one of |
| 208 | +five keywords: `zero`, `one`, `two`, `few`, and `many`. These cases are matched |
| 209 | +on after argument `0` has the offset taken into account. Currently the |
| 210 | +definitions of `many` and `few` are hardcoded, but they are in theory defined by |
| 211 | +the current locale. |
| 212 | +
|
| 213 | +Finally, all `plural` methods must have an `other` case supplied which will be |
| 214 | +executed if none of the other cases match. |
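The matching order can be sketched as a plain function. This models a plural method with the arms `=1`, `two`, and `other`; it is an illustration of the rules above, not the runtime's code. The helper name is hypothetical, and the assumption that the `two` keyword matches exactly 2 is mine (keyword definitions are in theory locale-dependent).

```rust
/// Hypothetical model: returns the name of the arm that would be executed.
fn plural_arm(value: usize, offset: usize) -> &'static str {
    // `=N` arms compare against the raw value, ignoring the offset.
    if value == 1 {
        return "=1";
    }
    // Keyword arms compare against the value after the offset is subtracted.
    // Assumption: `two` matches exactly 2.
    match value.saturating_sub(offset) {
        2 => "two",
        // The mandatory `other` arm runs when nothing else matched.
        _ => "other",
    }
}
```

With an offset of 1, the value 3 selects `two` (3 - 1 = 2), while the value 2 selects `other` because this model has no `one` arm.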
| 215 | +
|
| 216 | +## Syntax |
| 217 | +
|
| 218 | +The syntax for the formatting language used is drawn from other languages, so it |
| 219 | +should not be too alien. Arguments are formatted with Python-like syntax, |
| 220 | +meaning that arguments are surrounded by `{}` instead of the C-like `%`. The |
| 221 | +actual grammar for the formatting syntax is: |
| 222 | +
|
| 223 | +~~~ |
| 224 | +format_string := <text> [ format <text> ] * |
| 225 | +format := '{' [ argument ] [ ':' format_spec ] [ ',' function_spec ] '}' |
| 226 | +argument := integer | identifier |
| 227 | +
|
| 228 | +format_spec := [[fill]align][sign]['#'][0][width]['.' precision][type] |
| 229 | +fill := character |
| 230 | +align := '<' | '>' |
| 231 | +sign := '+' | '-' |
| 232 | +width := count |
| 233 | +precision := count | '*' |
| 234 | +type := identifier | '' |
| 235 | +count := parameter | integer |
| 236 | +parameter := integer '$' |
| 237 | +
|
| 238 | +function_spec := plural | select |
| 239 | +select := 'select' ',' ( identifier arm ) * |
| 240 | +plural := 'plural' ',' [ 'offset:' integer ] ( selector arm ) * |
| 241 | +selector := '=' integer | keyword |
| 242 | +keyword := 'zero' | 'one' | 'two' | 'few' | 'many' | 'other' |
| 243 | +arm := '{' format_string '}' |
| 244 | +~~~ |
| 245 | +
|
| 246 | +## Formatting Parameters |
| 247 | +
|
| 248 | +Each argument being formatted can be transformed by a number of formatting |
| 249 | +parameters (corresponding to `format_spec` in the syntax above). These |
| 250 | +parameters affect the string representation of what's being formatted. This |
| 251 | +syntax draws heavily from Python's, so it may seem a bit familiar. |
| 252 | +
|
| 253 | +### Fill/Alignment |
| 254 | +
|
| 255 | +The fill character is provided normally in conjunction with the `width` |
| 256 | +parameter. This indicates that if the value being formatted is smaller than |
| 257 | +`width` some extra characters will be printed around it. The extra characters |
| 258 | +are specified by `fill`, and the alignment can be one of two options: |
| 259 | +
|
| 260 | +* `<` - the argument is left-aligned in `width` columns |
| 261 | +* `>` - the argument is right-aligned in `width` columns |
| 262 | +
|
| 263 | +### Sign/#/0 |
| 264 | +
|
| 265 | +These can all be interpreted as flags for a particular formatter. |
| 266 | +
|
| 267 | +* '+' - This is intended for numeric types and indicates that the sign should |
| 268 | + always be printed. Positive signs are never printed by default, and the |
| 269 | + negative sign is only printed by default for the `Signed` trait. This |
| 270 | + flag indicates that the correct sign (+ or -) should always be printed. |
| 271 | +* '-' - Currently not used |
| 272 | +* '#' - This flag indicates that the "alternate" form of printing should be |
| 273 | + used. By default, this only applies to the integer formatting traits and |
| 274 | + performs like: |
| 275 | + * `x` - precedes the argument with a "0x" |
| 276 | + * `X` - precedes the argument with a "0x" |
| 277 | + * `t` - precedes the argument with a "0b" |
| 278 | + * `o` - precedes the argument with a "0o" |
| 279 | +* '0' - This is used to indicate for integer formats that the padding should |
| 280 | + both be done with a `0` character as well as be sign-aware. A format |
| 281 | + like `{:08d}` would yield `00000001` for the integer `1`, while the same |
| 282 | + format would yield `-0000001` for the integer `-1`. Notice that the |
| 283 | + negative version has one fewer zero than the positive version. |
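The flags can be exercised with modern Rust's `format!`, used here as an assumed stand-in for `ifmt!`. Two differences from the text above are worth noting: modern Rust spells the binary type `b` rather than `t`, and integers need no `d` type. The function name is hypothetical.

```rust
// Assumption: modern `format!` is used as a stand-in for `ifmt!`.

/// Collects one formatted string per flag described above.
fn flag_examples() -> Vec<String> {
    vec![
        format!("{:+}", 1),    // sign always printed: "+1"
        format!("{:#x}", 255), // alternate lower hex: "0xff"
        format!("{:#X}", 255), // alternate upper hex: "0xFF"
        format!("{:#b}", 5),   // alternate binary: "0b101"
        format!("{:#o}", 8),   // alternate octal: "0o10"
        format!("{:08}", 1),   // sign-aware zero padding: "00000001"
        format!("{:08}", -1),  // the sign takes one column: "-0000001"
    ]
}
```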
| 284 | +
|
| 285 | +### Width |
| 286 | +
|
| 287 | +This is a parameter for the "minimum width" that the format should take up. If |
| 288 | +the value's string does not fill up this many characters, then the padding |
| 289 | +specified by fill/alignment will be used to take up the required space. |
| 290 | +
|
| 291 | +The default fill/alignment for non-numerics is a space and left-aligned. The |
| 292 | +default for numeric formatters is also a space but with right-alignment. If the |
| 293 | +'0' flag is specified for numerics, then the implicit fill character is '0'. |
| 294 | +
|
| 295 | +The value for the width can also be provided as a `uint` in the list of |
| 296 | +parameters by using the `1$` syntax, indicating that the second argument |
| 297 | +(index `1`, counting from zero) is a `uint` specifying the width. |
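A sketch of the width rules in modern Rust's `format!`, assumed to carry over from `ifmt!`. In modern Rust the `N$` index is zero-based, so `1$` names the second argument. The helper names are hypothetical.

```rust
// Assumption: modern `format!` is used as a stand-in for `ifmt!`.

/// Pads `s` to a width taken from the second argument (`1$`, zero-based).
fn padded_text(s: &str, width: usize) -> String {
    format!("{:1$}", s, width)
}

/// Pads a number to 5 columns; numerics are right-aligned by default.
fn padded_number(n: i32) -> String {
    format!("{:5}", n)
}

/// With the `0` flag the implicit fill character becomes `0`.
fn zero_padded(n: i32) -> String {
    format!("{:05}", n)
}
```

`padded_text("ab", 5)` yields `ab` followed by three spaces, `padded_number(42)` yields three spaces followed by `42`, and `zero_padded(42)` yields `00042`.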
| 298 | +
|
| 299 | +### Precision |
| 300 | +
|
| 301 | +For non-numeric types, this can be considered a "maximum width". If the |
| 302 | +resulting string is longer than this width, then it is truncated down to this |
| 303 | +many characters and only those are emitted. |
| 304 | +
|
| 305 | +For integral types, this has no meaning currently. |
| 306 | +
|
| 307 | +For floating-point types, this indicates how many digits after the decimal point |
| 308 | +should be printed. |
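The two meanings of precision can be sketched with modern Rust's `format!`, again as an assumed stand-in for `ifmt!` (modern Rust omits the `s` type, so `{:.*s}` becomes `{:.*}`). The helper names are hypothetical.

```rust
// Assumption: modern `format!` is used as a stand-in for `ifmt!`.

/// For strings, precision is a maximum width: keep at most 3 characters.
fn truncated(s: &str) -> String {
    format!("{:.3}", s)
}

/// For floats, precision is the number of digits after the decimal point.
fn rounded(x: f64) -> String {
    format!("{:.2}", x)
}

/// `.*` takes the precision from the argument that precedes the value.
fn truncated_to(n: usize, s: &str) -> String {
    format!("{:.*}", n, s)
}
```

`truncated("abcdef")` yields `abc`, `rounded(3.14159)` yields `3.14`, and `truncated_to(2, "abcdef")` yields `ab`.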
| 309 | +
|
| 310 | +*/ |
| 311 | + |
11 | 312 | use prelude::*;
|
12 | 313 |
|
13 | 314 | use cast;
|