Commit 70fbdb9

Merge pull request #4206 from paulstansifer/macro_tutorial_improvements
Macro tutorial improvements
2 parents ceca0e8 + 7c103f2 commit 70fbdb9

1 file changed

+200
-9
lines changed

doc/tutorial-macros.md

Lines changed: 200 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -25,9 +25,9 @@ match input_2 {
2525
# }
2626
~~~~
2727

28-
This code could become tiresome if repeated many times. However, there is no
29-
straightforward way to rewrite it without the repeated code, using functions
30-
alone. There is a solution, though: defining a macro to solve the problem. Macros are
28+
This code could become tiresome if repeated many times. However, no function
29+
can capture this pattern, so the repetition cannot be factored out with
30+
functions alone. Rust's macro system can eliminate it, though. Macros are
3131
lightweight custom syntax extensions, themselves defined using the
3232
`macro_rules!` syntax extension. The following `early_return` macro captures
3333
the pattern in the above code:
@@ -65,7 +65,7 @@ macro. It appears on the left-hand side of the `=>` in a macro definition. It
6565
conforms to the following rules:
6666

6767
1. It must be surrounded by parentheses.
68-
2. `$` has special meaning.
68+
2. `$` has special meaning (described below).
6969
3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
7070
forbidden.
7171

@@ -118,10 +118,11 @@ expression, `() => (let $x=$val)` is a macro that expands to a statement, and
118118
`() => (1,2,3)` is a macro that expands to a syntax error).
119119

120120
Except for permissibility of `$name` (and `$(...)*`, discussed below), the
121-
right-hand side of a macro definition follows the same rules as ordinary
122-
Rust syntax. In particular, macro invocations (including invocations of the
123-
macro currently being defined) are permitted in expression, statement, and
124-
item locations.
121+
right-hand side of a macro definition is ordinary Rust syntax. In particular,
122+
macro invocations (including invocations of the macro currently being defined)
123+
are permitted in expression, statement, and item locations. However, nothing
124+
else about the code is examined or executed by the macro system; execution
125+
still has to wait until runtime.
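
As a minimal sketch of this point, the hypothetical `sum!` macro below invokes itself on its right-hand side. It is written in present-day Rust syntax, which is an assumption: the dialect used elsewhere in this tutorial is older and differs in several details. The expander only rewrites syntax; it never performs the addition itself.

```rust
// Hypothetical macro for illustration, in present-day Rust syntax
// (an assumption; the tutorial's own dialect is older).
macro_rules! sum {
    // Base case: a single expression expands to itself.
    ($x:expr) => { $x };
    // Recursive case: the right-hand side invokes `sum!` again,
    // in expression position.
    ($x:expr, $($rest:expr),+) => { $x + sum!($($rest),+) };
}

fn sum_of_three() -> u32 {
    // Expands to nested additions of 1, 2 and 3. The macro expander
    // does not compute the result; the additions are ordinary code.
    sum!(1, 2, 3)
}

fn main() {
    println!("{}", sum_of_three());
}
```

The recursive invocation is legal because `sum!(...)` sits where an expression is expected.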
125126

126127
## Interpolation location
127128

@@ -199,7 +200,196 @@ parsing `e`. Changing the invocation syntax to require a distinctive token in
199200
front can solve the problem. In the above example, `$(T $t:ty)* E $e:expr`
200201
solves the problem.
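
The distinctive-token idea can be sketched in present-day Rust syntax (an assumption; this tutorial uses an older dialect). The hypothetical `tagged!` macro below uses `ident` fragments in place of `ty`, because current Rust, as far as I am aware, restricts which tokens may follow a `ty` fragment. The markers `T` and `E` tell the parser at each step whether another repeated item or the final expression comes next.

```rust
// Hypothetical macro for illustration. `T` and `E` are literal marker
// tokens; `ident` stands in for the tutorial's `ty` (an assumption made
// so that the sketch compiles under current rules).
macro_rules! tagged {
    ($(T $t:ident)* E $e:expr) => {
        // Collect the names of the tagged identifiers and keep the
        // final expression alongside them.
        (vec![$(stringify!($t)),*], $e)
    };
}

fn demo() -> (Vec<&'static str>, i32) {
    // Each `T` announces one more identifier; `E` announces the expression.
    tagged!(T a T b E 1 + 2)
}

fn main() {
    let (names, value) = demo();
    println!("{:?} {}", names, value);
}
```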
201202

202-
## A final note
203+
# Macro argument pattern matching
204+
205+
## Motivation
206+
207+
Now consider code like the following:
208+
209+
~~~~
210+
# enum t1 { good_1(t2, uint), bad_1 };
211+
# pub struct t2 { body: t3 }
212+
# enum t3 { good_2(uint), bad_2};
213+
# fn f(x: t1) -> uint {
214+
match x {
215+
good_1(g1, val) => {
216+
match g1.body {
217+
good_2(result) => {
218+
// complicated stuff goes here
219+
return result + val;
220+
},
221+
_ => fail ~"Didn't get good_2"
222+
}
223+
}
224+
_ => return 0 // default value
225+
}
226+
# }
227+
~~~~
228+
229+
All the complicated stuff is deeply indented, and the error-handling code is
230+
separated from matches that fail. We'd like to write a macro that performs
231+
a match, but with a syntax that suits the problem better. The following macro
232+
can solve the problem:
233+
234+
~~~~
235+
macro_rules! biased_match (
236+
// special case: `let (x) = ...` is illegal, so use `let x = ...` instead
237+
( ($e:expr) ~ ($p:pat) else $err:stmt ;
238+
binds $bind_res:ident
239+
) => (
240+
let $bind_res = match $e {
241+
$p => ( $bind_res ),
242+
_ => { $err }
243+
};
244+
);
245+
// more than one name; use a tuple
246+
( ($e:expr) ~ ($p:pat) else $err:stmt ;
247+
binds $( $bind_res:ident ),*
248+
) => (
249+
let ( $( $bind_res ),* ) = match $e {
250+
$p => ( $( $bind_res ),* ),
251+
_ => { $err }
252+
};
253+
)
254+
)
255+
256+
# enum t1 { good_1(t2, uint), bad_1 };
257+
# pub struct t2 { body: t3 }
258+
# enum t3 { good_2(uint), bad_2};
259+
# fn f(x: t1) -> uint {
260+
biased_match!((x) ~ (good_1(g1, val)) else { return 0 };
261+
binds g1, val )
262+
biased_match!((g1.body) ~ (good_2(result) )
263+
else { fail ~"Didn't get good_2" };
264+
binds result )
265+
// complicated stuff goes here
266+
return result + val;
267+
# }
268+
~~~~
269+
270+
This solves the indentation problem. But if we have a lot of chained matches
271+
like this, we might prefer to write a single macro invocation. The input
272+
pattern we want is clear:
273+
~~~~
274+
# macro_rules! b(
275+
( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
276+
binds $( $bind_res:ident ),*
277+
)
278+
# => (0))
279+
~~~~
280+
281+
However, it's not possible to directly expand to nested match statements. But
282+
there is a solution.
283+
284+
## The recursive approach to macro writing
285+
286+
A macro may accept multiple different input grammars. The first one to
287+
successfully match the actual argument to a macro invocation is the one that
288+
"wins".
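
A small sketch of the "first match wins" rule, in present-day Rust syntax (an assumption; the tutorial's dialect is older). The hypothetical `describe!` macro has four rules, and their order matters: a lone identifier is also a valid expression, so the `ident` rule has to come before the `expr` rule to be chosen.

```rust
// Hypothetical macro for illustration: each rule is a separate input
// grammar, tried from top to bottom.
macro_rules! describe {
    () => { "nothing" };
    ($x:ident) => { "one identifier" };
    ($x:expr) => { "one expression" };
    ($($x:expr),+) => { "several expressions" };
}

fn main() {
    println!("{}", describe!());
    println!("{}", describe!(foo));
    println!("{}", describe!(1 + 1));
    println!("{}", describe!(1, 2, 3));
}
```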
289+
290+
291+
In the case of the example above, we want to write a recursive macro to
292+
process the semicolon-terminated lines, one-by-one. So, we want the following
293+
input patterns:
294+
295+
~~~~
296+
# macro_rules! b(
297+
( binds $( $bind_res:ident ),* )
298+
# => (0))
299+
~~~~
300+
...and:
301+
302+
~~~~
303+
# macro_rules! b(
304+
( ($e :expr) ~ ($p :pat) else $err :stmt ;
305+
$( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
306+
binds $( $bind_res:ident ),*
307+
)
308+
# => (0))
309+
~~~~
310+
311+
The resulting macro looks like this. Note that the separation into
312+
`biased_match!` and `biased_match_rec!` occurs only because we have an outer
313+
piece of syntax (the `let`) which we only want to transcribe once.
314+
315+
~~~~
316+
317+
macro_rules! biased_match_rec (
318+
// Handle the first layer
319+
( ($e :expr) ~ ($p :pat) else $err :stmt ;
320+
$( ($e_rest:expr) ~ ($p_rest:pat) else $err_rest:stmt ; )*
321+
binds $( $bind_res:ident ),*
322+
) => (
323+
match $e {
324+
$p => {
325+
// Recursively handle the next layer
326+
biased_match_rec!($( ($e_rest) ~ ($p_rest) else $err_rest ; )*
327+
binds $( $bind_res ),*
328+
)
329+
}
330+
_ => { $err }
331+
}
332+
);
333+
( binds $( $bind_res:ident ),* ) => ( ($( $bind_res ),*) )
334+
)
335+
336+
// Wrap the whole thing in a `let`.
337+
macro_rules! biased_match (
338+
// special case: `let (x) = ...` is illegal, so use `let x = ...` instead
339+
( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
340+
binds $bind_res:ident
341+
) => (
342+
let $bind_res = biased_match_rec!(
343+
$( ($e) ~ ($p) else $err ; )*
344+
binds $bind_res
345+
);
346+
);
347+
// more than one name: use a tuple
348+
( $( ($e:expr) ~ ($p:pat) else $err:stmt ; )*
349+
binds $( $bind_res:ident ),*
350+
) => (
351+
let ( $( $bind_res ),* ) = biased_match_rec!(
352+
$( ($e) ~ ($p) else $err ; )*
353+
binds $( $bind_res ),*
354+
);
355+
)
356+
)
357+
358+
359+
# enum t1 { good_1(t2, uint), bad_1 };
360+
# pub struct t2 { body: t3 }
361+
# enum t3 { good_2(uint), bad_2};
362+
# fn f(x: t1) -> uint {
363+
biased_match!(
364+
(x) ~ (good_1(g1, val)) else { return 0 };
365+
(g1.body) ~ (good_2(result) ) else { fail ~"Didn't get good_2" };
366+
binds val, result )
367+
// complicated stuff goes here
368+
return result + val;
369+
# }
370+
~~~~
371+
372+
This technique is applicable in many cases where transcribing a result "all
373+
at once" is not possible. It resembles ordinary functional programming in some
374+
respects, but it is important to recognize the differences.
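
The same peel-one-layer-and-recurse shape can be sketched in present-day Rust syntax (an assumption; this is not the tutorial's dialect, and the hypothetical `all_some!` macro is not the tutorial's `biased_match!`). Each recursive step emits one `match` and hands the remaining layers to the next invocation, which produces the nesting that a single transcription could not.

```rust
// Hypothetical macro for illustration, written for present-day Rust.
macro_rules! all_some {
    // No layers left: produce the final value.
    (=> $body:expr) => { Some($body) };
    // Peel off one `name = option;` layer and recurse on the rest.
    ($name:ident = $opt:expr; $($rest:tt)*) => {
        match $opt {
            Some($name) => all_some!($($rest)*),
            None => None,
        }
    };
}

fn add(a: Option<i32>, b: Option<i32>) -> Option<i32> {
    // Expands to two nested `match` expressions, one per layer.
    all_some!(x = a; y = b; => x + y)
}

fn main() {
    println!("{:?}", add(Some(1), Some(2)));
    println!("{:?}", add(None, Some(2)));
}
```

The rule with the `=>` marker comes first so that the base case is recognised before the recursive rule is tried.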
375+
376+
The first difference is important, but also easy to forget: the transcription
377+
(right-hand) side of a `macro_rules!` rule is literal syntax, which can only
378+
be executed at run-time. If a piece of transcription syntax does not itself
379+
appear inside another macro invocation, it will become part of the final
380+
program. If it is inside a macro invocation (for example, the recursive
381+
invocation of `biased_match_rec!`), it does have the opportunity to affect
382+
transcription, but only through the process of attempted pattern matching.
383+
384+
The second difference is related: the evaluation order of macros feels
385+
"backwards" compared to ordinary programming. Given an invocation
386+
`m1!(m2!())`, the expander first expands `m1!`, giving it as input the literal
387+
syntax `m2!()`. If it transcribes its argument unchanged into an appropriate
388+
position (in particular, not as an argument to yet another macro invocation),
389+
the expander will then proceed to expand `m2!()` (along with any other macro
390+
invocations `m1!(m2!())` produced).
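
This ordering can be observed with a short sketch in present-day Rust syntax (an assumption; the macro names here are hypothetical). `as_text!` hands its argument to another macro, `stringify!`, so the inner `m2!()` is never expanded. `pass_through!` transcribes its argument into an expression position, so `m2!()` is expanded afterwards.

```rust
// Hypothetical macros for illustration, written for present-day Rust.
macro_rules! m2 {
    () => { 40 + 2 };
}

// Receives `m2!()` as unexpanded syntax and passes it on to `stringify!`,
// which turns the tokens into a string without expanding them.
macro_rules! as_text {
    ($($t:tt)*) => { stringify!($($t)*) };
}

// Transcribes its argument unchanged into expression position, where
// the expander then processes the inner invocation.
macro_rules! pass_through {
    ($e:expr) => { $e };
}

fn main() {
    // The exact spacing of `stringify!` output is not guaranteed.
    println!("{}", as_text!(m2!()));
    println!("{}", pass_through!(m2!()));
}
```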
391+
392+
# A final note
203393

204394
Macros, as currently implemented, are not for the faint of heart. Even
205395
ordinary syntax errors can be more difficult to debug when they occur inside a
@@ -208,3 +398,4 @@ tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
208398
states, invoking `trace_macros!(true)` will automatically print those
209399
intermediate states out, and passing the flag `--pretty expanded` as a
210400
command-line argument to the compiler will show the result of expansion.
401+
