Proc macros operate on tokens, including string/character/byte-string/byte literal tokens, which they can get from various sources.
- Source 1: Lexer.
  This is the most reliable source: the token is passed to a macro precisely as it was written in source code. `"C"` will be passed as `"C"`, but the same C in escaped form `"\x43"` will be passed as `"\x43"`. Proc macros can observe the difference because `ToString` (the only way to get the literal contents in the proc macro API) also prints the literal precisely.
- Source 2: Proc macro API.
  `Literal::string(s: &str)` will make you a string literal containing data `s`, approximately. The precise token (returned by `ToString`) will contain:
  - `escape_debug(s)` for string literals (`Literal::string`)
  - `escape_unicode(s)` for character literals (`Literal::character`)
  - `escape_default(s)` for byte string literals (`Literal::byte_string`)
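The three escaping functions behave quite differently. A standalone sketch using the identically-named `std` escape methods (an illustration of the escaping behavior, not actual proc macro code):

```rust
fn main() {
    // escape_debug: escapes quotes, backslashes and control characters,
    // but keeps printable non-ASCII characters as-is.
    assert_eq!("héllo\n".escape_debug().to_string(), "héllo\\n");

    // escape_unicode: escapes *every* character as \u{...},
    // even plain ASCII like 'C'.
    assert_eq!('C'.escape_unicode().to_string(), "\\u{43}");

    // escape_default (byte version): printable ASCII stays as-is,
    // everything else becomes \xNN.
    let escaped: String = b"h\xff"
        .iter()
        .map(|&b| std::ascii::escape_default(b).to_string())
        .collect();
    assert_eq!(escaped, "h\\xff");
}
```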
- Source 3: Recovered from non-attribute AST.
  The AST goes through pretty-printing first and is then re-tokenized. The precise token (returned by `ToString`) will contain:
  - precise `s` for raw AST strings
  - `escape_debug(s)` for non-raw AST strings
  - `escape_default(s)` for AST characters, bytes and byte strings (both raw and non-raw)
- Source 4: Recovered from attribute AST.
  Just an ad-hoc recovery without pretty-printing. The precise token (returned by `ToString`) will contain:
  - precise `s` for raw AST strings
  - `escape_default(s)` for non-raw AST strings, and for AST characters, bytes and byte strings (both raw and non-raw)
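For non-raw strings, the only difference between Sources 3 and 4 is `escape_debug` vs `escape_default`. A quick comparison using the corresponding `std` string methods (again an illustration, not proc macro code):

```rust
fn main() {
    let s = "héllo";
    // escape_debug (Source 3) keeps printable non-ASCII characters intact...
    assert_eq!(s.escape_debug().to_string(), "héllo");
    // ...while escape_default (Source 4) escapes them as \u{...}.
    assert_eq!(s.escape_default().to_string(), "h\\u{e9}llo");
}
```

So the same string literal can round-trip differently through a proc macro depending only on whether it arrived via an attribute or not.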
EDIT: Doc comments also go through `escape_debug` when converted to `#[doc = "content"]` tokens for proc macros.
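For example, a comment body containing quotes gets them escaped; a sketch using `str::escape_debug` to mimic what happens to the body:

```rust
fn main() {
    // A doc comment like `/// say "hi"` becomes #[doc = "say \"hi\""]:
    // the body is passed through escape_debug, which escapes the quotes.
    let body = r#"say "hi""#;
    assert_eq!(body.escape_debug().to_string(), r#"say \"hi\""#);
}
```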
It would be nice to:
- Figure out what escaping we actually want (perhaps none?) and document the motivation behind the escaping choices.
- Get rid of the escaping differences between token sources, so that at least literals of the same kind are escaped identically.