
Allow underscores in unicode escapes #43692

Closed
@behnam

Description


Rust supports underscores as digit separators in all numeric literals, and a recent Clippy update even warns when long numeric literals are written without them.
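For reference, here is a small sketch (not taken from the issue) that shows underscores acting as digit separators in several kinds of numeric literals. The Clippy warning mentioned above is presumably its `unreadable_literal` lint.

```rust
fn main() {
    // Underscores may appear between digits in decimal, hex, binary and float literals.
    let decimal: u32 = 1_114_111;
    let hex: u32 = 0x10_FFFF;
    let binary: u8 = 0b1010_1010;
    let float: f64 = 1_000.000_1;

    // Both literals denote the largest Unicode code point, U+10FFFF.
    assert_eq!(decimal, hex);
    println!("{} {} {} {}", decimal, hex, binary, float);
}
```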

However, underscores are not supported in Unicode escapes. Therefore, it is not possible to write the first code point of plane 16 as \u{10_0000}, while \u{100000} is hard to read and can easily be misread as \u{10000}.
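Given the behavior described above, one workaround is to build the character from a numeric literal, where underscores are already allowed. The sketch below is an illustration, not something proposed in the issue; it also checks that the two easily confused escapes really are different code points.

```rust
fn main() {
    let escaped = '\u{100000}'; // first code point of plane 16
    let easily_confused = '\u{10000}'; // first code point of plane 1

    // Numeric literals accept underscores, so the plane boundary stays visible here.
    let via_numeric = char::from_u32(0x10_0000).expect("valid Unicode scalar value");

    assert_eq!(escaped, via_numeric);
    assert_ne!(escaped, easily_confused);
    println!("{:X} {:X}", escaped as u32, easily_confused as u32);
}
```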

Currently, an underscore in a Unicode escape produces an error like this:

error: invalid character in unicode escape: _
  --> unic/tests/basics_test.rs:31:23
   |
31 |         Age::of('\u{10_FFFF}'),
   |                       ^

Since such code is rejected today, there is no backward-compatibility issue: supporting underscores would be a purely additive, compatible enhancement.
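To illustrate how small the change could be, here is a hypothetical sketch of a lenient parser for the text between the braces of a `\u{...}` escape. It skips underscores the way numeric literals do and rejects a leading underscore. The function name and error strings are assumptions modeled on the error above, not rustc's actual lexer code; the limit of six hex digits follows the existing escape syntax.

```rust
/// Hypothetical: parse the body of a `\u{...}` escape, allowing `_` separators.
fn parse_unicode_escape_body(body: &str) -> Result<char, String> {
    if body.starts_with('_') {
        return Err("unicode escape must start with a hex digit".to_string());
    }
    let mut value: u32 = 0;
    let mut digits = 0;
    for ch in body.chars() {
        // Separators carry no value, exactly as in numeric literals.
        if ch == '_' {
            continue;
        }
        let d = ch
            .to_digit(16)
            .ok_or_else(|| format!("invalid character in unicode escape: {}", ch))?;
        digits += 1;
        if digits > 6 {
            return Err("unicode escape must have at most 6 hex digits".to_string());
        }
        value = value * 16 + d;
    }
    if digits == 0 {
        return Err("empty unicode escape".to_string());
    }
    // Rejects surrogates and values above U+10FFFF.
    char::from_u32(value).ok_or_else(|| format!("invalid unicode character escape: {:X}", value))
}

fn main() {
    for body in &["10_FFFF", "10_0000", "_41", "D800"] {
        println!("{:>8}: {:?}", body, parse_unicode_escape_body(body));
    }
}
```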

Unicode already clearly defines planes, numbered 0 to 16, each covering 0x10000 code points. This suggests writing literals as \u{<plane>_<4-hex-digits>} sequences.
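The plane of a code point is its value shifted right by 16 bits, and the low 16 bits are exactly four hex digits. A small sketch with hypothetical helper names that renders a `char` in the suggested \u{<plane>_<4-hex-digits>} form, writing the plane in hex so that plane 16 appears as `10`:

```rust
/// Hypothetical helper: split a code point into its plane (0..=16) and its offset within the plane.
fn plane_and_offset(c: char) -> (u32, u32) {
    let cp = c as u32;
    (cp >> 16, cp & 0xFFFF)
}

/// Hypothetical formatter for the proposed `\u{<plane>_<4-hex-digits>}` style.
fn plane_escape(c: char) -> String {
    let (plane, offset) = plane_and_offset(c);
    format!("\\u{{{:X}_{:04X}}}", plane, offset)
}

fn main() {
    for &c in &['A', '\u{1F600}', '\u{10FFFF}'] {
        println!("{}", plane_escape(c));
    }
}
```

For example, U+1F600 comes out as \u{1_F600} and the last code point as \u{10_FFFF}.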

Optionally, we could allow underscores only in specific positions, such as the format above. But I think that would make the rule too complicated for no clear benefit. Such a check could, of course, be a Clippy lint.
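A position-restricted rule could live outside the compiler as a lint. Below is a hypothetical check (not an existing Clippy lint) for the stricter convention. It assumes that escapes of at most four digits need no separator, and that anything longer must be written as one or two plane digits, a single underscore, and exactly four offset digits.

```rust
/// True if `s` is non-empty and consists only of ASCII hex digits.
fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_hexdigit())
}

/// Hypothetical lint check on the body of a `\u{...}` escape.
fn follows_plane_grouping(body: &str) -> bool {
    match body.split_once('_') {
        // No separator: acceptable only for short, plane-0 escapes.
        None => is_hex(body) && body.len() <= 4,
        // One separator: 1-2 plane digits, then exactly four offset digits.
        // A second underscore lands in `offset` and fails `is_hex`.
        Some((plane, offset)) => {
            is_hex(plane) && plane.len() <= 2 && is_hex(offset) && offset.len() == 4
        }
    }
}

fn main() {
    for body in &["41", "10_FFFF", "100000", "1_F6_00"] {
        println!("{:>8}: {}", body, follows_plane_grouping(body));
    }
}
```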

What do you think?

Metadata



    Labels

    A-Unicode (Area: Unicode)
    C-feature-request (Category: A feature request, i.e: not implemented / a PR.)
    T-lang (Relevant to the language team, which will review and decide on the PR/issue.)
