support Unicode grapheme clusters #54

Closed
@tbu-

Description

The regex engine doesn't correctly handle characters (grapheme clusters) that consist of multiple code points.

For example, the letter 'ä' has two representations that should both be matched by the regex `.`; however, only the precomposed form (the single code point U+00e4) is.

Bash                 | Rust       | Codepoints
echo $'\xc3\xa4'     | "\u{e4}"   | U+00e4
echo $'\x61\xcc\x88' | "a\u{308}" | U+0061 U+0308
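
To make the difference concrete, here is a minimal sketch using only the Rust standard library. It does not call the regex engine. The helpers `code_points` and `is_single_code_point` are hypothetical names introduced for illustration, and `is_single_code_point` is only an assumed stand-in for what a code-point-based `.` would accept as a whole-string match.

```rust
/// Formats each code point (Rust `char`) of `s` as `U+XXXX`.
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

/// Hypothetical stand-in for a code-point-based `.`: true only when `s`
/// is exactly one code point other than a newline.
fn is_single_code_point(s: &str) -> bool {
    let mut chars = s.chars();
    matches!((chars.next(), chars.next()), (Some(c), None) if c != '\n')
}

fn main() {
    let precomposed = "\u{e4}"; // UTF-8 bytes: c3 a4
    let decomposed = "a\u{308}"; // UTF-8 bytes: 61 cc 88

    // Both strings render as 'ä', but they differ in bytes and code points.
    for s in [precomposed, decomposed] {
        println!(
            "{:?}: bytes {:02x?}, code points {:?}, single code point: {}",
            s,
            s.as_bytes(),
            code_points(s),
            is_single_code_point(s)
        );
    }
}
```

Both strings are a single grapheme cluster, but only the precomposed one is a single code point, which is why a `.` that advances by one code point covers only that form.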
