language-rust lexer rejects Unicode symbols that rustc accepts #3

Closed
@RyanGlScott

Description

Per the Rust Reference, Rust permits any identifier that meets the specification in Unicode Standard Annex #31 for Unicode version 15.0. For example, rustc accepts the following program:

// test.rs
fn main() {
    let 𝑂_𝑂 = ();
    𝑂_𝑂
}

language-rust, on the other hand, fails to lex this program:

-- Main.hs
{-# LANGUAGE TypeApplications #-}
module Main (main) where

import Language.Rust.Data.InputStream
import Language.Rust.Parser
import Language.Rust.Syntax

main :: IO ()
main = do
  is <- readInputStream "test.rs"
  print $ parse @(SourceFile Span) is

$ runghc Main.hs
Left (parse failure at 3:9 (lexical error))

My guess is that this part of the lexer needs to be updated to support Unicode 15.0.
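
For reference, here is a rough sketch of the identifier rule the lexer would need to accept. It is only an approximation: the function names are hypothetical, and `char::is_alphabetic` / `char::is_alphanumeric` stand in for the `XID_Start` / `XID_Continue` properties from UAX #31, which the Rust standard library does not expose directly. These stand-ins accept somewhat more than the real properties do, and the sketch ignores keywords and raw identifiers, so a real fix would need proper Unicode property tables.

```rust
// Approximate check for a Rust identifier, loosely following UAX #31.
// ASSUMPTION: `is_alphabetic` / `is_alphanumeric` approximate XID_Start /
// XID_Continue; they are not the exact properties rustc uses.

/// Hypothetical helper: may `c` begin an identifier?
fn is_ident_start(c: char) -> bool {
    c == '_' || c.is_alphabetic()
}

/// Hypothetical helper: may `c` appear after the first character?
fn is_ident_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}

/// Hypothetical helper: does `s` look like a (non-keyword) identifier?
fn looks_like_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // A lone `_` is a wildcard pattern in Rust, not an identifier.
        Some(first) if is_ident_start(first) => s != "_" && chars.all(is_ident_continue),
        _ => false,
    }
}

fn main() {
    // U+1D442 (MATHEMATICAL ITALIC CAPITAL O) lies outside the Basic
    // Multilingual Plane, which makes it a useful test case for a lexer.
    for candidate in ["𝑂_𝑂", "foo", "1abc", "_"] {
        println!("{candidate:?}: {}", looks_like_identifier(candidate));
    }
}
```

Under this approximation, `𝑂_𝑂` from test.rs is accepted because `𝑂` is an alphabetic code point, which matches what rustc does with the program above.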

Labels: bug