Lint against non-NFC items? #120723

Open
@workingjubilee

Code

struct Mask();
const FAÇADE: Mask = Mask();
const FAÇADE: Mask = Mask();

Current output

error[E0428]: the name `FAÇADE` is defined multiple times
 --> src/lib.rs:3:1
  |
2 | const FAÇADE: Mask = Mask();
  | ---------------------------- previous definition of the value `FAÇADE` here
3 | const FAÇADE: Mask = Mask();
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `FAÇADE` redefined here
  |
  = note: `FAÇADE` must be defined only once in the value namespace of this module

Desired output

warning[EIEIO]: the name `FAÇADE` was normalized to `FAÇADE`
3 | const FAÇADE: Mask = Mask();
  |       ^^^^^^ `FAÇADE` defined here
  |
  = note: rustc applies Normalization Form C to identifiers

error[E0428]: the name `FAÇADE` is defined multiple times
 --> src/lib.rs:3:1
  |
2 | const FAÇADE: Mask = Mask();
  | ---------------------------- previous definition of the value `FAÇADE` here
3 | const FAÇADE: Mask = Mask();
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `FAÇADE` redefined here
  |
  = note: `FAÇADE` must be defined only once in the value namespace of this module

Rationale and extra context

Spinoff of #120697

Not the same as uncommon_codepoints! We NFC-normalize idents, as described in RFC #2457. In the (admittedly unlikely) case where someone includes an ident that normalizes to match another ident, the result is a "you wrote (or more likely, mechanically emitted) something, a nuance of it got silently ignored by the compiler, and now you get compiler errors" situation. Only a very, very unusual source file would even try to include both the NFD and NFC forms of an ident separately, and I think this would only happen due to machine-generated code or multiple splat-includes, so no real harm is done in practice.

But "the byte strings for two idents can differ, yet still resolve to the same value" can still be surprising. That is especially true in the machine-generation case, where a programmer is likely to reason heavily about identifiers as strings. So we could probably emit a warning whenever normalizing an ident in a source file actually changes it, much like we emit warnings when we leave other data unused.
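To make the "byte strings differ, idents match" point concrete, here is a minimal sketch using only the standard library. The names `PRECOMPOSED`, `DECOMPOSED`, `toy_nfc`, and `normalization_changed` are hypothetical and exist only for illustration. `toy_nfc` is not real NFC: it composes just the one pair used in this issue (`C` + U+0327 into U+00C7), whereas a real check would apply full Normalization Form C.

```rust
/// "FAÇADE" written with the precomposed U+00C7 (already in NFC).
pub const PRECOMPOSED: &str = "FA\u{00C7}ADE";

/// "FAÇADE" written as `C` followed by U+0327 COMBINING CEDILLA (the NFD spelling).
pub const DECOMPOSED: &str = "FAC\u{0327}ADE";

/// Toy stand-in for NFC (an assumption for this sketch, not rustc's code):
/// composes only the single pair `C` + U+0327 into U+00C7.
pub fn toy_nfc(ident: &str) -> String {
    ident.replace("C\u{0327}", "\u{00C7}")
}

/// Hypothetical lint condition: true when normalizing the ident, as written
/// in the source, produces a different string.
pub fn normalization_changed(ident: &str) -> bool {
    toy_nfc(ident) != ident
}
```

Under these assumptions, `PRECOMPOSED` and `DECOMPOSED` are different byte strings (7 and 8 bytes of UTF-8 respectively), yet `toy_nfc(DECOMPOSED) == PRECOMPOSED`. `normalization_changed` returns `true` only for the decomposed spelling, which is the case the proposed warning would flag.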

Rust Version

rustc 1.75.0 (82e1608df 2023-12-21)
binary: rustc
commit-hash: 82e1608dfa6e0b5569232559e3d385fea5a93112
commit-date: 2023-12-21
host: x86_64-unknown-linux-gnu
release: 1.75.0
LLVM version: 17.0.6

Labels

A-Unicode (Area: Unicode)
A-diagnostics (Area: Messages for errors, warnings, and lints)
A-lints (Area: Lints (warnings about flaws in source code) such as unused_mut.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
