RFC: Rename char to make it clearer that it is a unicode codepoint/scalar value #12730

Closed
@huonw

Our char type is a Unicode scalar value (a codepoint excluding the surrogate range), which can lead to confusion because (a) it differs from other languages and (b) it doesn't directly encourage good Unicode hygiene ("Oh, a character? That's what the user sees").
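To make the distinction concrete, here is a minimal sketch, assuming a current Rust toolchain (the standard library API looked different when this issue was filed). The helper names `is_scalar_value` and `scalar_count` are hypothetical and exist only for this illustration; `char::from_u32` and `str::chars` are from the standard library.

```rust
/// Hypothetical helper: true if `value` is a Unicode scalar value,
/// i.e. a codepoint that can be represented as a `char`.
fn is_scalar_value(value: u32) -> bool {
    char::from_u32(value).is_some()
}

/// Hypothetical helper: number of `char`s (scalar values) in `s`.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Surrogates (U+D800..=U+DFFF) are codepoints but not scalar values.
    println!("{}", is_scalar_value(0xD800)); // false
    // An ordinary codepoint outside the BMP is a valid `char`.
    println!("{}", is_scalar_value(0x1F600)); // true
    // Values above U+10FFFF are not codepoints at all.
    println!("{}", is_scalar_value(0x110000)); // false

    // 'e' followed by U+0301 (combining acute accent) renders as one
    // user-visible character, yet it is two `char`s.
    println!("{}", scalar_count("e\u{301}")); // 2
}
```

The last line is the hygiene problem in (b): what the user sees as one character need not be one char.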

Possible names include codepoint, ucs4, or rune like Go.

Other languages' names for a Unicode scalar value, or what char means in them:

  • Haskell: Char is a codepoint (although surrogates are allowed)
  • D: dchar (char is a "UTF-8 code unit" and wchar is a "UTF-16 code unit", i.e. aliases for u8 and u16?): http://dlang.org/type.html
  • Go: rune
  • C#/Java/Scala etc.: char is a 16-bit integer (i.e. UTF-16 code unit)
  • C/C++: char is (normally) a byte, i.e. a UTF-8 code unit.

(Other languages like Python don't have a type for a single character and don't have a type called char, and so aren't meaningful for this comparison.)
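The three meanings of "char" in the comparison above (UTF-8 code unit, UTF-16 code unit, scalar value) can be counted side by side for the same string. This is only a sketch, assuming a current Rust toolchain; `unit_counts` is a hypothetical helper name, while `str::len`, `str::encode_utf16` and `str::chars` are standard library methods.

```rust
/// Hypothetical helper: returns
/// (UTF-8 code units, UTF-16 code units, Unicode scalar values) for `s`.
fn unit_counts(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // bytes, i.e. what a C/C++ char counts
        s.encode_utf16().count(), // 16-bit units, i.e. what a C#/Java char counts
        s.chars().count(),        // scalar values, i.e. what a Rust char counts
    )
}

fn main() {
    // ASCII: all three views agree.
    println!("{:?}", unit_counts("a")); // (1, 1, 1)
    // U+00E9 (e with acute): two bytes in UTF-8, one unit in UTF-16.
    println!("{:?}", unit_counts("\u{e9}")); // (2, 1, 1)
    // U+1F600: four bytes in UTF-8, a surrogate pair in UTF-16.
    println!("{:?}", unit_counts("\u{1F600}")); // (4, 2, 1)
}
```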

(This issue brought to you by reddit.)
