Our `char` type is a Unicode scalar value (a codepoint excluding the surrogate range), which can lead to confusion because (a) it differs from other languages and (b) it doesn't directly encourage good Unicode hygiene ("Oh, a character? That's what the user sees").
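As a concrete illustration, here is a minimal sketch using today's standard library (`char::from_u32`; this API postdates the issue) showing that `char` admits any codepoint except the surrogates:

```rust
// Sketch: `char` holds Unicode scalar values, i.e. codepoints minus surrogates.
fn main() {
    // Ordinary BMP and non-BMP codepoints are both valid `char`s.
    assert_eq!(char::from_u32(0x61), Some('a'));
    assert_eq!(char::from_u32(0x1F600), Some('😀'));

    // Surrogate codepoints (U+D800..=U+DFFF) are rejected...
    assert_eq!(char::from_u32(0xD800), None);
    // ...as is anything beyond the last codepoint, U+10FFFF.
    assert_eq!(char::from_u32(0x11_0000), None);
}
```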
Possible names include `codepoint`, `ucs4`, or `rune` (like Go).
Other languages' names for a Unicode scalar value / what `char` means (a concrete comparison sketch follows the list):

- Haskell: `Char` is a codepoint (although surrogates are allowed)
- D: `dchar` (`char` is a "UTF-8 code unit" and `wchar` is a "UTF-16 code unit", i.e. aliases for `u8` and `u16`?: http://dlang.org/type.html)
- Go: `rune`
- C#/Java/Scala etc.: `char` is a 16-bit integer (i.e. a UTF-16 code unit)
- C/C++: `char` is (normally) a byte, i.e. a UTF-8 code unit.
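To make the code-unit differences above concrete, a rough comparison in today's Rust (again using std APIs that postdate this issue):

```rust
// One emoji: a single Unicode scalar value (Rust `char`), but two UTF-16
// code units (C#/Java/Scala `char`) and four UTF-8 code units (C/C++ `char`, D `char`).
fn main() {
    let s = "😀";
    assert_eq!(s.chars().count(), 1);        // Unicode scalar values
    assert_eq!(s.encode_utf16().count(), 2); // UTF-16 code units
    assert_eq!(s.len(), 4);                  // UTF-8 code units (bytes)
}
```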
(Other languages like Python don't have a type for a single character and don't have a type called `char`, and so aren't meaningful for this comparison.)
(This issue brought to you by reddit.)