`Char::to_{lower,upper}case` should return `Option<&'static str>` instead of `char`

These two methods should not be stabilized as-is. They should be changed to return a variable number of code points (between one and three), per Unicode’s [`SpecialCasing.txt`](http://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt).
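The one-to-three range can be checked quickly. The sketch below counts the code points produced by current Rust's `char::to_uppercase`, which yields an iterator over the full mapping, including the unconditional entries from `SpecialCasing.txt`. The helper name `uppercase_len` is made up for this example.

```rust
/// Number of code points in the full uppercase mapping of `c`.
/// Illustrative helper only: it relies on `char::to_uppercase`,
/// which returns an iterator rather than a single `char`.
fn uppercase_len(c: char) -> usize {
    c.to_uppercase().count()
}
```

For example, `uppercase_len('a')` is 1, `uppercase_len('ß')` is 2 ("SS"), and `uppercase_len('\u{390}')` (ΐ, GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS) is 3.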

Such results could be represented as `&'static str` or `&'static [char]` slices of a static table in libunicode. The former avoids re-encoding to UTF-8 when accumulating results in a `String`. To avoid having an entry in that table for every one of the 1,114,112 code points, the return type could be an `Option`, where `None` means that the code point is unchanged by the mapping. (This is by far the most common case.) Alternatively, it could be a new special-purpose type like `enum CaseMappingResult { Unchanged, MappedTo(&'static str) }`.
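A minimal sketch of the `Option<&'static str>` representation follows. It assumes a static table sorted by code point so that it can be binary-searched. `UPPER_TABLE`, `ASCII_UPPER`, `upper_mapping` and `upper_mapping_result` are hypothetical names. The table holds only three entries from `SpecialCasing.txt` plus ASCII letters, whereas a real libunicode table would cover every code point whose mapping differs from itself. ASCII results are sliced out of a static string so that every result is a `&'static str`.

```rust
/// Result type floated in the proposal: the code point is either
/// unchanged, or it maps to a slice of a static table.
#[derive(Debug, PartialEq)]
enum CaseMappingResult {
    Unchanged,
    MappedTo(&'static str),
}

/// Hypothetical excerpt of a static uppercase table, sorted by code point.
static UPPER_TABLE: &[(char, &str)] = &[
    ('\u{00DF}', "SS"),                       // LATIN SMALL LETTER SHARP S
    ('\u{0390}', "\u{0399}\u{0308}\u{0301}"), // GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
    ('\u{FB01}', "FI"),                       // LATIN SMALL LIGATURE FI
];

/// Backing storage so that ASCII results are also `&'static str` slices.
static ASCII_UPPER: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// `Option` form: `None` means the code point is unchanged by the mapping.
fn upper_mapping(c: char) -> Option<&'static str> {
    if c.is_ascii_lowercase() {
        let i = (c as u8 - b'a') as usize;
        return Some(&ASCII_UPPER[i..i + 1]);
    }
    UPPER_TABLE
        .binary_search_by_key(&c, |&(k, _)| k)
        .ok()
        .map(|i| UPPER_TABLE[i].1)
}

/// Equivalent special-purpose enum form.
fn upper_mapping_result(c: char) -> CaseMappingResult {
    match upper_mapping(c) {
        Some(s) => CaseMappingResult::MappedTo(s),
        None => CaseMappingResult::Unchanged,
    }
}
```

Here `upper_mapping('ß')` returns `Some("SS")` and `upper_mapping('1')` returns `None`. The enum form carries the same information under more descriptive variant names.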

Since the `Char` methods become less convenient to use, there should be `str::to_{lower,upper}case() -> String` wrappers.
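With such a mapping, the `str` wrapper could be a simple loop. This sketch uses a hypothetical, deliberately tiny `upper_mapping` that covers only ASCII letters and U+00DF. Each mapped result is already UTF-8, so it is appended with `String::push_str` without re-encoding. Unchanged code points are pushed as-is.

```rust
static ASCII_UPPER: &str = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/// Hypothetical per-`char` mapping in the proposed `Option<&'static str>` shape
/// (ASCII letters and U+00DF only; a real table would be complete).
fn upper_mapping(c: char) -> Option<&'static str> {
    match c {
        'a'..='z' => {
            let i = (c as u8 - b'a') as usize;
            Some(&ASCII_UPPER[i..i + 1])
        }
        '\u{DF}' => Some("SS"), // LATIN SMALL LETTER SHARP S
        _ => None,
    }
}

/// `str`-level wrapper returning an owned `String`.
fn str_to_uppercase(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match upper_mapping(c) {
            // Already UTF-8: copied directly into the output buffer.
            Some(mapped) => out.push_str(mapped),
            // `None` means "unchanged", so keep the original code point.
            None => out.push(c),
        }
    }
    out
}
```

`str_to_uppercase("straße")` returns `"STRASSE"`, which matches what the standard library's `str::to_uppercase` produces today for the same input.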

`SpecialCasing.txt` also defines some language-sensitive mappings for Turkish and Lithuanian, but I suggest not including them, for a few reasons:
- Using the system’s locale is a very bad idea. Programs that behave differently on different systems are a source of countless bugs, and the system’s locale may not even match that of the end users (e.g. for server-side software).
- Forcing users to specify a language is counterproductive, since it will often end up hard-coded to English or similar. There should be a default.
- Users who _do_ care about language-specific tailoring may want to do more anyway. `SpecialCasing.txt` says:
  
  > Note that the preferred mechanism for defining tailored casing operations is the Unicode Common Locale Data Repository (CLDR).

Finally, there are conditional mappings that depend on the context of surrounding code points but not on the language. They could be special-cased in the `str` methods, but I don’t know if it’s worth the bother, since there is currently only one such case: Greek capital sigma at the end of a word (`Final_Sigma`).
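If the `str` methods did special-case the sigma, a simplified sketch might look like the one below. It only approximates the `Final_Sigma` condition in `SpecialCasing.txt`. It uses `char::is_alphabetic` as a stand-in for "cased letter" and does not skip case-ignorable characters such as apostrophes. `lowercase_with_final_sigma` is a hypothetical name, and every other character goes through `char::to_lowercase`.

```rust
/// Hypothetical `str` lowercasing that special-cases GREEK CAPITAL LETTER SIGMA.
/// Simplification: "cased letter" is approximated by `char::is_alphabetic`,
/// and case-ignorable characters between letters are not skipped.
fn lowercase_with_final_sigma(s: &str) -> String {
    let chars: Vec<char> = s.chars().collect();
    let mut out = String::with_capacity(s.len());
    for (i, &c) in chars.iter().enumerate() {
        if c == '\u{3A3}' {
            // Σ: look at the neighbouring characters for word context.
            let after_letter = i > 0 && chars[i - 1].is_alphabetic();
            let before_letter = chars.get(i + 1).map_or(false, |n| n.is_alphabetic());
            // Final sigma ς (U+03C2) at the end of a word, σ (U+03C3) otherwise.
            out.push(if after_letter && !before_letter { '\u{3C2}' } else { '\u{3C3}' });
        } else {
            // All other characters use the context-free full mapping.
            out.extend(c.to_lowercase());
        }
    }
    out
}
```

With this sketch, the last letter of `ΟΔΟΣ` becomes `ς`, while a sigma at the start of a word, as in `ΣΑ`, stays `σ`.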

More background on Unicode case mappings:

- http://unicode.org/faq/casemap_charprop.html
- http://www.unicode.org/reports/tr44/tr44-14.html#Casemapping

CC @huonw, @aturon

`Char::to_{lower,upper}case` should return `Option<&'static str>` instead of `char` #20333