Closed
Description
The Unicode specification defines aliases for some of the general category names. For example the category "L" has alias "Letter".
The regexp package supports \p{L} but not \p{Letter}, because there is nothing in the Unicode tables that lets regexp know about Letter.
In order to support \p{Letter}, I propose to add a new, small table to unicode,
var CategoryAliases = map[string]string{
"Other": "C",
"Control": "Cc",
...,
"Letter": "L",
...
}
This would be auto-generated from the Unicode database like all our other tables. For Unicode 15, the table would have only 38 entries, listed below.
% grep '^gc' PropertyValueAliases.txt
gc ; C ; Other # Cc | Cf | Cn | Co | Cs
gc ; Cc ; Control ; cntrl
gc ; Cf ; Format
gc ; Cn ; Unassigned
gc ; Co ; Private_Use
gc ; Cs ; Surrogate
gc ; L ; Letter # Ll | Lm | Lo | Lt | Lu
gc ; LC ; Cased_Letter # Ll | Lt | Lu
gc ; Ll ; Lowercase_Letter
gc ; Lm ; Modifier_Letter
gc ; Lo ; Other_Letter
gc ; Lt ; Titlecase_Letter
gc ; Lu ; Uppercase_Letter
gc ; M ; Mark ; Combining_Mark # Mc | Me | Mn
gc ; Mc ; Spacing_Mark
gc ; Me ; Enclosing_Mark
gc ; Mn ; Nonspacing_Mark
gc ; N ; Number # Nd | Nl | No
gc ; Nd ; Decimal_Number ; digit
gc ; Nl ; Letter_Number
gc ; No ; Other_Number
gc ; P ; Punctuation ; punct # Pc | Pd | Pe | Pf | Pi | Po | Ps
gc ; Pc ; Connector_Punctuation
gc ; Pd ; Dash_Punctuation
gc ; Pe ; Close_Punctuation
gc ; Pf ; Final_Punctuation
gc ; Pi ; Initial_Punctuation
gc ; Po ; Other_Punctuation
gc ; Ps ; Open_Punctuation
gc ; S ; Symbol # Sc | Sk | Sm | So
gc ; Sc ; Currency_Symbol
gc ; Sk ; Modifier_Symbol
gc ; Sm ; Math_Symbol
gc ; So ; Other_Symbol
gc ; Z ; Separator # Zl | Zp | Zs
gc ; Zl ; Line_Separator
gc ; Zp ; Paragraph_Separator
gc ; Zs ; Space_Separator
%
Metadata
Metadata
Assignees
Type
Projects
Status
Accepted