Commit 6c5b6e4

ext/mbstring: update to Unicode 15
Updates UCD to Unicode 15.1 (released September 2023). The upcoming Unicode 16 release is expected around September 2024.

Previously: 0fdffc1, php#7502

In UCD 15.1, `DerivedNormalizationProps` contains multiple properties on the same line, which breaks the parser. This commit also updates the `ucgendat.php` script to accept two or three fields on each line, and to look for the `Cased` and `Case_Ignorable` properties in either of the fields, mimicking the previous behavior.
1 parent f65918d commit 6c5b6e4
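
The parser change described in the commit message (accept two or three `;`-separated fields, then look for a property name in any field after the first) can be sketched roughly as follows. This is a minimal illustration in C, not the actual `ucgendat.php` code; the names `split_ucd_line`, `line_has_property` and `UCD_MAX_FIELDS` are hypothetical, and the sketch assumes that field 0 holds the code point or range and that `#` starts a trailing comment.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UCD_MAX_FIELDS 3 /* hypothetical limit: range plus one or two property fields */

/* Trim ASCII whitespace in place and return the trimmed start. */
static char *trim(char *s)
{
    while (*s && isspace((unsigned char)*s)) s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/*
 * Split one data line into ';'-separated fields (modifies `line`).
 * Returns the field count (2 or 3), 0 for blank or comment-only lines,
 * and -1 when the line has fewer than 2 or more than 3 fields.
 */
static int split_ucd_line(char *line, char *fields[UCD_MAX_FIELDS])
{
    char *hash = strchr(line, '#');
    if (hash) *hash = '\0'; /* drop the trailing comment */
    if (*trim(line) == '\0') return 0;

    int n = 0;
    char *p = line;
    for (;;) {
        char *semi = strchr(p, ';');
        if (semi) *semi = '\0';
        if (n == UCD_MAX_FIELDS) return -1; /* a fourth field: reject */
        fields[n++] = trim(p);
        if (!semi) break;
        p = semi + 1;
    }
    return n >= 2 ? n : -1;
}

/* True if any field after the code point range equals `prop`. */
static bool line_has_property(char *fields[], int n, const char *prop)
{
    for (int i = 1; i < n; i++) {
        if (strcmp(fields[i], prop) == 0) return true;
    }
    return false;
}
```

Checking every field after the first, rather than only field 1, is what lets a two-field line and a three-field line go through the same code path.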

5 files changed

+1192
-1126
lines changed

NEWS

Lines changed: 1 addition & 0 deletions
@@ -123,6 +123,7 @@ PHP NEWS
 - MBString:
   . Added mb_trim, mb_ltrim and mb_rtrim. (Yuya Hamada)
   . Added mb_ucfirst and mb_lcfirst. (Yuya Hamada)
+  . Updated Unicode data tables to Unicode 15.1. (Ayesh Karunaratne)
 
 - MySQLnd:
   . Fixed bug GH-13440 (PDO quote bottleneck). (nielsdos)

UPGRADING

Lines changed: 3 additions & 0 deletions
@@ -684,6 +684,9 @@ PHP 8.4 UPGRADE NOTES
     $domain name is empty or too long, and if $variant is not
     INTL_IDNA_VARIANT_UTS46.
 
+- MBString:
+  . Unicode data tables have been updated to Unicode 15.1.
+
 - OpenSSL:
   . The OpenSSL extension now requires at least OpenSSL 1.1.1.

ext/mbstring/libmbfl/mbfl/eaw_table.h

Lines changed: 12 additions & 13 deletions
@@ -58,14 +58,13 @@ static const struct {
     { 0x2e80, 0x2e99 },
     { 0x2e9b, 0x2ef3 },
     { 0x2f00, 0x2fd5 },
-    { 0x2ff0, 0x2ffb },
-    { 0x3000, 0x303e },
+    { 0x2ff0, 0x303e },
     { 0x3041, 0x3096 },
     { 0x3099, 0x30ff },
     { 0x3105, 0x312f },
     { 0x3131, 0x318e },
     { 0x3190, 0x31e3 },
-    { 0x31f0, 0x321e },
+    { 0x31ef, 0x321e },
     { 0x3220, 0x3247 },
     { 0x3250, 0x4dbf },
     { 0x4e00, 0xa48c },
@@ -88,7 +87,9 @@ static const struct {
     { 0x1aff5, 0x1affb },
     { 0x1affd, 0x1affe },
     { 0x1b000, 0x1b122 },
+    { 0x1b132, 0x1b132 },
     { 0x1b150, 0x1b152 },
+    { 0x1b155, 0x1b155 },
     { 0x1b164, 0x1b167 },
     { 0x1b170, 0x1b2fb },
     { 0x1f004, 0x1f004 },
@@ -122,23 +123,21 @@ static const struct {
     { 0x1f6cc, 0x1f6cc },
     { 0x1f6d0, 0x1f6d2 },
     { 0x1f6d5, 0x1f6d7 },
-    { 0x1f6dd, 0x1f6df },
+    { 0x1f6dc, 0x1f6df },
     { 0x1f6eb, 0x1f6ec },
     { 0x1f6f4, 0x1f6fc },
     { 0x1f7e0, 0x1f7eb },
     { 0x1f7f0, 0x1f7f0 },
     { 0x1f90c, 0x1f93a },
     { 0x1f93c, 0x1f945 },
     { 0x1f947, 0x1f9ff },
-    { 0x1fa70, 0x1fa74 },
-    { 0x1fa78, 0x1fa7c },
-    { 0x1fa80, 0x1fa86 },
-    { 0x1fa90, 0x1faac },
-    { 0x1fab0, 0x1faba },
-    { 0x1fac0, 0x1fac5 },
-    { 0x1fad0, 0x1fad9 },
-    { 0x1fae0, 0x1fae7 },
-    { 0x1faf0, 0x1faf6 },
+    { 0x1fa70, 0x1fa7c },
+    { 0x1fa80, 0x1fa88 },
+    { 0x1fa90, 0x1fabd },
+    { 0x1fabf, 0x1fac5 },
+    { 0x1face, 0x1fadb },
+    { 0x1fae0, 0x1fae8 },
+    { 0x1faf0, 0x1faf8 },
     { 0x20000, 0x2fffd },
     { 0x30000, 0x3fffd },
 };
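
To show how a sorted table of inclusive code point ranges like this one can drive a width lookup, here is a minimal sketch using binary search. It is an illustration under assumptions, not necessarily how libmbfl consumes `eaw_table.h`: the names `eaw_excerpt`, `in_wide_range` and `char_width` are hypothetical, and the table holds only a handful of the updated ranges from this diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Excerpt of the updated ranges from this diff; the real table is much longer. */
static const struct {
    uint32_t begin;
    uint32_t end;
} eaw_excerpt[] = {
    { 0x1b132, 0x1b132 },
    { 0x1b150, 0x1b152 },
    { 0x1b155, 0x1b155 },
    { 0x1f6d5, 0x1f6d7 },
    { 0x1f6dc, 0x1f6df },
    { 0x1fa70, 0x1fa7c },
    { 0x1fa80, 0x1fa88 },
    { 0x1fa90, 0x1fabd },
    { 0x1fabf, 0x1fac5 },
};

/* Binary search over the sorted, non-overlapping inclusive ranges. */
static bool in_wide_range(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof(eaw_excerpt) / sizeof(eaw_excerpt[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < eaw_excerpt[mid].begin) {
            hi = mid;
        } else if (cp > eaw_excerpt[mid].end) {
            lo = mid + 1;
        } else {
            return true;
        }
    }
    return false;
}

/* Assumed convention for this sketch: width 2 if listed, otherwise 1. */
static int char_width(uint32_t cp)
{
    return in_wide_range(cp) ? 2 : 1;
}
```

With this excerpt, `char_width(0x1f6dc)` yields 2 because the range that previously started at `0x1f6dd` now starts at `0x1f6dc`, while `char_width(0x1fabe)` yields 1 because that code point sits in the gap between `0x1fabd` and `0x1fabf`.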
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+--TEST--
+mbstring Unicode Data tests
+--EXTENSIONS--
+mbstring
+--FILE--
+<?php
+
+print "ASCII (PHP): " . mb_strwidth('PHP', 'UTF-8') . "\n";
+
+print "Vietnamese (Xin chào): " . mb_strwidth('Xin chào', 'UTF-8') . "\n";
+
+print "Traditional Chinese (你好): " . mb_strwidth('你好', 'UTF-8') . "\n";
+
+print "Sinhalese (අයේෂ්): " . mb_strwidth('අයේෂ්', 'UTF-8') . "\n";
+
+print "Emoji (\u{1F418}): " . mb_strwidth("\u{1F418}", 'UTF-8') . "\n";
+
+// New in Unicode 15.0, width=2
+print "Emoji (\u{1F6DC}): " . mb_strwidth("\u{1F6DC}", 'UTF-8') . "\n";
+
+?>
+--EXPECT--
+ASCII (PHP): 3
+Vietnamese (Xin chào): 8
+Traditional Chinese (你好): 4
+Sinhalese (අයේෂ්): 5
+Emoji (🐘): 2
+Emoji (🛜): 2