-
Notifications
You must be signed in to change notification settings - Fork 7.9k
PHP 8.0: Make stream wrapper and windows drive checks locale-independent #7820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: PHP-8.0
Are you sure you want to change the base?
Conversation
PHP-7.3 is EOL; even PHP-7.4 is open for security fixes only. |
9768c2f
to
15eb502
Compare
I forgot about that and was assuming CONTRIBUTING.md was up to date. Updated, and created #7822 to update CONTRIBUTING.md |
@TysonAndre, can you please rebase your branch onto current php:PHP-8.0 and force-push, so CI doesn't get confused? |
aed35a9
to
cdfd72c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Zend/zend_operators.c
Outdated
@@ -66,6 +66,47 @@ static const unsigned char tolower_map[256] = { | |||
|
|||
#define zend_tolower_ascii(c) (tolower_map[(unsigned char)(c)]) | |||
|
|||
/* ctype's isalpha varies based on locale, which is not what we want for many use cases. | |||
* This is what it'd be in the "C" locale. */ | |||
ZEND_API const bool zend_isalpha_map[256] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we use bool
only as of PHP 8.1.0, so probably better:
ZEND_API const bool zend_isalpha_map[256] = { | |
ZEND_API const zend_bool zend_isalpha_map[256] = { |
We might consider to use a "packed array" instead (i.e. char[32]).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I don't think we need this map. An add and compare is sufficient to check isalpha.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I don't think we need this map. An add and compare is sufficient to check isalpha.
and
and compare? c &= ~0x20; return 'A' <= c && c <= 'Z';
in a static zend_always_inline
helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZEND_API bool zend_is_smart_branch(const zend_op *opline) /* {{{ */
We started adopting bool
in php-src earlier than that, I think 8.1 or later was when php finished getting rid of zend_bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Girgias, can you clarify the usage of bool
instead of zend_bool
in PHP-8.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP 8.0 requires C99 so one can use bool
in 8.0, but I'm not sure when the mass change from zend_bool
to bool
happened, so both were in usage when 8.0 was in dev.
}; | ||
|
||
/* ctype's isalnum is isalpha + isdigit(0-9) */ | ||
ZEND_API const bool zend_isalnum_map[256] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZEND_API const bool zend_isalnum_map[256] = { | |
ZEND_API const zend_bool zend_isalnum_map[256] = { |
Zend/zend_operators.h
Outdated
@@ -414,6 +414,13 @@ ZEND_API int ZEND_FASTCALL string_compare_function(zval *op1, zval *op2); | |||
ZEND_API int ZEND_FASTCALL string_case_compare_function(zval *op1, zval *op2); | |||
ZEND_API int ZEND_FASTCALL string_locale_compare_function(zval *op1, zval *op2); | |||
|
|||
/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in php 7.3 patch releases, and should not be used externally until php 8.2 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why shouldn't that be used externally?
Anyway:
/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in php 7.3 patch releases, and should not be used externally until php 8.2 */ | |
/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in PHP 8.0 patch releases, and should not be used externally until php 8.2 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think some LTS php releases distributed by OS package managers might only distribute some patches but not others (e.g. ubuntu is shipping 7.4.3 with security fixes backported).
So if a PECL author targeted the latest patch versions it wouldn't work/wouldn't be installable in OS releases that didn't backport the bug fixes.
if (!setlocale(LC_ALL, "de_DE")) { | ||
print "skip Can't find german locale"; | ||
} | ||
?> | ||
--FILE-- | ||
<?php | ||
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)); | ||
setlocale(LC_ALL, "de_DE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To not unnecessarily skipt the test on Windows:
if (!setlocale(LC_ALL, "de_DE")) { | |
print "skip Can't find german locale"; | |
} | |
?> | |
--FILE-- | |
<?php | |
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)); | |
setlocale(LC_ALL, "de_DE"); | |
if (!setlocale(LC_ALL, "de_DE", "de-DE")) { | |
print "skip Can't find german locale"; | |
} | |
?> | |
--FILE-- | |
<?php | |
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME)); | |
setlocale(LC_ALL, "de_DE", "de-DE"); |
And maybe add some other locales, like "fr_FR" etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like trying to support too many locales would make this harder to reason about and maintain, e.g. if nobody ever tries this on Mac OS in fr_FR until a year from now and that turns out to actually not work.
(EDIT: it'd be locale-independent now, I mean turns out to actually not have previously been buggy)
@@ -0,0 +1,26 @@ | |||
--TEST-- | |||
Stream wrappers should not be locale dependent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test passes on Windows also without the patch for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the locale files might depend on OS too, for some reason, I remember seeing something about that while looking at other tests
- Avoid registering/detecting stream wrappers in locale-independent ways. - Avoid locale dependence for Windows drive letter names in zend_virtual_cwd - Make parse_url stop depending on locale Related to https://bugs.php.net/bug.php?id=52923 iscntrl is locale-dependent which seems to corrupt certain bytes. - Make FILTER_VALIDATE_HOSTNAME with flag FILTER_VALIDATE_DOMAIN locale-independent Somewhat related to https://wiki.php.net/rfc/strtolower-ascii but I don't think most of these should have been locale-dependent in the first place - the code may not have considered locales E.g. on Linux, `setlocale(LC_ALL, 'de_DE');` (if the locale is installed and it succeeds) will have some values for alpha/cntrl in the range 128-256 where the C locale has no values. To avoid this locale-dependence in older php versions, applications can set `setlocale(LC_CTYPE, 'C')`.
cdfd72c
to
0e8d196
Compare
03cd770
to
b4b1bf5
Compare
I think the non-optional review comments were addressed, this is ready for another look |
Why does this target PHP-8.0 and not master? Does it address a bug? |
The bug is that some characters are treated as possible drive names or stream wrapper uses that aren't, so paths can be parsed differently depending on the locale (e.g. regexes to prevent using stream wrappers may not work properly as a result)
|
Related to https://bugs.php.net/bug.php?id=52923
iscntrl is locale-dependent which seems to corrupt certain bytes.
Somewhat related to https://wiki.php.net/rfc/strtolower-ascii
but I don't think most of these should have been locale-dependent in the first
place - the code may not have considered locales
E.g. on Linux,
setlocale(LC_ALL, 'de_DE');
(if the locale is installed and it succeeds)
will have some values for alpha/cntrl in the range 128-256 where the C locale
has no values.
To avoid this locale-dependence in older php versions,
applications can set
setlocale(LC_CTYPE, 'C')
.Use
zend_bool
since the switch tobool
was not made in 7.3Split out of https://github.com/php/php-src/pull/7802/files