Skip to content

PHP 8.0: Make stream wrapper and windows drive checks locale-independent #7820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: PHP-8.0
Choose a base branch
from

Conversation

TysonAndre
Copy link
Contributor

  • Avoid registering/detecting stream wrappers in locale-independent ways.
  • Avoid locale dependence for Windows drive letter names in zend_virtual_cwd
  • Make parse_url stop depending on locale
    Related to https://bugs.php.net/bug.php?id=52923
    iscntrl is locale-dependent which seems to corrupt certain bytes.
  • Make FILTER_VALIDATE_HOSTNAME with flag FILTER_VALIDATE_DOMAIN locale-independent

Somewhat related to https://wiki.php.net/rfc/strtolower-ascii
but I don't think most of these should have been locale-dependent in the first
place - the code may not have considered locales

E.g. on Linux, setlocale(LC_ALL, 'de_DE');
(if the locale is installed and it succeeds)
will have some values for alpha/cntrl in the range 128-256 where the C locale
has no values.

To avoid this locale-dependence in older php versions,
applications can set setlocale(LC_CTYPE, 'C').

Use zend_bool since the switch to bool was not made in 7.3

Split out of https://github.com/php/php-src/pull/7802/files

@TysonAndre TysonAndre changed the base branch from master to PHP-7.3 December 23, 2021 14:54
@cmb69
Copy link
Member

cmb69 commented Dec 23, 2021

PHP-7.3 is EOL; even PHP-7.4 is open for security fixes only.

@TysonAndre TysonAndre force-pushed the PHP-7.3-fix-locale-dependence branch from 9768c2f to 15eb502 Compare December 23, 2021 15:23
@TysonAndre TysonAndre changed the base branch from PHP-7.3 to PHP-8.0 December 23, 2021 15:24
@TysonAndre TysonAndre changed the title PHP 7.3: Make stream wrapper and windows drive checks locale-independent PHP 8.0: Make stream wrapper and windows drive checks locale-independent Dec 23, 2021
@TysonAndre
Copy link
Contributor Author

PHP-7.3 is EOL; even PHP-7.4 is open for security fixes only.

I forgot about that and was assuming CONTRIBUTING.md was up to date. Updated, and created #7822 to update CONTRIBUTING.md

@cmb69
Copy link
Member

cmb69 commented Dec 23, 2021

@TysonAndre, can you please rebase your branch onto current php:PHP-8.0 and force-push, so CI doesn't get confused?

@TysonAndre TysonAndre force-pushed the PHP-7.3-fix-locale-dependence branch 2 times, most recently from aed35a9 to cdfd72c Compare December 24, 2021 00:04
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -66,6 +66,47 @@ static const unsigned char tolower_map[256] = {

#define zend_tolower_ascii(c) (tolower_map[(unsigned char)(c)])

/* ctype's isalpha varies based on locale, which is not what we want for many use cases.
* This is what it'd be in the "C" locale. */
ZEND_API const bool zend_isalpha_map[256] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we use bool only as of PHP 8.1.0, so probably better:

Suggested change
ZEND_API const bool zend_isalpha_map[256] = {
ZEND_API const zend_bool zend_isalpha_map[256] = {

We might consider to use a "packed array" instead (i.e. char[32]).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I don't think we need this map. An add and compare is sufficient to check isalpha.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I don't think we need this map. An add and compare is sufficient to check isalpha.

and and compare? c &= ~0x20; return 'A' <= c && c <= 'Z'; in a static zend_always_inline helper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZEND_API bool zend_is_smart_branch(const zend_op *opline) /* {{{ */

We started adopting bool in php-src earlier than that, I think 8.1 or later was when php finished getting rid of zend_bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Girgias, can you clarify the usage of bool instead of zend_bool in PHP-8.0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHP 8.0 requires C99 so one can use bool in 8.0, but I'm not sure when the mass change from zend_bool to bool happened, so both were in usage when 8.0 was in dev.

};

/* ctype's isalnum is isalpha + isdigit(0-9) */
ZEND_API const bool zend_isalnum_map[256] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ZEND_API const bool zend_isalnum_map[256] = {
ZEND_API const zend_bool zend_isalnum_map[256] = {

@@ -414,6 +414,13 @@ ZEND_API int ZEND_FASTCALL string_compare_function(zval *op1, zval *op2);
ZEND_API int ZEND_FASTCALL string_case_compare_function(zval *op1, zval *op2);
ZEND_API int ZEND_FASTCALL string_locale_compare_function(zval *op1, zval *op2);

/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in php 7.3 patch releases, and should not be used externally until php 8.2 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why shouldn't that be used externally?

Anyway:

Suggested change
/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in php 7.3 patch releases, and should not be used externally until php 8.2 */
/* NOTE: The locale-independent alternatives to ctype(isalpha/isalnum) were added to fix bugs in PHP 8.0 patch releases, and should not be used externally until php 8.2 */

Copy link
Contributor Author

@TysonAndre TysonAndre Dec 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some LTS php releases distributed by OS package managers might only distribute some patches but not others (e.g. ubuntu is shipping 7.4.3 with security fixes backported).

So if a PECL author targeted the latest patch versions it wouldn't work/wouldn't be installable in OS releases that didn't backport the bug fixes.

Comment on lines 7 to 14
if (!setlocale(LC_ALL, "de_DE")) {
print "skip Can't find german locale";
}
?>
--FILE--
<?php
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
setlocale(LC_ALL, "de_DE");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To not unnecessarily skipt the test on Windows:

Suggested change
if (!setlocale(LC_ALL, "de_DE")) {
print "skip Can't find german locale";
}
?>
--FILE--
<?php
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
setlocale(LC_ALL, "de_DE");
if (!setlocale(LC_ALL, "de_DE", "de-DE")) {
print "skip Can't find german locale";
}
?>
--FILE--
<?php
var_dump(filter_var('٪', FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME));
setlocale(LC_ALL, "de_DE", "de-DE");

And maybe add some other locales, like "fr_FR" etc.

Copy link
Contributor Author

@TysonAndre TysonAndre Jan 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like trying to support too many locales would make this harder to reason about and maintain, e.g. if nobody ever tries this on Mac OS in fr_FR until a year from now and that turns out to actually not work.

(EDIT: it'd be locale-independent now, I mean turns out to actually not have previously been buggy)

@@ -0,0 +1,26 @@
--TEST--
Stream wrappers should not be locale dependent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test passes on Windows also without the patch for some reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the locale files might depend on OS too, for some reason, I remember seeing something about that while looking at other tests

- Avoid registering/detecting stream wrappers in locale-independent ways.
- Avoid locale dependence for Windows drive letter names in zend_virtual_cwd
- Make parse_url stop depending on locale
  Related to https://bugs.php.net/bug.php?id=52923
  iscntrl is locale-dependent which seems to corrupt certain bytes.
- Make FILTER_VALIDATE_HOSTNAME with flag FILTER_VALIDATE_DOMAIN locale-independent

Somewhat related to https://wiki.php.net/rfc/strtolower-ascii
but I don't think most of these should have been locale-dependent in the first
place - the code may not have considered locales

E.g. on Linux, `setlocale(LC_ALL, 'de_DE');`
(if the locale is installed and it succeeds)
will have some values for alpha/cntrl in the range 128-256 where the C locale
has no values.

To avoid this locale-dependence in older php versions,
applications can set `setlocale(LC_CTYPE, 'C')`.
@TysonAndre TysonAndre force-pushed the PHP-7.3-fix-locale-dependence branch from cdfd72c to 0e8d196 Compare December 28, 2021 16:36
@TysonAndre TysonAndre force-pushed the PHP-7.3-fix-locale-dependence branch from 03cd770 to b4b1bf5 Compare December 29, 2021 14:48
@TysonAndre
Copy link
Contributor Author

I think the non-optional review comments were addressed, this is ready for another look

@ramsey
Copy link
Member

ramsey commented May 27, 2022

Why does this target PHP-8.0 and not master? Does it address a bug?

@TysonAndre
Copy link
Contributor Author

Why does this target PHP-8.0 and not master? Does it address a bug?

The bug is that some characters are treated as possible drive names or stream wrapper uses that aren't, so paths can be parsed differently depending on the locale (e.g. regexes to prevent using stream wrappers may not work properly as a result)

E.g. on Linux, setlocale(LC_ALL, 'de_DE');
(if the locale is installed and it succeeds)
will have some values for alpha/cntrl alnum in the range 128-256 where the C locale
has no values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants