Use locale-independent alternatives to isalpha/isalnum/iscntrl #7802
Avoid registering/detecting stream wrappers in locale-dependent ways. (Low importance, hopefully, since non-ASCII wrappers would still have to be registered by an application to be used. This would just result in notices in some locales and, typically, failed attempts to read files in others.)
Avoid locale dependence in https://www.php.net/manual/en/function.finfo-file.php (libmagic) when detecting magic file headers.
I don't believe these checks should be locale-dependent. Note that the ext/fileinfo code is only used in https://www.php.net/manual/en/function.finfo-file.php, and other modules are unaffected.
Avoid locale dependence for http/ftp/network protocols.
Avoid locale dependence for Windows drive letter names in zend_virtual_cwd
Make parse_url stop depending on locale
Related to https://bugs.php.net/bug.php?id=52923
iscntrl is locale-dependent, which seems to corrupt certain bytes of UTF-8 codepoints in the original bug report when replacing control characters with `_` (not completely sure).
Make ini file line parsing locale-independent with respect to these functions.
mb_send_mail acts on individual bytes, so this seems like it would corrupt the `To` header.
Other changes:
filter_var FILTER_VALIDATE_DOMAIN in ext/filter/logical_filters.c
TODO: isprint also behaves differently in de_DE, but its use in ext/fileinfo is limited to creating human-readable strings. Maybe this should always octal-escape bytes that are non-printable in the C locale, on the assumption that multi-byte encodings such as UTF-8 are used for display in many locales.
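As a rough illustration of the TODO above, here is a minimal sketch of octal-escaping every byte that is non-printable in the C locale. The function name `escape_nonprintable` and its buffer contract are hypothetical and are not taken from ext/fileinfo or libmagic; the sketch treats 0x20-0x7E as printable and does not escape the backslash itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from the PR): copies src into dst, writing every
 * byte outside 0x20-0x7E as a backslash followed by three octal digits.
 * The check uses fixed byte ranges, so the current locale has no effect.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
static size_t escape_nonprintable(const unsigned char *src, size_t len,
                                  char *dst, size_t dst_size)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c >= 0x20 && c <= 0x7E) {
            /* Printable in the C locale: copy as-is, keep room for NUL. */
            if (out + 1 >= dst_size) {
                return (size_t)-1;
            }
            dst[out++] = (char)c;
        } else {
            /* Four output characters plus room for the NUL. */
            if (out + 4 >= dst_size) {
                return (size_t)-1;
            }
            snprintf(dst + out, 5, "\\%03o", (unsigned)c);
            out += 4;
        }
    }

    if (out >= dst_size) {
        return (size_t)-1;
    }
    dst[out] = '\0';
    return out;
}
```

With this approach a Latin-1 byte such as 0xE4 is always shown escaped, whether or not the active locale considers it printable.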
Somewhat related to https://wiki.php.net/rfc/strtolower-ascii
but I don't think most of these should have been locale-dependent in the first
place - the code may not have considered locales
Some of these seem like bugs that should be fixed in earlier php versions as well
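For illustration, locale-independent replacements for these functions can be as small as the following sketch. The `ascii_*` names are hypothetical and are not the identifiers used in the patch; the range comparisons assume an ASCII-compatible execution character set.

```c
#include <stdbool.h>

/* Hypothetical helper names for illustration; not taken from the PR.
 * Each check compares against fixed ASCII ranges, so the result does not
 * change when the application calls setlocale(). */

/* True only for 'A'-'Z' and 'a'-'z'. */
static inline bool ascii_isalpha(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* True only for '0'-'9'. */
static inline bool ascii_isdigit(unsigned char c)
{
    return c >= '0' && c <= '9';
}

/* True for ASCII letters and digits. */
static inline bool ascii_isalnum(unsigned char c)
{
    return ascii_isalpha(c) || ascii_isdigit(c);
}

/* True for 0x00-0x1F and 0x7F (DEL), the control characters of the C locale. */
static inline bool ascii_iscntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7F;
}
```

Taking `unsigned char` also avoids the undefined behavior of passing a negative `char` value to the `<ctype.h>` functions.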
Note that this affects the non-UTF-8 variants of some locales (those that represent codepoints as single bytes), e.g. 'de_DE' but not 'de_DE.UTF-8'.
E.g. on Linux, after `setlocale(LC_ALL, 'de_DE');` (if the locale is installed and the call succeeds), isalpha/iscntrl return true for some byte values in the range 128-255, where the C locale has none.
To avoid this locale dependence in older PHP versions, applications can call `setlocale(LC_CTYPE, 'C')`.
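The difference can be observed at the C level with a small sketch like the one below. The helper names are hypothetical; the result for 'de_DE' depends on whether that locale is installed and on the platform's locale data, so no specific count is assumed. For simplicity the sketch resets LC_CTYPE to "C" rather than restoring the previous setting.

```c
#include <ctype.h>
#include <locale.h>

/* Counts the byte values 128-255 that isalpha() accepts under the
 * current LC_CTYPE setting. In the "C" locale this is always 0. */
static int count_high_alpha_bytes(void)
{
    int count = 0;

    for (int c = 128; c <= 255; c++) {
        if (isalpha(c)) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: returns the count under locale_name, or -1 if the
 * locale cannot be set (e.g. it is not installed). LC_CTYPE is reset to
 * "C" before returning, mirroring the workaround described above. */
static int high_alpha_bytes_in(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL) {
        return -1;
    }

    int count = count_high_alpha_bytes();

    setlocale(LC_CTYPE, "C");
    return count;
}
```

Calling `high_alpha_bytes_in("de_DE")` on a system with that locale installed is expected to return a positive number, while the same call with `"C"` returns 0.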