Description
Since PHP 8.1, mb_check_encoding() returns true for many ISO-2022-JP (JIS) byte sequences that are incorrect but still interpretable.
For example, IETF RFC 1468, often referenced as the definition of ISO-2022-JP, says "the text must end in ASCII." https://datatracker.ietf.org/doc/html/rfc1468
This means that an ISO-2022-JP byte sequence that has switched to another character set must end with the escape sequence 0x1b 0x28 0x42 (ESC ( B), which switches back to ASCII.
However, mb_check_encoding() returns true without the escape sequence in PHP 8.1 and later.
The documentation says it returns true when "valid", but what should mb_check_encoding return in such a case?
https://www.php.net/manual/en/function.mb-check-encoding.php
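As a possible workaround until the intended behavior is settled, the "must end in ASCII" rule can be checked separately. The sketch below is only an illustration: `jis_ends_in_ascii()` is a hypothetical helper, not part of mbstring. It assumes that the active character set is determined solely by the last escape sequence in the data (it ignores shift-out/shift-in bytes) and it does not validate the rest of the byte sequence.

```php
<?php
// Hypothetical helper (not an mbstring function): reports whether the data
// ends in the ASCII state, as RFC 1468 requires for ISO-2022-JP text.
function jis_ends_in_ascii(string $bytes): bool
{
    // Find the last ESC (0x1b) byte in the data.
    $pos = strrpos($bytes, "\x1b");
    if ($pos === false) {
        // No escape sequence at all: the text never left the initial ASCII state.
        return true;
    }
    // The last escape sequence must be ESC ( B, which designates ASCII.
    return substr($bytes, $pos, 3) === "\x1b(B";
}

var_dump(jis_ends_in_ascii(hex2bin('1b244224221b2842'))); // bool(true)
var_dump(jis_ends_in_ascii(hex2bin('1b24422422')));       // bool(false)
```

Combining this with mb_check_encoding($bytes, 'JIS') would give a stricter check, under the assumptions stated above.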
3v4l:
https://3v4l.org/9i19F
The following code:
<?php
$jis_bytes = '1b244224221b2842'; // 'あ' in ISO-2022-JP: ESC $ B, 0x2422, ESC ( B
$jis_bytes_without_esc = '1b24422422'; // 'あ' in ISO-2022-JP without the trailing ESC ( B
var_dump(mb_check_encoding(hex2bin($jis_bytes), 'JIS'));
var_dump(mb_check_encoding(hex2bin($jis_bytes_without_esc), 'JIS'));
Resulted in this output:
bool(true)
bool(true)
But I expected this output instead:
bool(true)
bool(false)
PHP Version
PHP 8.1.16
Operating System
No response