mb_check_encoding() returns true for incorrect but interpretable ISO-2022-JP byte sequences #10648

Closed
@pakutoma

Description

Since PHP 8.1, mb_check_encoding() returns true for many ISO-2022-JP (JIS) byte sequences that are interpretable but incorrect.
For example, IETF RFC 1468, which is often cited as the definition of ISO-2022-JP, says "the text must end in ASCII." https://datatracker.ietf.org/doc/html/rfc1468
This means that an ISO-2022-JP byte sequence that has switched to another character set must switch back to ASCII with the escape sequence 0x1b 0x28 0x42 before it ends.
However, in PHP 8.1 and later, mb_check_encoding() returns true even when that trailing escape sequence is missing.
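
To make the "must end in ASCII" rule concrete, here is a minimal sketch of a stricter check. The function name iso2022jp_ends_in_ascii() is hypothetical and is not part of mbstring. It only recognizes the four escape sequences listed in RFC 1468 and does not validate the character bytes themselves, so it is an illustration of the rule rather than a replacement for mb_check_encoding().

```php
<?php
// Hypothetical helper (not an mbstring function): reports whether an
// ISO-2022-JP byte string finishes in the ASCII state, as RFC 1468 requires.
function iso2022jp_ends_in_ascii(string $bytes): bool
{
    // Escape sequences listed in RFC 1468 and the state each one selects.
    $escapes = [
        "\x1b(B"  => 'ascii', // ESC ( B : ASCII
        "\x1b(J"  => 'roman', // ESC ( J : JIS X 0201-1976 Roman
        "\x1b\$@" => 'kanji', // ESC $ @ : JIS X 0208-1978
        "\x1b\$B" => 'kanji', // ESC $ B : JIS X 0208-1983
    ];

    $state = 'ascii'; // RFC 1468: the text starts in ASCII
    $len = strlen($bytes);

    for ($i = 0; $i < $len; ) {
        if ($bytes[$i] === "\x1b") {
            $seq = substr($bytes, $i, 3);
            if (!isset($escapes[$seq])) {
                return false; // unknown or truncated escape sequence
            }
            $state = $escapes[$seq];
            $i += 3;
        } else {
            // JIS X 0208 characters occupy two bytes; the other sets use one.
            $i += ($state === 'kanji') ? 2 : 1;
        }
    }

    // Ending in the Roman or kanji state violates "the text must end in ASCII".
    return $state === 'ascii';
}

var_dump(iso2022jp_ends_in_ascii(hex2bin('1b244224221b2842'))); // ends with ESC ( B
var_dump(iso2022jp_ends_in_ascii(hex2bin('1b24422422')));       // no trailing ESC ( B
```

Under these assumptions, a caller who wants the strict behavior could combine both checks, for example mb_check_encoding($s, 'JIS') && iso2022jp_ends_in_ascii($s).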

The documentation says the function returns true when the data is "valid", but what should mb_check_encoding() return in such a case?
https://www.php.net/manual/en/function.mb-check-encoding.php

3v4l:
https://3v4l.org/9i19F

The following code:

<?php

$jis_bytes = '1b244224221b2842'; // 'あ' in ISO-2022-JP, followed by ESC ( B to return to ASCII
$jis_bytes_without_esc = '1b24422422'; // 'あ' in ISO-2022-JP without the trailing ESC ( B
var_dump(mb_check_encoding(hex2bin($jis_bytes), 'JIS'));
var_dump(mb_check_encoding(hex2bin($jis_bytes_without_esc), 'JIS'));

Resulted in this output:

bool(true)
bool(true)

But I expected this output instead:

bool(true)
bool(false)

PHP Version

PHP 8.1.16

Operating System

No response
