-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Imply UTF8 validity in explode function #10805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
03ff708
to
7c70cd0
Compare
9f08029
to
856dac2
Compare
Any idea why for the following test case:
the result is marked as NOT valid UTF-8 string? The real result is an array with single character: https://3v4l.org/oRGNM/rfc#vgit.master, maybe due some interning? |
Both the delimiter and string don't have the "valid UTF8" flag. Using |
If the delimiter is valid UTF-8 and the string is valid as well, the result "is the same but has the delimiter cut away", ie. leading to a valid UTF-8 string. What do you exactly mean? |
Print the flags of the delimiter and the input string and you'll see they both don't have the UTF-8 valid flag. They were interned by the compiler and the compiler doesn't set that flag. Trying to add that flag using mbstring wouldn't work either because a flag can't be added once they're interned. |
Thank you for the hint. I created #10853 feature request, as the check is quite cheap, all interned/compiled strings should be checked upfront to utilize opcache. |
You're welcome. I think Alex was going to look into something like this. I'm not sure anymore but I thought I saw him say something like that in a comment on a PR some time ago... |
maybe later |
like #10780, needs #10853/#10870
when valid UTF-8 string is exploded by valid UTF-8 delimiter, the result is always valid UTF-8 string
invalid UTF-8 string might became valid, this case requires check on the whole string, thus cannot be implied from the source