-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Optimize stripos #7852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize stripos #7852
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
75f3258
to
23c4443
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this improves some cases, it will regress others. For large strings this makes the search quadratic, while the previous implementation is linear (if you will allow you me the imprecision).
@nikic Yeah, right now it would be quadratic in the worst-case scenario (something like |
It should be linear now. Performance is still approximately the same for me. Let me know if you think the add complexity is worth it. |
I might be late to the party, but I've tried to improve this with 2 concurrent searches, somewhat close to what you did in the last commit, but I might have slightly better performance overall, and it doesn't dip with modified test: $haystack = str_repeat('A', 1e+6) . "BCB";
$haystack = str_repeat($haystack, 10) . 'BBB';
//...
stripos($haystack, 'bbb'); Take a look if you will: https://gist.github.com/KapitanOczywisty/5c8c053ceb3ba687cbab41266120dac6 Also I think that more complex algorithm should be used with |
This is certainly an interesting approach (thanks, @iluuu1994!), but I still have doubts that the performance gain outweighs the added complexity. Is stripos() used often with long haystacks? And even if, that code still could use PCRE instead (i.e. document that in the manual). I also wonder about the performance of something like Anyhow, if we do this optimization, we also should consider doing it for |
@cmb69 @KapitanOczywisty's version doesn't look too complicated. I'll check if the same can be applied to
Yes. Although the same applies for |
2ab84b5
to
acad59f
Compare
@KapitanOczywisty Thanks for your implementation. There was a small bug here (only @cmb69 It works fine for I'll create the benchmarks next. |
@iluuu1994 This was fun challenge, since I'm not really working with C and glad I could help.
Yeah, I wasn't sure about that part so I left it in the code for people smarter than me to clean up :) |
Posted the results in the wrong issue. #7847 (comment) Anyway, I think this is worth merging because all cases seem to be faster, and the implementation doesn't seem terribly complex. |
Closes phpGH-7847 Closes phpGH-7852 Previously stripos/stristr would lowercase both the haystack and the needle to reuse strpos. The approach in this PR is similar to strpos. memchr is highly optimized so we're using it to search for the first character of the needle in the haystack. If we find it we compare the remaining characters of the needle manually. The new implementation seems to perform about half as well as strpos (as two memchr calls are necessary to find the next candidate).
acad59f
to
f311aff
Compare
Any objections to merging this? |
|
||
const char first_lower = zend_tolower_ascii(*needle); | ||
const char first_upper = zend_toupper_ascii(*needle); | ||
const char *p_lower = (const char *)memchr(haystack, first_lower, end - haystack); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future scope: I think there may be a way to optimize this with SSE for really long strings, but it may not be worth it, especially with stripos not being used that often.
(both checking if the first character exists case-insensitively, and checking if long needles match)
Definitely not something to add to this PR.
(*needle_byte ^ *haystack_byte) == *needle_mask
, where needle_mask is 0xff or 0xcf (~0x20), since 'a' ^ 'A'
is 0x20)
For a block of 8/16/32 bytes, that could be checked by:
- Computing the needle and mask and extending it to 8 bytes
- Bitwise xoring the needle block with the haystack mask
- Masking the xored block (0xff or 0xcf, depending on whether this was an alphabet character.
0xff ^ first_lower ^ first_upper
) - Checking for 0 bytes in the mask
- Combining to a word
- Checking if the combination was non-zero
- Searching in that block normally for the first character
This would make the worst case of case-insensitive search faster (stripos(str_repeat('A', 1000000), 'a')
) by stopping at the first lowercase or uppercase byte . The memchr could also start at whatever that result was if needle_len > 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know nothing about SSE, but as you mentioned, the new implementation should already be much better for large strings, we can always improve it at a later point.
Closes phpGH-7847 Closes phpGH-7852 Previously stripos/stristr would lowercase both the haystack and the needle to reuse strpos. The approach in this PR is similar to strpos. memchr is highly optimized so we're using it to search for the first character of the needle in the haystack. If we find it we compare the remaining characters of the needle manually. The new implementation seems to perform about half as well as strpos (as two memchr calls are necessary to find the next candidate).
f311aff
to
70bd296
Compare
I decided to actually alter |
348297d
to
1b800b9
Compare
Closes GH-7847
Closes GH-7852
Previously stripos would lowercase both the haystack and needle to reuse
strpos. The approach in this PR is similar to strpos. memchr is highly
optimized so we're using it to search for the first character of the
needle in the haystack. If we find it we compare the remaining
characters of the needle manually.
The new implementation seems to perform about half as well as strpos (as
two memchr calls are necessary).
@cmb69 Thoughts?