ext/bcmath: Use SIMD for trailing zero counts during conversion #14166

SakiTakamachi · 2024-05-07T13:34:52Z

benchmark: #14132

before

// 1
Time (mean ± σ):     589.3 ms ±  12.8 ms    [User: 584.2 ms, System: 4.3 ms]
Range (min … max):   574.5 ms … 614.4 ms    10 runs
 
// 2
Time (mean ± σ):     638.1 ms ±   9.2 ms    [User: 634.1 ms, System: 3.2 ms]
Range (min … max):   628.0 ms … 658.0 ms    10 runs
 
// 3
Time (mean ± σ):     713.0 ms ±   6.3 ms    [User: 709.0 ms, System: 3.2 ms]
Range (min … max):   704.6 ms … 724.2 ms    10 runs

Final state (after removing unnecessary code)

// 1
Time (mean ± σ):     566.5 ms ±   4.7 ms    [User: 563.4 ms, System: 2.4 ms]
Range (min … max):   558.6 ms … 572.7 ms    10 runs
 
// 2
Time (mean ± σ):     603.4 ms ±   6.1 ms    [User: 599.2 ms, System: 3.5 ms]
Range (min … max):   594.0 ms … 613.6 ms    10 runs
 
// 3
Time (mean ± σ):     583.3 ms ±   8.0 ms    [User: 579.3 ms, System: 3.3 ms]
Range (min … max):   568.0 ms … 595.1 ms    10 runs

after SIMD

// 1
Time (mean ± σ):     591.4 ms ±   7.6 ms    [User: 587.5 ms, System: 3.1 ms]
Range (min … max):   579.6 ms … 605.6 ms    10 runs
 
// 2
Time (mean ± σ):     650.3 ms ±   7.5 ms    [User: 644.8 ms, System: 4.7 ms]
Range (min … max):   642.0 ms … 667.1 ms    10 runs
 
// 3
Time (mean ± σ):     618.2 ms ±  14.3 ms    [User: 614.2 ms, System: 3.1 ms]
Range (min … max):   602.2 ms … 642.1 ms    10 runs

after UNEXPECTED

// 1
Time (mean ± σ):     572.0 ms ±   8.0 ms    [User: 567.5 ms, System: 3.7 ms]
Range (min … max):   560.7 ms … 581.6 ms    10 runs
 
// 2
Time (mean ± σ):     603.6 ms ±   7.0 ms    [User: 599.9 ms, System: 3.1 ms]
Range (min … max):   594.1 ms … 616.5 ms    10 runs
 
// 3
Time (mean ± σ):     584.8 ms ±  13.2 ms    [User: 580.2 ms, System: 3.8 ms]
Range (min … max):   574.2 ms … 615.4 ms    10 runs

FYI: without SIMD

// 1
Time (mean ± σ):     604.4 ms ±  23.9 ms    [User: 599.6 ms, System: 4.0 ms]
Range (min … max):   588.7 ms … 669.7 ms    10 runs
 
// 2
Time (mean ± σ):     658.2 ms ±  13.6 ms    [User: 654.8 ms, System: 2.7 ms]
Range (min … max):   644.0 ms … 692.3 ms    10 runs
 
// 3
Time (mean ± σ):     789.0 ms ±   8.6 ms    [User: 784.5 ms, System: 3.7 ms]
Range (min … max):   779.7 ms … 809.5 ms    10 runs

SakiTakamachi · 2024-05-07T13:53:44Z

ext/bcmath/libbcmath/src/str2num.c

+		if (EXPECTED(mask != 0xffff)) {
+			/* Move the pointer back and check each character in loop. */
+			str += sizeof(__m128i);
+			break;
+		}


I could also use code like the following, but in my measurements the while loop was always faster. This may be because the bit-counting version performs one extra calculation.

return str + sizeof(__m128i) - __builtin_clz(~mask);
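
As an illustration of the bit-counting alternative mentioned above, here is a minimal sketch of how the number of trailing '0' bytes in a 16-byte block could be derived from the comparison mask. It is not the code from this PR: the helper names (`zero_mask16`, `trailing_zero_bytes`, `skip_zero_in_block`) are hypothetical, the mask is built in plain C so the sketch does not depend on SSE2, and `__builtin_clz` is assumed to be available (GCC or Clang). Because the mask only occupies the low 16 bits of an `int`, the sketch limits the inverted mask to 16 bits and subtracts the unused upper bits before using the count.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: builds, in plain C, the 16-bit mask that comparing a
 * 16-byte block against '0' and collecting one bit per byte would produce.
 * Bit i is set when block[i] == '0'. */
static unsigned zero_mask16(const char *block)
{
	unsigned mask = 0;
	for (size_t i = 0; i < 16; i++) {
		if (block[i] == '0') {
			mask |= 1u << i;
		}
	}
	return mask;
}

/* Hypothetical helper: number of consecutive '0' bytes at the end of the
 * block. The caller must ensure mask != 0xffff, because __builtin_clz is
 * undefined for an argument of 0. */
static unsigned trailing_zero_bytes(unsigned mask)
{
	/* Keep only the 16 meaningful bits; a 0 bit in mask becomes a 1 here. */
	unsigned inverted = ~mask & 0xffffu;
	/* Bits of unsigned int above the 16-bit mask (16 when int is 32 bits). */
	unsigned extra_bits = (unsigned) (sizeof(unsigned) * CHAR_BIT) - 16;
	return (unsigned) __builtin_clz(inverted) - extra_bits;
}

/* Returns a pointer one past the last non-'0' byte of a 16-byte block that
 * is known to contain at least one non-'0' byte. */
static const char *skip_zero_in_block(const char *block)
{
	return block + 16 - trailing_zero_bytes(zero_mask16(block));
}
```

For the block `1234567890120000`, the mask has bits 9 and 12 to 15 set, so four trailing zeros are counted and the returned pointer is `block + 12`. Whether this beats a simple byte-by-byte loop depends on the input and the machine, as the measurements in this thread suggest.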

nielsdos

It feels a bit weird to optimize for something that (hopefully) shouldn't happen a lot. I see a slight performance decrease for benchmark 2, a small increase in bench 1 and a huge increase in bench 3. Do we think trailing zeros are common?
Note though that I am completely fine with removing the ineffective code and using UNEXPECTED.

nielsdos · 2024-05-07T17:55:06Z

ext/bcmath/libbcmath/src/str2num.c

+{
+	/* Check in bulk */
+#ifdef __SSE2__
+	const __m128i c_zero_repeat = _mm_set1_epi8((signed char) '0');


Casting this to signed char shouldn't be necessary.

nielsdos · 2024-05-07T17:55:39Z

ext/bcmath/libbcmath/src/str2num.c

@@ -76,6 +76,35 @@ static const char *bc_count_digits(const char *str, const char *end)
 	return str;
 }

+static inline const char *bc_skip_zero_reverse(const char *str, const char *end)


The argument names are swapped, which makes it very confusing.
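
For readers following the naming discussion, a minimal sketch of a reverse zero-skipping scan may help show why the parameter roles are easy to confuse: the first pointer marks where scanning starts (one past the last character) and moves backwards, while the second is the lowest address that may be examined. This is an illustration under stated assumptions, not the merged code. The function name `skip_zero_reverse_sketch` and the parameter names `scanner` and `stop` are chosen for this example only, and the SSE2 path is compiled only when `__SSE2__` is defined.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Sketch only. `scanner` points one past the last character and moves
 * backwards; `stop` is the lowest address that may be examined, and the
 * caller must pass scanner >= stop. Returns a pointer one past the last
 * character that is not '0' (or `stop` if every character is '0'). */
static const char *skip_zero_reverse_sketch(const char *scanner, const char *stop)
{
#ifdef __SSE2__
	const __m128i zeros = _mm_set1_epi8('0');
	/* Check 16 bytes at a time while a full block still fits. */
	while (scanner - stop >= (ptrdiff_t) sizeof(__m128i)) {
		const char *block = scanner - sizeof(__m128i);
		__m128i bytes = _mm_loadu_si128((const __m128i *) block);
		/* One mask bit per byte; a bit is set when that byte is '0'. */
		int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(bytes, zeros));
		if (mask != 0xffff) {
			/* Not all '0': leave this block to the byte-wise loop. */
			break;
		}
		scanner = block;
	}
#endif

	/* Handle the remaining characters one byte at a time. */
	while (scanner > stop && scanner[-1] == '0') {
		scanner--;
	}

	return scanner;
}
```

A caller that wants to trim the fractional part of `1.5000` would pass the end of the string as `scanner` and the first fractional digit as `stop`, and would get back a pointer just after the `5`. Without SSE2 the function falls back to the byte-wise loop alone and returns the same result.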

SakiTakamachi · 2024-05-07T23:06:05Z

@nielsdos

Does this patch mean that Benchmark 2 is a bit slower in your environment?

Could it be that the speedup in my measurements with this patch is due to the use of UNEXPECTED and the removal of unnecessary code, and that SIMD has a negative effect on benchmark 2?
(I am concerned that, as you said before, measurements may be faster or slower immediately after compilation.)

(edit)

Or maybe the order of the measurements in the description is confusing? The first is before applying the patch, the second is the final state, and the rest are per-commit measurements, so you should compare the first and second.

SakiTakamachi · 2024-05-08T08:03:08Z

Do we think trailing zeros are common?

Trailing zeros are probably quite common given the use cases for BCMath, but 16 decimal digits is probably quite rare.

I opened this PR because it improved performance in all cases in my environment, but if that does not hold elsewhere, I won't insist on using SIMD here.

nielsdos · 2024-05-08T17:34:04Z

These are the results I'm getting:

Benchmark 1: ./sapi/cli/php 1.php
  Time (mean ± σ):     468.3 ms ±   9.7 ms    [User: 463.6 ms, System: 1.9 ms]
  Range (min … max):   457.7 ms … 486.1 ms    10 runs
 
Benchmark 2: ./sapi/cli/php_old 1.php
  Time (mean ± σ):     450.9 ms ±   3.6 ms    [User: 448.1 ms, System: 2.5 ms]
  Range (min … max):   446.3 ms … 457.0 ms    10 runs
 
Summary
  ./sapi/cli/php_old 1.php ran
    1.04 ± 0.02 times faster than ./sapi/cli/php 1.php

Benchmark 1: ./sapi/cli/php 2.php
  Time (mean ± σ):     535.1 ms ±  18.0 ms    [User: 531.7 ms, System: 2.8 ms]
  Range (min … max):   517.8 ms … 578.0 ms    10 runs
 
Benchmark 2: ./sapi/cli/php_old 2.php
  Time (mean ± σ):     527.9 ms ±  12.1 ms    [User: 525.9 ms, System: 1.5 ms]
  Range (min … max):   517.1 ms … 552.8 ms    10 runs
 
Summary
  ./sapi/cli/php_old 2.php ran
    1.01 ± 0.04 times faster than ./sapi/cli/php 2.php

Benchmark 1: ./sapi/cli/php 3.php
  Time (mean ± σ):     496.5 ms ±   8.1 ms    [User: 493.8 ms, System: 2.2 ms]
  Range (min … max):   490.5 ms … 515.1 ms    10 runs
 
Benchmark 2: ./sapi/cli/php_old 3.php
  Time (mean ± σ):     613.2 ms ±  19.1 ms    [User: 610.5 ms, System: 2.2 ms]
  Range (min … max):   602.9 ms … 666.7 ms    10 runs

Summary
  ./sapi/cli/php 3.php ran
    1.24 ± 0.04 times faster than ./sapi/cli/php_old 3.php

nielsdos

I'm fine with accepting this, the degradation for 2.php isn't severe and there are improvements for the other cases.
I'm fine with the argument names on second thought, but please remove the redundant cast upon merging. Thanks.

SakiTakamachi · 2024-05-08T22:20:23Z

Thanks. As in #14180, I will prepare a more stable benchmark environment and try measuring again.

SakiTakamachi · 2024-05-09T00:00:31Z

@nielsdos
I also merged the latest master into this branch and compared the two on EC2.

master:

hyperfine "php 1.php" --warmup 10
Time (mean ± σ):     654.7 ms ±   3.0 ms    [User: 650.5 ms, System: 2.6 ms]
Range (min … max):   650.1 ms … 659.2 ms    10 runs

hyperfine "php 2.php" --warmup 10
Time (mean ± σ):     769.4 ms ±   5.4 ms    [User: 765.8 ms, System: 2.0 ms]
Range (min … max):   762.6 ms … 781.8 ms    10 runs

hyperfine "php 3.php" --warmup 10
Time (mean ± σ):     910.3 ms ±  13.3 ms    [User: 905.6 ms, System: 2.8 ms]
Range (min … max):   896.8 ms … 934.6 ms    10 runs

php old.php // my old bench
1.6298861503601
1.9048039913177
2.2188358306885

this branch:

hyperfine "php 1.php" --warmup 10
Time (mean ± σ):     643.6 ms ±   6.5 ms    [User: 638.7 ms, System: 3.2 ms]
Range (min … max):   637.0 ms … 656.7 ms    10 runs

hyperfine "php 2.php" --warmup 10
Time (mean ± σ):     749.7 ms ±   6.4 ms    [User: 745.4 ms, System: 2.4 ms]
Range (min … max):   742.3 ms … 766.6 ms    10 runs

hyperfine "php 3.php" --warmup 10
Time (mean ± σ):     684.7 ms ±  10.7 ms    [User: 680.8 ms, System: 2.5 ms]
Range (min … max):   673.4 ms … 707.8 ms    10 runs

php old.php // my old bench
1.5792031288147
1.8460278511047
1.6792199611664

SakiTakamachi · 2024-05-09T00:18:25Z

I removed the unnecessary cast and changed the variable names slightly. If the variable names are okay, I will merge this.

nielsdos · 2024-05-09T08:24:28Z

Scanner is written with double n

SakiTakamachi · 2024-05-09T09:00:50Z

Thanks, I didn't notice at all

SakiTakamachi added 3 commits May 7, 2024 21:57

use SIMD

364a698

use UNEXPECTED

570268d

Remove ineffective code

1189f4f

github-actions bot added the Extension: bcmath label May 7, 2024

Added comments

fc7f7cb

SakiTakamachi force-pushed the refactor_bcmath_str2num branch from e55e0e2 to fc7f7cb Compare May 7, 2024 13:44

SakiTakamachi marked this pull request as ready for review May 7, 2024 13:54

SakiTakamachi requested review from Girgias and nielsdos as code owners May 7, 2024 13:54

nielsdos requested changes May 7, 2024

nielsdos approved these changes May 8, 2024

Merge branch 'master' into refactor_bcmath_str2num

87e9d63

address comments

323e144

SakiTakamachi force-pushed the refactor_bcmath_str2num branch from 03bc6bb to 323e144 Compare May 9, 2024 00:23

typo

275abd0

nielsdos approved these changes May 9, 2024

SakiTakamachi merged commit 1a3d870 into php:master May 9, 2024
10 checks passed

SakiTakamachi deleted the refactor_bcmath_str2num branch May 9, 2024 10:24
