Skip to content

Micro-optimize double comparison #11061

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 14, 2023
Merged

Conversation

nielsdos
Copy link
Member

@nielsdos nielsdos commented Apr 11, 2023

When using ZEND_NORMALIZE_BOOL(a - b) where a and b are doubles, this generates the following instruction sequence on x64:
subsd xmm0, xmm1
pxor xmm1, xmm1
comisd xmm0, xmm1
...

whereas if we use ZEND_THREEWAY_COMPARE we get two instructions less: ucomisd xmm0, xmm1

The only difference is that the threeway compare uses ucomisd instead of comisd. The difference is that it will cause a FP signal if a signaling NAN is used, but as far as I'm aware this doesn't matter for our use case.

Similarly, the amount of instructions on AArch64 is also quite a bit lower for this code compared to the old code.

Results

Using the benchmark https://gist.github.com/nielsdos/b36517d81a1af74d96baa3576c2b70df
I used hyperfine: hyperfine --runs 25 --warmup 3 './sapi/cli/php sort_double.php'
No extensions such as opcache used during benchmarking.

Before this patch

Time (mean ± σ): 255.5 ms ± 2.2 ms [User: 251.0 ms, System: 2.5 ms]
Range (min … max): 251.5 ms … 260.7 ms 25 runs

After this patch

Time (mean ± σ): 236.2 ms ± 2.8 ms [User: 228.9 ms, System: 5.0 ms]
Range (min … max): 231.5 ms … 242.7 ms 25 runs

When using ZEND_NORMALIZE_BOOL(a - b) where a and b are doubles, this
generates the following instruction sequence on x64:
subsd   xmm0, xmm1
pxor    xmm1, xmm1
comisd  xmm0, xmm1
...

whereas if we use ZEND_THREEWAY_COMPARE we get two instructions less:
ucomisd xmm0, xmm1

The only difference is that the threeway compare uses *u*comisd instead
of comisd. The difference is that it will cause a FP signal if a
signaling NAN is used, but as far as I'm aware this doesn't matter for
our use case.

Similarly, the amount of instructions on AArch64 is also quite a bit
lower for this code compared to the old code.

** Results **

Using the benchmark https://gist.github.com/nielsdos/b36517d81a1af74d96baa3576c2b70df
I used hyperfine: hyperfine --runs 25 --warmup 3 './sapi/cli/php sort_double.php'
No extensions such as opcache used during benchmarking.

BEFORE THIS PATCH
-----------------
  Time (mean ± σ):     255.5 ms ±   2.2 ms    [User: 251.0 ms, System: 2.5 ms]
  Range (min … max):   251.5 ms … 260.7 ms    25 runs

AFTER THIS PATCH
----------------
  Time (mean ± σ):     236.2 ms ±   2.8 ms    [User: 228.9 ms, System: 5.0 ms]
  Range (min … max):   231.5 ms … 242.7 ms    25 runs
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, also the 3 way compare seems to make way more sense then the normalize bool macro, even without the micro optimizaton.

@nielsdos nielsdos merged commit a0476fd into php:master Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants