Micro-optimize double comparison #11061
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When using ZEND_NORMALIZE_BOOL(a - b) where a and b are doubles, this generates the following instruction sequence on x64:
subsd xmm0, xmm1
pxor xmm1, xmm1
comisd xmm0, xmm1
...
whereas if we use ZEND_THREEWAY_COMPARE we get two instructions less: ucomisd xmm0, xmm1
The only difference is that the threeway compare uses ucomisd instead of comisd. The difference is that it will cause a FP signal if a signaling NAN is used, but as far as I'm aware this doesn't matter for our use case.
Similarly, the amount of instructions on AArch64 is also quite a bit lower for this code compared to the old code.
Results
Using the benchmark https://gist.github.com/nielsdos/b36517d81a1af74d96baa3576c2b70df
I used hyperfine:
hyperfine --runs 25 --warmup 3 './sapi/cli/php sort_double.php'
No extensions such as opcache used during benchmarking.
Before this patch
Time (mean ± σ): 255.5 ms ± 2.2 ms [User: 251.0 ms, System: 2.5 ms]
Range (min … max): 251.5 ms … 260.7 ms 25 runs
After this patch
Time (mean ± σ): 236.2 ms ± 2.8 ms [User: 228.9 ms, System: 5.0 ms]
Range (min … max): 231.5 ms … 242.7 ms 25 runs