Closed
Description
The speedup obtained by the optimizations from #45333 which hit stable in 1.23 were mostly lost again from 1.25 onwards, because the compiler started emitting a branch instead of a cmov.
1.24
14 cmp qword ptr [rsi + 8*rax], rcx
15 cmova rax, r8
1.25
18 cmp qword ptr [rsi + 8*r9], rcx
19 ja .LBB0_7
20 mov r8, r9
21 .LBB0_7:
From godbolt: https://godbolt.org/z/DUDl-T
Bench comparison with 1.23 nightly (2017-11-20, 9 days after PR merge)
name old_1.23 ns/iter new_1.23 ns/iter diff ns/iter diff % speedup
binary_search_l1 70 28 -42 -60.00% x 2.50
binary_search_l1_with_dups 46 28 -18 -39.13% x 1.64
binary_search_l2 94 42 -52 -55.32% x 2.24
binary_search_l2_with_dups 68 42 -26 -38.24% x 1.62
binary_search_l3 267 234 -33 -12.36% x 1.14
binary_search_l3_with_dups 198 236 38 19.19% x 0.84
and status quo with 1.30 nightly (2018-08-15)
name old_1.30 ns/iter new_1.30 ns/iter diff ns/iter diff % speedup
binary_search_l1 75 68 -7 -9.33% x 1.10
binary_search_l1_with_dups 46 61 15 32.61% x 0.75
binary_search_l2 99 89 -10 -10.10% x 1.11
binary_search_l2_with_dups 71 89 18 25.35% x 0.80
binary_search_l3 272 245 -27 -9.93% x 1.11
binary_search_l3_with_dups 212 256 44 20.75% x 0.83
Metadata
Metadata
Assignees
Labels
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.Issue: Problems and improvements with respect to performance of generated code.Low priorityRelevant to the compiler team, which will review and decide on the PR/issue.Performance or correctness regression from one stable version to another.