[libc++] Optimize vector push_back to avoid continuous load and store of end pointer
Credits: this change is based on analysis and a proof of concept by
[email protected].
Before this change, the compiler loses track of the end pointer because
'this' and other references may escape beyond the compiler's view, so
end is reloaded and stored on every push_back. This can be seen in the
generated assembly:
16.28 │200c80: mov %r15d,(%rax)
60.87 │200c83: add $0x4,%rax
│200c87: mov %rax,-0x38(%rbp)
0.03 │200c8b: → jmpq 200d4e
...
...
1.69 │200d4e: cmp %r15d,%r12d
│200d51: → je 200c40
16.34 │200d57: inc %r15d
0.05 │200d5a: mov -0x38(%rbp),%rax
3.27 │200d5e: mov -0x30(%rbp),%r13
1.47 │200d62: cmp %r13,%rax
│200d65: → jne 200c80
We fix this by loading the end pointer into a local and always explicitly
storing that local back at the end of push_back. This adds some slight
source 'noise', but produces nice, compact fast-path code, i.e.:
32.64 │200760: mov %r14d,(%r12)
9.97 │200764: add $0x4,%r12
6.97 │200768: mov %r12,-0x38(%rbp)
32.17 │20076c: add $0x1,%r14d
2.36 │200770: cmp %r14d,%ebx
│200773: → je 200730
8.98 │200775: mov -0x30(%rbp),%r13
6.75 │200779: cmp %r13,%r12
│20077c: → jne 200760
Now there is a single store for the push_back value (as before), and a
single store of the end pointer with no reload of it, which removes the
load dependency from the loop.
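The pattern can be sketched outside of libc++ as follows. This is a
minimal illustration under stated assumptions, not the actual libc++
source: ToyIntVector, grow_and_push and fill_sequence are hypothetical
names, the container holds only int, and the growth policy is arbitrary.
The point is only that push_back reads end_ once into a local, works on
the local, and writes it back exactly once on every path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy container used only to illustrate the pattern;
// it is not libc++'s std::vector.
struct ToyIntVector {
    int* begin_ = nullptr;
    int* end_ = nullptr;
    int* cap_ = nullptr;

    ToyIntVector() = default;
    ToyIntVector(const ToyIntVector&) = delete;
    ToyIntVector& operator=(const ToyIntVector&) = delete;
    ~ToyIntVector() { delete[] begin_; }

    std::size_t size() const {
        return static_cast<std::size_t>(end_ - begin_);
    }

    void push_back(int value) {
        int* end = end_;                 // load the end pointer once
        if (end != cap_) {
            *end = value;                // fast path: store the element
            ++end;
        } else {
            end = grow_and_push(value);  // slow path returns the new end
        }
        end_ = end;                      // always store the local back
    }

private:
    // Reallocates (doubling, starting at 4 slots), appends value, and
    // returns the new end pointer without writing end_ itself.
    int* grow_and_push(int value) {
        const std::size_t old_size = size();
        const std::size_t new_cap = old_size ? 2 * old_size : 4;
        int* fresh = new int[new_cap];
        for (std::size_t i = 0; i < old_size; ++i) {
            fresh[i] = begin_[i];
        }
        fresh[old_size] = value;
        delete[] begin_;
        begin_ = fresh;
        cap_ = fresh + new_cap;
        return fresh + old_size + 1;
    }
};

// Hypothetical driver loop, similar in shape to a push_back benchmark:
// appends 0, 1, ..., n - 1.
inline void fill_sequence(ToyIntVector& v, int n) {
    for (int i = 0; i != n; ++i) {
        v.push_back(i);
    }
}
```

Having the slow path return the new end, rather than write end_ itself,
keeps a single unconditional store at the bottom of push_back. Whether a
given compiler then keeps the end pointer in a register across loop
iterations depends on inlining and optimization level.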
For fully local vectors (i.e., not referenced elsewhere), the capacity
load and store inside the loop could also be removed, but this requires
more substantial refactoring inside vector.
Differential Revision: https://reviews.llvm.org/D80588