Description
The order in which code is laid out in a binary influences how fast the binary executes because (as I understand it) it affects instruction cache locality and how efficiently the code is paged in from disk. Many linkers support specifying this order (e.g. LLD via --symbol-ordering-file and MSVC via -ORDER). The hard part, though, is finding an order that will actually improve things. The chromium project has a tool for this, and somewhere else I've read that valgrind could be used for this too. The expected speedups are a few percent.
Prerequisites:
- Support function instrumentation in rustc (if using the chromium tool) similar to what GCC's -finstrument-functions does.
- Compile an instrumented version of the compiler
- Run the instrumented version of the compiler for a realistic test program (this should be less sensitive than full PGO)
- Use the generated ordering file for building release artifacts
The first point shouldn't be too hard. The rest, however, would be a big infrastructure investment. I hope that we'll get PGO support for our CI at some point. This symbol ordering business could then be part of that.
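To make the "generate an ordering file" step more concrete, here is a minimal sketch of one simple heuristic: list each symbol once, in the order it was first entered during the instrumented run. This is an assumption for illustration, not necessarily what the chromium tool or a valgrind-based approach does. The `trace` input and both function names are hypothetical; a real trace would come from whatever the instrumentation records. The output is one symbol name per line, which is the format LLD's --symbol-ordering-file reads.

```rust
use std::collections::HashSet;

/// Returns each symbol exactly once, in the order it was first entered.
///
/// `trace` is a hypothetical log of function-entry events (symbol names)
/// recorded by an instrumented build; repeated entries are expected.
pub fn first_use_order<'a>(trace: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    for &sym in trace {
        // `insert` returns true only the first time a symbol is seen.
        if seen.insert(sym) {
            order.push(sym);
        }
    }
    order
}

/// Renders the order as text with one symbol name per line.
pub fn render_ordering_file(order: &[&str]) -> String {
    let mut out = String::new();
    for sym in order {
        out.push_str(sym);
        out.push('\n');
    }
    out
}
```

The rendered string would be written to a file and handed to the linker when building release artifacts. Functions that never appear in the trace are simply left out, so the linker places them wherever it normally would.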
cc @glandium @rust-lang/wg-compiler-performance @rust-lang/infra