There are several ways to speed up rustc by changing its build configuration, without changing its code: a single codegen unit (CGU), profile-guided optimization (PGO), link-time optimization (LTO), post-link optimization (via BOLT), and a better allocator (e.g. jemalloc or mimalloc).
This is a tracking issue for applying these to the most popular Tier 1 platforms: Linux64 (x86_64-unknown-linux-gnu), Win64 (x86_64-pc-windows-msvc), and Mac (x86_64-apple-darwin, and more recently aarch64-apple-darwin).
Items marked with [2022] are on the Compiler performance roadmap for 2022.
Single CGU
Benefits: rustc is faster, uses less memory, has a smaller binary.
Costs: rustc takes longer to build.
- Linux x64: Build rustc with a single CGU on x64 Linux #115554, merged 2023-10-01.
- Linux AArch64
- Win64: Build rustc with 1CGU on x86_64-pc-windows-msvc #112267, merged 2024-03-12.
- Mac:
  - intel: Build rustc with 1CGU on x86_64-apple-darwin #112268, merged 2024-03-12.
  - aarch64: build rustc with 1 CGU on aarch64-apple-darwin #133747, merged 2024-12-03.
PGO
Benefits: rustc is faster.
Costs: rustc takes longer to build.
- Linux x64: Utilize PGO for rustc linux dist builds #80262, merged 2020-12-23.
- Linux AArch64: ci: Enable opt-dist for dist-aarch64-linux builds #133807, merged 2025-01-15.
- Win64 [2022]: Utilize PGO for windows x64 rustc dist builds #96978, merged 2022-07-12.
- Mac [2022]:
  - Problems with symbols not being matched correctly in PGO profiles.
Other PGO attempts:
- Context-sensitive PGO for LLVM: Use CS PGO for LLVM #111806, no speed-up measured; its benefits seem to be superseded by BOLT.
- PGO for libstd: Apply PGO to libstd on CI #97038, no speed-up measured.
LTO
Benefits: rustc is faster.
Costs: rustc takes longer to build.
- Linux x64:
  - rustc front-end: Enable LTO for rustc_driver.so #101403, merged 2022-10-24.
  - LLVM: done some time ago.
- Linux AArch64: ci: Enable opt-dist for dist-aarch64-linux builds #133807
- Win64:
  - rustc front-end: Enable ThinLTO for rustc on x64 msvc #103591, merged 2022-12-11. Caused a miscompilation and was reverted on 2023-03-14.
  - LLVM [2022]: currently statically linked, which prevents LTO, but this could be changed.
- Mac:
  - rustc front-end: Enable ThinLTO for rustc on x86_64-apple-darwin #103647 and Re-enable ThinLTO for rustc on x86_64-apple-darwin #105845, merged 2022-12-19.
  - LLVM [2022]: currently statically linked.
This is all thin LTO, which gets most of the benefits of fat LTO with a much lower link-time cost.
Other LTO attempts:
- LTO for rustdoc: [perftest] Use LTO for compiling rustdoc #102885, no speed-up measured.
- Fat LTO: Use fat LTO for compiling rustc #103453, no speed-up measured, large CI build cost.
BOLT
Benefits: rustc is faster.
Costs: rustc takes longer to build.
- Linux x64:
  - rustc front-end: Optimize librustc_driver.so with BOLT #116352, merged 2023-10-14.
  - LLVM: Use BOLT in CI to optimize LLVM #94381, merged 2022-10-10.
- Linux AArch64: waiting for BOLT bugs to be fixed on ARM
- Win64: N/A
- Mac: N/A
BOLT only works on ELF binaries, so among these platforms it only applies to Linux.
Instruction set
Benefits: rustc is faster?
Costs: rustc won't run on old CPUs.
- x86_64: Possibly raise the baseline to v2, v3, or APX in the future. So far, the performance wins have not been convincing enough to justify the upgrade, since it would drop support for older CPUs. Some perf. results can be found here.
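To make the compatibility cost concrete, here is a minimal Rust sketch that decides which x86-64 micro-architecture level a CPU reaches, given the features it reports. It assumes the x86-64-v2/v3/v4 feature sets defined in the x86-64 psABI (APX is not modeled). The feature strings and the max_level helper are hypothetical illustrations, not part of rustc or std; a real runtime check could query the CPU with std's is_x86_feature_detected! macro instead of the predicate used here.

```rust
// Feature sets of the x86-64 micro-architecture levels, following the
// x86-64 psABI. The strings are descriptive labels only (hypothetical).
const V2: &[&str] = &["cmpxchg16b", "lahf-sahf", "popcnt", "sse3", "sse4.1", "sse4.2", "ssse3"];
const V3: &[&str] = &["avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "lzcnt", "movbe", "osxsave"];
const V4: &[&str] = &["avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"];

/// Hypothetical helper: returns the highest level (1..=4) a CPU supports,
/// given a predicate that reports whether a named feature is present.
/// Each level also requires every feature of the levels below it.
fn max_level(has: impl Fn(&str) -> bool) -> u8 {
    let mut level = 1; // baseline x86-64 ("v1"), supported by every x86_64 CPU
    for (next, features) in [(2, V2), (3, V3), (4, V4)] {
        if features.iter().all(|&f| has(f)) {
            level = next;
        } else {
            break; // a single missing feature caps the level here
        }
    }
    level
}

fn main() {
    // A CPU with the v2 features but no AVX cannot reach v3.
    let old_cpu = |f: &str| V2.contains(&f);
    println!("old CPU supports x86-64-v{}", max_level(old_cpu));
}
```

Because each level includes all lower ones, a CPU missing any single v3 feature (for example AVX2) stops at v2, and a rustc built for x86-64-v3 is not guaranteed to run on it.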
Linker
Benefits: rustc (linking) is faster.
Costs: hard to get working.
- using lld by default on x86_64-unknown-linux-gnu:
  - on nightly: Enable rust-lld on nightly x86_64-unknown-linux-gnu #124129, merged 2024-05-17.
  - on stable
Better allocator
Benefits: rustc is faster.
Costs: rustc uses more memory?
- Linux64: jemalloc, done some time ago.
- Win64 [2022]
- Mac: jemalloc, done some time ago.
Note: #92249 and #92317 tried using two different versions of mimalloc (one 1.7-based, one 2.0-based) instead of jemalloc, but in both cases the speed/memory trade-off was judged worse. The max-rss regressions that were expected to be fixed in the 2.x series still exist as of 2.0.6 (see #103944).
Note: we use the better allocator by simply overriding malloc/free, rather than through #[global_allocator]. See this Zulip thread for some discussion of why this is sub-optimal.
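For comparison, this is roughly what the #[global_allocator] route looks like in Rust. The sketch assumes a hypothetical CountingAlloc that forwards to std's System allocator and counts allocated bytes; a real setup would forward to jemalloc or mimalloc through a wrapper crate instead. It is only an illustration of the mechanism, not how rustc is built.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical allocator for illustration: forwards to the system allocator
// and counts the bytes it hands out. A jemalloc or mimalloc wrapper would
// forward to that allocator instead of `System`.
struct CountingAlloc;

// Cumulative bytes allocated since program start (never decremented).
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc and alloc_zeroed use the trait's default implementations,
    // which call `alloc` above, so they are counted too.
}

// Routes every allocation made through Rust's allocation APIs in this
// binary (Box, Vec, String, ...) to CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

fn main() {
    let before = allocated_bytes();
    // black_box keeps the optimizer from eliding the unused allocation.
    let buf: Vec<u8> = std::hint::black_box(Vec::with_capacity(4096));
    println!("allocated at least {} bytes", allocated_bytes() - before);
    drop(buf);
}
```

Note that #[global_allocator] only governs allocations made through Rust's allocation APIs; C and C++ code linked into the same binary, such as LLVM, still calls malloc directly.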
About tracking issues
Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
However, a tracking issue is not meant for large-scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.