Description
Cachegrind profiles indicate that the Rust compiler often spends 3-6% of its executed instructions within `memcpy` (specifically `__memcpy_avx_unaligned_erms` on my Linux box), which is pretty incredible.
I have modified DHAT to track `memcpy`/`memmove` calls and have discovered that a lot are caused by obligation types, such as `PendingPredicateObligations` and `PendingObligations`, which are quite large (160 bytes and 136 bytes respectively on my Linux64 machine).
For example, for the `keccak` benchmark, 33% of the copied bytes occur in the `swap` call in the `compress` function:
rust/src/librustc_data_structures/obligation_forest/mod.rs
Lines 607 to 620 in a6624ed
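To illustrate why a `swap`-based compaction is costly for large elements, here is a minimal sketch. It is not the rustc code referenced above: `Node`, its fields, and this `compress` are hypothetical stand-ins, with the payload sized to mimic a 160-byte obligation. Each swap has to move two whole elements, so the bytes copied scale with the element size.

```rust
use std::mem::size_of;

// Hypothetical node type, NOT the real obligation-forest node. The payload
// size is chosen only to mimic a large obligation.
struct Node {
    live: bool,
    payload: [u8; 160],
}

/// Moves live nodes to the front with `swap`, truncates the dead tail,
/// and returns how many swaps were performed.
fn compress(nodes: &mut Vec<Node>) -> usize {
    let mut swaps = 0;
    let mut dead = 0; // number of dead nodes seen so far
    for i in 0..nodes.len() {
        if nodes[i].live {
            if dead > 0 {
                // Slot `i - dead` holds a dead node; swap the live one into it.
                nodes.swap(i, i - dead);
                swaps += 1;
            }
        } else {
            dead += 1;
        }
    }
    let new_len = nodes.len() - dead;
    nodes.truncate(new_len);
    swaps
}

fn main() {
    // Alternate live and dead nodes; payload[0] doubles as an id.
    let mut nodes: Vec<Node> = (0..8u8)
        .map(|i| Node { live: i % 2 == 0, payload: [i; 160] })
        .collect();
    let swaps = compress(&mut nodes);
    // Rough estimate only: each swap moves at least two whole elements.
    let approx_bytes = swaps * 2 * size_of::<Node>();
    println!("{} swaps, roughly {} bytes moved", swaps, approx_bytes);
    println!("first surviving id: {}", nodes[0].payload[0]);
}
```

With smaller elements (or elements that hold a pointer to the bulky data) the same number of swaps would move far fewer bytes.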
For `serde`, 11% of the copied bytes occur constructing this vector of obligations:
Lines 150 to 157 in a6624ed
and 5% occur appending to this vector of obligations:
rust/src/librustc/traits/project.rs
Lines 570 to 574 in ac21131
It also looks like some functions such as `FulfillmentContext::register_predicate_obligation()` might be passed a `PredicateObligation` by value (using a `memcpy`) rather than by reference, though I'm not sure about that.
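The by-value versus by-reference distinction can be sketched as follows. The `Obligation` struct and both functions are hypothetical and only stand in for the real types. Whether the compiler actually emits a `memcpy` for a move depends on optimization, so treat this as an illustration of the two calling styles rather than a measurement.

```rust
// Hypothetical stand-in for a large obligation; not the real rustc type.
struct Obligation {
    data: [u64; 20], // 160 bytes
}

// Takes ownership: the 160-byte value is moved into the callee and then
// into the vector. Each move may be lowered to a memcpy unless elided.
fn register_by_value(pending: &mut Vec<Obligation>, o: Obligation) {
    pending.push(o);
}

// Borrows: only a pointer is passed, so inspecting the value copies nothing.
fn first_word_by_ref(o: &Obligation) -> u64 {
    o.data[0]
}

fn main() {
    let mut pending = Vec::new();
    let o = Obligation { data: [7; 20] };
    println!("first word: {}", first_word_by_ref(&o));
    register_by_value(&mut pending, o);
    println!("pending obligations: {}", pending.len());
}
```

Note that when the callee has to store the value, as `register_by_value` does, at least one copy into the container is unavoidable; passing by reference only helps on paths that merely inspect the obligation.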
I have some ideas to shrink these types a little, and improve how they're used, but these changes will be tinkering around the edges. It's possible that more fundamental changes to how the obligation system works could yield bigger wins.
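As one example of the "tinkering" kind of change, a large, rarely-read field can be moved behind a `Box` so that the struct itself becomes cheaper to move. This is an assumption about what shrinking could look like, not a description of the planned changes; `Wide`, `Narrow`, and their field layouts are invented for illustration.

```rust
use std::mem::size_of;

// Hypothetical layouts, NOT the real rustc types.
#[allow(dead_code)]
struct Wide {
    cause: [u64; 12],    // bulky data stored inline
    predicate: [u64; 4],
    depth: usize,
}

#[allow(dead_code)]
struct Narrow {
    cause: Box<[u64; 12]>, // same data behind a pointer
    predicate: [u64; 4],
    depth: usize,
}

fn main() {
    println!("Wide:   {} bytes", size_of::<Wide>());
    println!("Narrow: {} bytes", size_of::<Narrow>());
}
```

The trade-off is an extra heap allocation and a pointer indirection whenever the boxed field is read, so this only pays off if moves are much more frequent than accesses to that field.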