#59693 is a nice speed-up for rustc, reducing instruction counts by as much as 12%. #59693 (comment) shows that approximately half the speedup is from avoiding TLS lookups.
So I thought: what else is using TLS lookups? I did some profiling and found that syntax_pos::GLOBALS accounts for most of them. It has three pieces: symbol_interner, hygiene_data, and span_interner. Here are the counts of accesses to each piece via GLOBALS::with:
rustc:
791545069 counts:
( 1) 499029030 (63.0%, 63.0%): symbol_interner
( 2) 181386140 (22.9%, 86.0%): hygiene_data
( 3) 109861627 (13.9%, 99.8%): span_interner
ripgrep:
5455319 counts:
( 1) 2819190 (51.7%, 51.7%): symbol_interner
( 2) 2015746 (37.0%, 88.6%): hygiene_data
( 3) 599975 (11.0%, 99.6%): span_interner
style-servo:
79839701 counts:
( 1) 36436621 (45.6%, 45.6%): hygiene_data
( 2) 31539114 (39.5%, 85.1%): symbol_interner
( 3) 11562409 (14.5%, 99.6%): span_interner
webrender:
27006839 counts:
( 1) 11021232 (40.8%, 40.8%): hygiene_data
( 2) 9218693 (34.1%, 74.9%): symbol_interner
( 3) 6707365 (24.8%, 99.8%): span_interner
These measurements are from a rustc without #59693's change applied. That change avoids almost all of the span_interner accesses, yet span_interner accounted for only 11.0-24.8% of the syntax_pos::GLOBALS accesses. In other words, if we could eliminate most or all of the hygiene_data and symbol_interner accesses (74.9-88.6% of the total), we'd get even bigger wins than what we saw in #59693.
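As a quick check of that arithmetic, this small sketch recomputes both shares from the counts above. It only reuses the profiled numbers; PROFILES and shares are illustrative names, not anything in rustc.

```rust
/// Access counts per benchmark, copied from the profiles above:
/// (name, total, symbol_interner, hygiene_data, span_interner).
const PROFILES: [(&str, u64, u64, u64, u64); 4] = [
    ("rustc", 791_545_069, 499_029_030, 181_386_140, 109_861_627),
    ("ripgrep", 5_455_319, 2_819_190, 2_015_746, 599_975),
    ("style-servo", 79_839_701, 31_539_114, 36_436_621, 11_562_409),
    ("webrender", 27_006_839, 9_218_693, 11_021_232, 6_707_365),
];

/// For each benchmark, return (name, span_interner share,
/// symbol_interner + hygiene_data share), both as fractions of all
/// GLOBALS::with accesses.
fn shares() -> Vec<(&'static str, f64, f64)> {
    PROFILES
        .iter()
        .map(|&(name, total, symbol, hygiene, span)| {
            let total = total as f64;
            (name, span as f64 / total, (symbol + hygiene) as f64 / total)
        })
        .collect()
}
```

For rustc this gives roughly 0.139 for span_interner against 0.860 for the other two pieces combined, i.e. the accesses #59693 removed are the smallest slice.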
I admit that I don't understand how syntax_pos::GLOBALS works, or why a TLS lookup is needed to reach what is conceptually a global value.
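For what it's worth, my understanding (worth double-checking against the syntax_pos source) is that GLOBALS is declared with the scoped_tls crate's scoped_thread_local! macro. The data is conceptually global, but it belongs to one compilation session rather than to the whole process: a driver installs a Globals value for the duration of a closure, and every later access goes through a thread-local pointer to it. That lets separate threads run separate sessions without passing a context argument to every Symbol and Span function, at the cost of a TLS lookup per access. The sketch below imitates that pattern with std's thread_local! and a raw pointer; Globals, set_globals, with_globals and the field types are simplified, hypothetical stand-ins, not the real syntax_pos definitions.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the per-session data behind GLOBALS.
pub struct Globals {
    pub symbol_interner: RefCell<HashMap<String, u32>>,
    pub hygiene_data: RefCell<Vec<u32>>,
    pub span_interner: RefCell<Vec<(u32, u32)>>,
}

thread_local! {
    // Per-thread pointer to the Globals of the session currently running on
    // this thread; null when no session has been installed.
    static GLOBALS: Cell<*const Globals> = Cell::new(std::ptr::null());
}

/// Install `globals` for the duration of `f`, restoring the previous value
/// afterwards (roughly what scoped_tls's `ScopedKey::set` does).
pub fn set_globals<R>(globals: &Globals, f: impl FnOnce() -> R) -> R {
    struct Reset(*const Globals);
    impl Drop for Reset {
        fn drop(&mut self) {
            // Runs even if `f` panics, so no dangling pointer is left behind.
            GLOBALS.with(|g| g.set(self.0));
        }
    }
    let prev = GLOBALS.with(|g| g.replace(globals as *const Globals));
    let _reset = Reset(prev);
    f()
}

/// Run `f` with the current session's Globals. Every call pays for a
/// thread-local lookup before it reaches the data.
pub fn with_globals<R>(f: impl FnOnce(&Globals) -> R) -> R {
    let ptr = GLOBALS.with(|g| g.get());
    assert!(!ptr.is_null(), "GLOBALS accessed outside of set_globals");
    // SAFETY: the pointer is only non-null while set_globals holds a live
    // borrow of the Globals it points to.
    f(unsafe { &*ptr })
}
```

Only with_globals sits on the hot path, and its per-call thread-local lookup is the kind of access the counts above are measuring.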
One possible idea is to increase the size of Symbol from 4 bytes to 8 bytes and store short symbols (7 bytes or less) inline. Some preliminary profiling suggests this could capture roughly half of the symbols. hygiene_data is a harder nut to crack, being a more complicated structure.
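A minimal sketch of the inline-symbol idea, under an encoding of my own choosing: byte 0 of an 8-byte Symbol holds either the length of an inline string (0 to 7) or the tag value 0xFF, meaning bytes 1..5 hold a 32-bit interner index. Symbol, Interner, try_inline, as_inline_str and resolve are hypothetical names; in rustc the interner would still sit behind GLOBALS. The point is that creating a short symbol, reading it back, and comparing or hashing any symbol never touch the interner.

```rust
use std::collections::HashMap;

const MAX_INLINE: usize = 7;
const INTERNED_TAG: u8 = 0xFF;

/// Hypothetical 8-byte symbol: short strings inline, long ones interned.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Symbol([u8; 8]);

impl Symbol {
    /// Build an inline symbol without touching any interner, if `s` fits.
    pub fn try_inline(s: &str) -> Option<Symbol> {
        if s.len() > MAX_INLINE {
            return None;
        }
        let mut bytes = [0u8; 8];
        bytes[0] = s.len() as u8; // 0..=7, never equal to INTERNED_TAG
        bytes[1..1 + s.len()].copy_from_slice(s.as_bytes());
        Some(Symbol(bytes))
    }

    pub fn is_inline(&self) -> bool {
        self.0[0] != INTERNED_TAG
    }

    /// Inline symbols can be read back without the interner.
    pub fn as_inline_str(&self) -> Option<&str> {
        if !self.is_inline() {
            return None;
        }
        let len = self.0[0] as usize;
        Some(std::str::from_utf8(&self.0[1..1 + len]).expect("built from a &str"))
    }
}

/// Hypothetical interner, consulted only for strings longer than 7 bytes.
#[derive(Default)]
pub struct Interner {
    names: HashMap<Box<str>, u32>,
    strings: Vec<Box<str>>,
}

impl Interner {
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(sym) = Symbol::try_inline(s) {
            return sym;
        }
        let existing = self.names.get(s).copied();
        let idx = match existing {
            Some(idx) => idx,
            None => {
                let idx = self.strings.len() as u32;
                self.strings.push(s.into());
                self.names.insert(s.into(), idx);
                idx
            }
        };
        let mut bytes = [0u8; 8];
        bytes[0] = INTERNED_TAG;
        bytes[1..5].copy_from_slice(&idx.to_le_bytes());
        Symbol(bytes)
    }

    pub fn resolve<'a>(&'a self, sym: &'a Symbol) -> &'a str {
        if let Some(s) = sym.as_inline_str() {
            return s;
        }
        let idx = u32::from_le_bytes([sym.0[1], sym.0[2], sym.0[3], sym.0[4]]);
        &*self.strings[idx as usize]
    }
}
```

Because each string has exactly one encoding (inline if it fits, otherwise a deduplicated index), the derived PartialEq and Hash on the raw bytes stay correct.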
cc @rust-lang/wg-compiler-performance