
TLS lookups in libsyntax_pos are expensive #59718

Closed
@nnethercote


#59693 is a nice speed-up for rustc, reducing instruction counts by as much as 12%. A comment on #59693 shows that approximately half of that speedup comes from avoiding TLS lookups.

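So I thought: what else is using TLS lookups? Profiling showed that syntax_pos::GLOBALS accounts for most of them. It has three fields: symbol_interner, hygiene_data, and span_interner. I then counted how often each field is accessed via GLOBALS::with. Each profile below starts with the total number of accesses; each following line gives the rank, the access count, its share of the total, the cumulative share, and the field name: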

rustc:
791545069 counts:
(  1) 499029030 (63.0%, 63.0%):     symbol_interner
(  2) 181386140 (22.9%, 86.0%):     hygiene_data
(  3) 109861627 (13.9%, 99.8%):     span_interner

ripgrep:
5455319 counts:
(  1)  2819190 (51.7%, 51.7%):     symbol_interner
(  2)  2015746 (37.0%, 88.6%):     hygiene_data
(  3)   599975 (11.0%, 99.6%):     span_interner

style-servo:
79839701 counts:
(  1) 36436621 (45.6%, 45.6%):     hygiene_data
(  2) 31539114 (39.5%, 85.1%):     symbol_interner
(  3) 11562409 (14.5%, 99.6%):     span_interner

webrender:
27006839 counts:
(  1) 11021232 (40.8%, 40.8%):     hygiene_data
(  2)  9218693 (34.1%, 74.9%):     symbol_interner
(  3)  6707365 (24.8%, 99.8%):     span_interner

These measurements come from a rustc without #59693 applied. That PR avoids almost all of the span_interner accesses, yet those made up only 11.0-24.8% of the syntax_pos::GLOBALS accesses. The hygiene_data and symbol_interner accesses together account for roughly 75-89% (74.9% for webrender up to 88.6% for ripgrep), so eliminating most or all of them could give even bigger wins than #59693 did.
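The arithmetic behind that estimate, as a small Rust program over the counts copied from the profiles above. It computes each benchmark's span_interner share and the combined symbol_interner + hygiene_data share of all GLOBALS accesses; `PROFILES` and `remaining_share` are names introduced here for illustration.

```rust
// Access counts copied from the profiles above:
// (benchmark, total, symbol_interner, hygiene_data, span_interner).
const PROFILES: [(&str, u64, u64, u64, u64); 4] = [
    ("rustc", 791_545_069, 499_029_030, 181_386_140, 109_861_627),
    ("ripgrep", 5_455_319, 2_819_190, 2_015_746, 599_975),
    ("style-servo", 79_839_701, 31_539_114, 36_436_621, 11_562_409),
    ("webrender", 27_006_839, 9_218_693, 11_021_232, 6_707_365),
];

/// Percentage of all GLOBALS accesses that go to symbol_interner or
/// hygiene_data, i.e. the accesses that remain once span_interner is avoided.
fn remaining_share(total: u64, symbol: u64, hygiene: u64) -> f64 {
    100.0 * (symbol + hygiene) as f64 / total as f64
}

fn main() {
    for (name, total, symbol, hygiene, span) in PROFILES {
        let span_share = 100.0 * span as f64 / total as f64;
        println!(
            "{name}: span_interner {span_share:.1}%, symbol_interner + hygiene_data {:.1}%",
            remaining_share(total, symbol, hygiene)
        );
    }
}
```

The printed symbol_interner + hygiene_data shares (86.0%, 88.6%, 85.1%, 74.9%) match the cumulative column at rank 2 in each profile, since those two fields are always the top two.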

I admit that I don't understand how syntax_pos::GLOBALS works, or why a TLS lookup is needed to reach what is effectively a global value.
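For readers in the same position, here is a minimal sketch of the general access pattern, assuming a simplified `Globals` built on std's `thread_local!` rather than rustc's actual definition. `Globals`, `Interner`, `with_globals`, `intern`, and `symbol_str` are hypothetical names, and hygiene_data and span_interner are omitted. Each `with_globals` call does one thread-local lookup before running its closure, which is the per-access cost counted above. One plausible reason for thread-local rather than plain static storage is that the state sits behind `RefCell`, which is not `Sync` and so cannot be stored in an ordinary `static`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified string interner: maps each distinct string to a u32.
#[derive(Default)]
struct Interner {
    names: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&idx) = self.names.get(s) {
            return idx;
        }
        let idx = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.names.insert(s.to_owned(), idx);
        idx
    }
}

/// Hypothetical stand-in for the globals; hygiene_data and span_interner
/// would be further RefCell fields reached through the same lookup.
struct Globals {
    symbol_interner: RefCell<Interner>,
}

thread_local! {
    // RefCell is not Sync, so Globals cannot be a plain `static`;
    // thread_local! gives each thread its own lazily initialized copy.
    static GLOBALS: Globals = Globals {
        symbol_interner: RefCell::new(Interner::default()),
    };
}

/// Every call performs one thread-local lookup before running `f`.
fn with_globals<R>(f: impl FnOnce(&Globals) -> R) -> R {
    GLOBALS.with(f)
}

fn intern(s: &str) -> u32 {
    with_globals(|g| g.symbol_interner.borrow_mut().intern(s))
}

fn symbol_str(idx: u32) -> String {
    with_globals(|g| g.symbol_interner.borrow().strings[idx as usize].clone())
}

fn main() {
    let a = intern("foo"); // one TLS lookup
    let b = intern("foo"); // another TLS lookup, same symbol
    assert_eq!(a, b);
    println!("{}", symbol_str(a)); // and a third
}
```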

One possible idea is to increase the size of Symbol from 4 bytes to 8 bytes and store short symbols (7 bytes or fewer) inline, so that getting their text requires no symbol_interner access. Some preliminary profiling suggests this could cover roughly half of the symbols. hygiene_data is a harder nut to crack, because it is a more complicated structure.
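A minimal sketch of that idea under stated assumptions: the layout (a one-byte tag followed by up to 7 bytes of text, or a u32 interner index), the 0xFF tag value, and the names `Symbol`, `Interner`, `as_inline_str`, and `resolve` are hypothetical and not rustc's actual design. What it illustrates is that `as_inline_str` needs no interner at all; if the interner lived behind a TLS lookup, as symbol_interner does in syntax_pos::GLOBALS, short symbols would skip that lookup entirely.

```rust
use std::collections::HashMap;

// Hypothetical layout: byte 0 is a tag. Tags 0..=7 mean "inline string of
// that many bytes, stored in bytes 1..=7"; INTERNED means bytes 4..8 hold a
// little-endian u32 index into an Interner.
const MAX_INLINE: usize = 7;
const INTERNED: u8 = 0xFF;

/// Hypothetical 8-byte symbol.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Symbol([u8; 8]);

impl Symbol {
    /// Text of an inline symbol, obtained without consulting any interner.
    fn as_inline_str(&self) -> Option<&str> {
        let len = self.0[0] as usize;
        if len <= MAX_INLINE {
            // The bytes were copied from a whole &str, so they are valid UTF-8.
            Some(std::str::from_utf8(&self.0[1..=len]).unwrap())
        } else {
            None
        }
    }

    fn interned_index(&self) -> Option<u32> {
        if self.0[0] == INTERNED {
            Some(u32::from_le_bytes([self.0[4], self.0[5], self.0[6], self.0[7]]))
        } else {
            None
        }
    }
}

/// Hypothetical interner, used only for strings longer than MAX_INLINE bytes.
#[derive(Default)]
struct Interner {
    names: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        let mut bytes = [0u8; 8];
        if s.len() <= MAX_INLINE {
            // Short string: store it in the symbol itself; the table is untouched.
            bytes[0] = s.len() as u8;
            bytes[1..=s.len()].copy_from_slice(s.as_bytes());
            return Symbol(bytes);
        }
        let idx = match self.names.get(s) {
            Some(&idx) => idx,
            None => {
                let idx = self.strings.len() as u32;
                self.strings.push(s.to_owned());
                self.names.insert(s.to_owned(), idx);
                idx
            }
        };
        bytes[0] = INTERNED;
        bytes[4..8].copy_from_slice(&idx.to_le_bytes());
        Symbol(bytes)
    }

    /// Only interned symbols need the table lookup.
    fn resolve<'a>(&'a self, sym: &'a Symbol) -> &'a str {
        match sym.as_inline_str() {
            Some(s) => s,
            None => {
                let idx = sym.interned_index().expect("non-inline symbol must be interned");
                self.strings[idx as usize].as_str()
            }
        }
    }
}

fn main() {
    let mut interner = Interner::default();
    let short = interner.intern("self");
    let long = interner.intern("proc_macro");
    assert_eq!(short.as_inline_str(), Some("self"));
    assert_eq!(long.as_inline_str(), None);
    assert_eq!(interner.resolve(&long), "proc_macro");
    assert_eq!(interner.strings.len(), 1); // only the long string was stored
    assert_eq!(std::mem::size_of::<Symbol>(), 8);
}
```

In this sketch a string is inline or interned depending only on its length, so the same string always produces the same Symbol, and equality and hashing stay plain 8-byte comparisons.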

cc @rust-lang/wg-compiler-performance

Labels: A-thread-locals, C-enhancement, I-compiletime, T-compiler