Poor optimization of thread local globals on OSX #41067

Open
Description

@llvmbot
Bugzilla Link 41722
Version 8.0
OS MacOS X
Reporter LLVM Bugzilla Contributor
CC @TNorthover

Extended Description

Multiple calls to tlv_get_addr are (often) generated per usage of a thread local variable on OSX. This issue was discovered by looking at the assembly generated by rustc, and is discussed in more detail here:

rust-lang/rust#60341 (comment)

I know very little about LLVM, so hopefully this all makes sense. The linked IR [1] demonstrates the issue. The optimizer often emits IR that references thread_local globals multiple times when the unoptimized IR references them only once. This is often associated with getelementptr.

In the final assembly, the tlv_get_addr sequence is emitted twice:

movq _foo@TLVP(%rip), %rdi
callq *(%rdi)

For larger structures with multiple members, the problem gets worse, resulting in many redundant calls to tlv_get_addr. In contrast, when targeting Linux, __tls_get_addr@PLT is invoked only once.
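
For illustration, here is a minimal Rust sketch of the kind of source that touches several members of one thread-local structure. The `Counters` struct, its fields, and the `record` function are hypothetical names invented for this example and are not taken from the linked Rust issue. Whether this exact code produces redundant tlv_get_addr calls depends on the compiler version and optimization level, so treat it as a starting point for inspecting the generated assembly rather than a confirmed reproduction.

```rust
use std::cell::Cell;

// Hypothetical structure with several members, standing in for the
// "larger structures" mentioned above.
struct Counters {
    hits: Cell<u64>,
    misses: Cell<u64>,
    total: Cell<u64>,
}

thread_local! {
    // A single thread-local global; each field access goes through it.
    static COUNTERS: Counters = Counters {
        hits: Cell::new(0),
        misses: Cell::new(0),
        total: Cell::new(0),
    };
}

// Touches multiple fields of the same thread-local value. Ideally the
// address of COUNTERS would be computed once and reused for every field.
pub fn record(hit: bool) -> u64 {
    COUNTERS.with(|c| {
        if hit {
            c.hits.set(c.hits.get() + 1);
        } else {
            c.misses.set(c.misses.get() + 1);
        }
        c.total.set(c.total.get() + 1);
        c.total.get()
    })
}
```

Compiling something like this with `rustc -O --emit asm` for a macOS target and counting the `@TLVP` loads in `record` is one way to check whether the address is being recomputed per field.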

Maybe there's a good reason the address isn't cached on OSX, but I'm hoping there isn't :).
